Missing context data in TN alarm's description/recommended action when TN and manager are in different versions

Article ID: 345756

Products

VMware NSX

Issue/Introduction

  • Alarm context data keys can change between NSX releases. The alarm framework only forwards the context key-value pairs supported by its own version to syslog/alarms and to the collector/SNMP trap.
  • If a Transport Node (TN), such as an edge node or host node, and the manager are on different versions, the TN sends the manager only the context key-value pairs its own version supports. These may differ from the keys the manager expects.
  • As a result, context data can be missing from the TN alarm's description/recommended action when the TN and manager nodes are on different versions. The following alarm is an example:
Edge Health:
Edge CPU Usage High
The CPU usage on Edge node {{entity_id}} has reached 90% which is at or above the high threshold value of 60%. 
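To illustrate the symptom, the sketch below renders a description template and leaves any `{{key}}` placeholder unchanged when its key is absent from the received context. The template string, the `render` function and the double-brace placeholder syntax are assumptions modeled on the example alarm above. This is not the NSX alarm framework's actual code.

```python
import re

# Hypothetical template modeled on the example alarm text; the real NSX
# template string is not shown in this article.
TEMPLATE = ("The CPU usage on Edge node {{entity_id}} has reached "
            "{{system_resource_usage}}% which is at or above the high "
            "threshold value of 60%.")

# Matches {{key}} and captures the key name.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render(template, context):
    """Substitute {{key}} placeholders; keep the placeholder text if the key is absent."""
    return PLACEHOLDER.sub(
        lambda m: str(context[m.group(1)]) if m.group(1) in context else m.group(0),
        template,
    )

# Context as received from a TN that does not send entity_id.
print(render(TEMPLATE, {"system_resource_usage": 90}))
# Context containing every key the template references ("edge-01" is a made-up name).
print(render(TEMPLATE, {"entity_id": "edge-01", "system_resource_usage": 90}))
```

The first call reproduces the unfilled `{{entity_id}}` shown in the example alarm. The second call shows the same template once every referenced key is present.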

Environment

VMware NSX-T

Cause

  • For example, the edge_health.edge_cpu_usage_high alarm uses the following keys in different versions:

In 3.2.x, the context data keys that the feature sends to the alarm framework are:

entity_id, system_resource_usage

and the keys supported by the alarm framework are:

entity_id, system_resource_usage

In 4.2, the context data keys that the feature sends to the alarm framework are:

entity_id, transport_node_name, transport_node_address, system_resource_usage

and the keys supported by the alarm framework (entity_id is no longer among them) are:

transport_node_name, transport_node_address, system_resource_usage
  • With a 4.2 edge and a 3.2.x NSX manager, the edge sends only the context key-value pairs transport_node_name, transport_node_address and system_resource_usage to the 3.2.x manager.
  • However, the 3.2.x manager also needs entity_id to fill in its description. This results in the following alarm description on the 3.2.x manager:
The CPU usage on Edge node {{entity_id}} has reached 90%
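The key lists in this section can be compared with simple set operations. This minimal Python sketch makes two simplifying assumptions. First, a TN forwards only keys that are both produced by the feature and supported by its own framework version. Second, the manager needs every key its own framework version supports. The dictionaries and the helpers `keys_sent` and `keys_missing` are hypothetical, and the key sets are copied from this article.

```python
# Key sets for edge_health.edge_cpu_usage_high as listed in this article.
FEATURE_KEYS = {
    "3.2.x": {"entity_id", "system_resource_usage"},
    "4.2": {"entity_id", "transport_node_name",
            "transport_node_address", "system_resource_usage"},
}
FRAMEWORK_KEYS = {
    "3.2.x": {"entity_id", "system_resource_usage"},
    "4.2": {"transport_node_name", "transport_node_address",
            "system_resource_usage"},
}

def keys_sent(tn_version):
    """Keys a TN forwards (assumed): produced by the feature AND supported by its framework."""
    return FEATURE_KEYS[tn_version] & FRAMEWORK_KEYS[tn_version]

def keys_missing(tn_version, manager_version):
    """Keys the manager's framework expects (assumed) but the TN never sends."""
    return FRAMEWORK_KEYS[manager_version] - keys_sent(tn_version)

print(sorted(keys_sent("4.2")))
print(sorted(keys_missing("4.2", "3.2.x")))  # 4.2 edge, 3.2.x manager
print(sorted(keys_missing("4.2", "4.2")))    # versions match after the upgrade
```

For a 4.2 edge reporting to a 3.2.x manager, `keys_missing` yields only `entity_id`, which matches the unfilled placeholder above. When both sides run 4.2, the result is empty.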

Resolution

  • The missing alarm context issue is resolved by completing the NSX upgrade, so that the NSX Manager runs the same version as the transport nodes.
  • Note that all alarms are deleted after the manager upgrade.
  • If the condition that triggered an alarm still exists, a new alarm is raised.
  • Because the edge and manager versions now match, the description/recommended action is filled in with the expected context data.