Troubleshooting Edge VM Configuration Mismatch Alarm in UI

Article ID: 345864

Products

VMware NSX

Issue/Introduction

There is a behavior change for Edge Management Plane (MP) intents. If the user updates edge node settings directly in the Edge CLI or via vCenter, those changes are not reflected in the Edge node's MP intent. In such cases, the user is alerted with an Edge node mismatch alarm, which indicates that an edge node configuration has been changed directly in the Edge CLI or in vCenter. The edge node's "Configuration State" in the NSX-T/NSX UI is also updated to reflect this mismatch.

Environment

VMware NSX 4.2

Cause

Alarm 1: Edge Node Settings Mismatch

An alarm will be raised when the Edge Node CLI and Edge Node MP intent parameters are found to differ. If any of the Edge node settings fields below is changed directly through the Edge CLI, this alarm is raised.

Edge node settings

  1. "Enable SSH"
  2. "DNS Servers"
  3. "Search domains"
  4. "NTP servers"
  5. "Host name"
  6. "Syslog Servers"
  7. "UPT_MODE" (added beginning with NSX 4.1.0 as a realization failure)

Alarm 2: Edge VM vSphere Settings Mismatch

An alarm will be raised when the Edge node's vSphere parameters in vCenter and the Edge node MP intent are found to differ. If a user changes any one of the Edge configurations below inside vSphere through vCenter, this alarm is raised.

Edge VM vSphere settings

  1. "Display Name"
  2. "Compute Id"
  3. "Storage Id"
  4. "Management Network Id"
  5. "Data Networks Ids"
  6. "Form Factor"
  7. "CPU Reservation in shares"
  8. "CPU Reservation in MHz"
  9. "Memory Reservation percentage"

Alarm 3: Edge Node Settings and vSphere Settings have been changed

An alarm will be raised when both the Edge node vSphere parameters in vCenter and the Edge node CLI parameters are found to differ from the Edge Node MP intent. If a user changes fields from both "Edge node settings" and "Edge VM vSphere settings" directly on the Edge CLI and in vCenter, respectively, this alarm is raised.

Alarm 4: Edge vSphere Location Mismatch

An alarm will be raised when a user uses vMotion to move Edge VMs. Moving the Edge VM changes the datastore ("Storage Id") and/or compute cluster ID ("Compute Id") parameters of the Edge node in vSphere, so the Edge node vSphere settings in vCenter no longer match the Edge Node MP intent. If either or both of the following fields are changed, this alarm is raised.

  1. "Compute Id"
  2. "Storage Id"

If, in addition to "Compute Id" and "Storage Id", other "Edge VM vSphere settings" or "Edge node settings" fields are changed, then the "Edge VM vSphere Settings Mismatch" alarm or the "Edge Node Settings and vSphere Settings are changed" alarm is raised, depending on which fields changed.
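The rules for the four alarms can be summarized as a small decision function. The following Python sketch is illustrative only: it assumes the four alarms are mutually exclusive and that the field names are exactly those listed in this article. The function name `expected_alarm` is hypothetical and is not part of any NSX API.

```python
from typing import Iterable, Optional

# Field names copied from the "Edge node settings" list in this article.
EDGE_NODE_SETTINGS = frozenset({
    "Enable SSH", "DNS Servers", "Search domains", "NTP servers",
    "Host name", "Syslog Servers", "UPT_MODE",
})
# Field names copied from the "Edge VM vSphere settings" list in this article.
EDGE_VM_VSPHERE_SETTINGS = frozenset({
    "Display Name", "Compute Id", "Storage Id", "Management Network Id",
    "Data Networks Ids", "Form Factor", "CPU Reservation in shares",
    "CPU Reservation in MHz", "Memory Reservation percentage",
})
# Fields that change when an Edge VM is moved with vMotion.
LOCATION_FIELDS = frozenset({"Compute Id", "Storage Id"})


def expected_alarm(changed_fields: Iterable[str]) -> Optional[str]:
    """Return the alarm this article describes for a set of changed fields.

    Hypothetical helper: it mirrors the prose rules and is not how NSX
    itself computes the alarm.
    """
    changed = set(changed_fields)
    unknown = changed - EDGE_NODE_SETTINGS - EDGE_VM_VSPHERE_SETTINGS
    if unknown:
        raise ValueError(f"Unknown field(s): {sorted(unknown)}")

    node_changed = changed & EDGE_NODE_SETTINGS
    vsphere_changed = changed & EDGE_VM_VSPHERE_SETTINGS

    if node_changed and vsphere_changed:
        return "Edge Node Settings and vSphere Settings have been changed"
    if node_changed:
        return "Edge Node Settings Mismatch"
    if vsphere_changed and vsphere_changed <= LOCATION_FIELDS:
        # Only "Compute Id" and/or "Storage Id" changed.
        return "Edge vSphere Location Mismatch"
    if vsphere_changed:
        return "Edge VM vSphere Settings Mismatch"
    return None  # Nothing changed, so no alarm is expected.
```

For example, `expected_alarm(["Storage Id"])` yields the location mismatch alarm, while `expected_alarm(["Storage Id", "Form Factor"])` yields the vSphere settings mismatch alarm.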

How the Mismatch Alarm Looks in the NSX-T UI

  • The Edge mismatch alarm is displayed on the System, Nodes, Edge Transport Nodes page.
  • The Edge mismatch alarm is also displayed on the NSX-T UI Home page.

Resolution

This issue is resolved in VMware NSX 4.2.1, which is available from Broadcom downloads.

NSX 4.2.1 introduced an Auto Refresh feature: the NSX Manager automatically updates the Edge node intent configuration to match the realized configuration, so no mismatch alarm is generated.

Note: Upgrades from NSX 4.2.0 to 4.2.1.x have a known issue in which the Auto Refresh feature is not enabled, and mismatch alarms can still be generated.
In this case the feature must be enabled via API.

Check if the feature is enabled:
GET https://{manager-ip}/policy/api/v1/system-config?key=auto_refresh_edge_transport_nodes

Enable the feature:
PATCH https://{manager-ip}/policy/api/v1/system-config
{
    "keyValuePairs": [
        {
            "key": "auto_refresh_edge_transport_nodes",
            "value": "true"
        }
    ]
}
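The two calls above can be prepared programmatically. The following Python sketch only builds the request objects with the standard library and does not send them. Authentication, TLS verification settings, and the `Content-Type` header are assumptions that are not stated in this article, and the function names are hypothetical.

```python
import json
import urllib.request

# Key name taken from the API calls shown in this article.
CONFIG_KEY = "auto_refresh_edge_transport_nodes"


def build_check_request(manager_ip: str) -> urllib.request.Request:
    """Build the GET request that checks whether Auto Refresh is enabled."""
    url = f"https://{manager_ip}/policy/api/v1/system-config?key={CONFIG_KEY}"
    return urllib.request.Request(url, method="GET")


def build_enable_request(manager_ip: str) -> urllib.request.Request:
    """Build the PATCH request that enables Auto Refresh."""
    body = {"keyValuePairs": [{"key": CONFIG_KEY, "value": "true"}]}
    return urllib.request.Request(
        f"https://{manager_ip}/policy/api/v1/system-config",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

To send a request, credentials must be added according to the environment's authentication method before passing the object to `urllib.request.urlopen`.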

For releases prior to NSX 4.2.1, action must be taken to resolve the alarm.

Option #1 (preferred)

  • Select Mismatch in the Edge node Configuration State.
  • This opens a pop-up window.
  • Select "vSphere/Edge Appliance" as the Source and click Resolve. This resolves the mismatch alarm. This operation internally invokes the Edge node refresh API, which updates the Edge node MP intent with the latest data from the Edge node and resolves the mismatch alarm.
  • With this option, the actual Edge node configuration (on the CLI or in vCenter) is copied to the Edge Node MP intent.

Option #2 (less preferred)

  • Select Mismatch in the Edge node Configuration State.
  • This opens a pop-up window.
  • Select "NSX" as the Source and click Resolve. This resolves the mismatch alarm. This operation internally executes the Edge node update API, which realizes the Edge node configuration held in the MP on the Edge node.
    Warning: If there is a mismatch in the Compute Id/Storage Id field, selecting "NSX" as the source redeploys the Edge node, which causes traffic disruption. A warning message about traffic disruption is displayed.
  • In this case the Edge VM in vSphere is updated so that its configuration matches the intent configuration known by NSX and as defined at deployment time.

Certain Corner Cases

In certain edge cases, the alarm might not be resolved by Option #1 or Option #2 above. In such cases, follow the steps below to resolve the alarm manually.

Case 1: Alarm is not resolved from the Edge-MP vertical side

  • Check the output of the Edge transport node state API: GET https://<manager-ip>/api/v1/transport-nodes/<edge-uuid>/state
  • If "node_deployment_state" in the API output shows a mismatch state, as shown below, the mismatch is still present:
{
  "node_deployment_state": {
    "state": "EDGE_VM_VSPHERE_SETTINGS_MISMATCH_RESOLVE",
    "details": [
      {
        "sub_system_id": "EDGE_TRANSPORT_NODE_MISMATCH_ALARMS",
        "state": "EDGE_VM_VSPHERE_SETTINGS_MISMATCH_RESOLVE",
        "failure_message": " configuration on vSphere : {\"CPU Reservation in shares\":\"NORMAL_PRIORITY\",\"Storage Id\":\"datastore-14\"}, intent vSphere configuration :{\"CPU Reservation in shares\":\"LOW_PRIORITY\",\"Storage Id\":\"datastore-50\"}",
        "failure_code": 16087
      }
    ],
    "failure_message": "",
    "failure_code": 0
  }
}
  • To resolve this mismatch, call the refresh API (it does not need a request body): POST https://<manager-ip>/api/v1/transport-nodes/<edge-uuid>?action=refresh_node_configuration&resource_type=EdgeNode
  • Now check the Edge transport node state API output again (GET https://<manager-ip>/api/v1/transport-nodes/<edge-uuid>/state). If "node_deployment_state" for the Edge transport node is NODE_READY, the mismatch is resolved from the Edge-MP vertical side:
{
  "node_deployment_state": {
    "state": "NODE_READY",
    "details": []
  }
}
  • If "node_deployment_state" still shows a mismatch, there is a discrepancy between the Edge node MP intent and the realized Edge node configuration in the CLI or vCenter.
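When the state output still reports a mismatch, the differing fields can be pulled out of "failure_message". The Python sketch below is a hedged illustration: it assumes the message always contains exactly two JSON objects, in the order shown in the sample above (realized vSphere configuration first, intent second). That layout is inferred from the single sample in this article and is not a documented contract. The function names are hypothetical.

```python
import json

# Sub-system identifier copied from the sample state output in this article.
MISMATCH_SUBSYSTEM = "EDGE_TRANSPORT_NODE_MISMATCH_ALARMS"


def extract_json_objects(text: str) -> list:
    """Return every JSON object embedded in free-form text, in order."""
    decoder = json.JSONDecoder()
    objects = []
    idx = text.find("{")
    while idx != -1:
        try:
            obj, end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            idx = text.find("{", idx + 1)
            continue
        objects.append(obj)
        idx = text.find("{", end)
    return objects


def mismatched_fields(state_response: dict) -> dict:
    """Map each differing field to a (realized, intent) tuple.

    Assumption: the first embedded object is the realized configuration
    and the second is the intent, as in the article's sample.
    """
    node_state = state_response.get("node_deployment_state", {})
    diffs = {}
    for detail in node_state.get("details", []):
        if detail.get("sub_system_id") != MISMATCH_SUBSYSTEM:
            continue
        objects = extract_json_objects(detail.get("failure_message", ""))
        if len(objects) != 2:
            continue  # Unexpected message layout; skip rather than guess.
        realized, intent = objects
        for field in sorted(set(realized) | set(intent)):
            if realized.get(field) != intent.get(field):
                diffs[field] = (realized.get(field), intent.get(field))
    return diffs
```

An empty result for a response whose state is NODE_READY is expected, because "details" is empty in that case.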

Case 2: Alarm is resolved from the Edge-MP vertical side, but not resolved from the Alarm Framework side

  • Check the output of the Edge transport node state API: GET https://<manager-ip>/api/v1/transport-nodes/<edge-uuid>/state
  • In this API output, "node_deployment_state" for the Edge transport node is NODE_READY, which means the Edge-MP vertical has resolved the mismatch:
{
  "node_deployment_state": {
    "state": "NODE_READY",
    "details": []
  }
}
  • Now check whether the mismatch alarm is still OPEN using the alarm API: GET https://<manager-ip>/api/v1/alarms?status=OPEN
  • If the alarm API still shows the mismatch alarm as OPEN, the Edge-MP vertical resolved the mismatch but the Alarm Framework failed to resolve the alarm, so the alarm must be resolved manually.
  • To resolve the alarm manually, select the mismatch alarm from "Open Alarms", then choose "Actions" and "Acknowledge". This can be done from the System, Fabric, Nodes page or from the Home, Alarms page.
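The distinction between Case 1 and Case 2 can be expressed as a short decision helper. This Python sketch is a simplification: it treats any state other than NODE_READY as an unresolved mismatch, and it takes the "is the mismatch alarm still open" answer as a boolean supplied by the caller, because the alarm API response format is not described in this article. The function name and return values are hypothetical.

```python
def next_step(state_response: dict, mismatch_alarm_open: bool) -> str:
    """Suggest the follow-up action for the corner cases in this article.

    state_response: parsed JSON from the transport node state API.
    mismatch_alarm_open: whether the alarm API still lists the mismatch
        alarm as OPEN (determined by the caller).
    """
    state = state_response.get("node_deployment_state", {}).get("state")
    if state != "NODE_READY":
        # Case 1: the Edge-MP vertical still sees a mismatch.
        return "call refresh_node_configuration"
    if mismatch_alarm_open:
        # Case 2: the mismatch is resolved but the alarm is stale.
        return "acknowledge alarm manually"
    return "nothing to do"
```

For example, a NODE_READY state combined with an open alarm maps to the manual acknowledgement described in Case 2.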
 
 

 

Case 3: Unable to resolve alarm due to "Duplicate syslog server" validation error

Additional Information

Note: The term "Edge Node MP intent" refers to the Edge Transport Node configuration data stored in the NSX-T Manager database. The same data is returned as the payload of a GET call for the edge transport node, e.g. GET https://<manager-ip>/api/v1/transport-nodes/<edge-tn-id>

From NSX 4.1.1, UPT mode may be edited even when Edge maintenance mode is enabled.

As part of UPT mode realization, maintenance mode is toggled on the Edge VM.

If the user edits UPT mode on an edge with maintenance mode enabled, the user must disable maintenance mode before UPT realization can complete. A mismatch alarm is raised after partial UPT realization while the Edge has maintenance mode enabled. The alarm resolves using the NSX value when the user disables maintenance mode on the edge.
