Troubleshooting Edge VM Configuration "Mismatch" Alarms in the NSX UI in NSX-T 3.2.0 and Later

Article ID: 345864

Products

VMware NSX

Issue/Introduction

Beginning with NSX-T 3.2.0, the behavior of the Edge Management Plane (MP) intent has changed. If a user updates certain Edge node settings directly in the Edge CLI or through vCenter, those changes are not automatically applied to the Edge node's MP intent. In such cases, NSX raises an Edge node mismatch alarm to indicate that some Edge node configuration has been changed directly on the Edge CLI or in vCenter. The Edge node "Configuration State" in the NSX-T/NSX UI is also updated to show the mismatch. There are four types of Edge node mismatch alarms, described below.

Alarm 1: Edge Node Settings Mismatch

This alarm is raised when the Edge node's parameters in the Edge node CLI differ from those in the Edge node MP intent. Changing any of the following Edge node settings directly through the Edge CLI raises this alarm.

Edge node settings

  1. "Enable SSH"
  2. "DNS Servers"
  3. "Search domains"
  4. "NTP servers"
  5. "Host name"
  6. "Syslog Servers"
  7. "UPT_MODE" (added beginning with NSX 4.1.0 as a realization failure)

Alarm 2: Edge VM vSphere Settings Mismatch

This alarm is raised when the Edge node's vSphere parameters in vCenter differ from those in the Edge node MP intent. Changing any of the following Edge configurations in vSphere through vCenter raises this alarm.

Edge VM vSphere settings

  1. "Display Name"
  2. "Compute Id"
  3. "Storage Id"
  4. "Management Network Id"
  5. "Data Networks Ids"
  6. "Form Factor"
  7. "CPU Reservation in shares"
  8. "CPU Reservation in MHz"
  9. "Memory Reservation percentage"

Alarm 3: Edge Node Settings and vSphere Settings are changed

This alarm is raised when both the Edge node's vSphere parameters in vCenter and its Edge CLI parameters differ from the Edge node MP intent. If a user changes fields from both "Edge node settings" and "Edge VM vSphere settings", directly on the Edge CLI and in vCenter respectively, this alarm is raised.

Alarm 4: Edge vSphere Location Mismatch

This alarm is raised when a user moves an Edge VM with vMotion. Moving the Edge VM changes the datastore ("Storage Id") and/or compute cluster ID ("Compute Id") of the Edge node in vSphere, so these vSphere settings in vCenter no longer match the Edge node MP intent. The alarm is raised if either or both of the following fields change.

  1. "Compute Id"
  2. "Storage Id"

If fields other than "Compute Id" and "Storage Id" are also changed, whether in "Edge VM vSphere settings" or in "Edge node settings", then the "Edge VM vSphere Settings Mismatch" alarm or the "Edge Node Settings and vSphere Settings are changed" alarm is raised, depending on which fields changed.
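
The rules above for which alarm is raised can be summarized in a short sketch. This only illustrates the behavior described in this article and is not NSX's implementation: the function name `classify_mismatch` is hypothetical, and the sketch assumes that the location alarm applies only when nothing other than "Compute Id" and/or "Storage Id" changed.

```python
# Field names are the UI labels listed in this article.
EDGE_NODE_SETTINGS = {
    "Enable SSH", "DNS Servers", "Search domains", "NTP servers",
    "Host name", "Syslog Servers", "UPT_MODE",
}
VSPHERE_SETTINGS = {
    "Display Name", "Compute Id", "Storage Id", "Management Network Id",
    "Data Networks Ids", "Form Factor", "CPU Reservation in shares",
    "CPU Reservation in MHz", "Memory Reservation percentage",
}
LOCATION_FIELDS = {"Compute Id", "Storage Id"}


def classify_mismatch(changed_fields):
    """Return the alarm name expected for a set of changed fields (hypothetical helper)."""
    changed = set(changed_fields)
    unknown = changed - EDGE_NODE_SETTINGS - VSPHERE_SETTINGS
    if unknown:
        raise ValueError(f"fields not listed in the article: {sorted(unknown)}")
    if not changed:
        return None  # nothing changed, so no mismatch alarm is expected

    node_changed = changed & EDGE_NODE_SETTINGS
    vsphere_changed = changed & VSPHERE_SETTINGS
    if node_changed and vsphere_changed:
        return "Edge Node Settings and vSphere Settings are changed"
    if node_changed:
        return "Edge Node Settings Mismatch"
    # Only vSphere fields changed: a pure move (compute/storage) is a location mismatch.
    if vsphere_changed <= LOCATION_FIELDS:
        return "Edge vSphere Location Mismatch"
    return "Edge VM vSphere Settings Mismatch"
```

For example, `classify_mismatch(["Storage Id", "Form Factor"])` falls through to the vSphere settings alarm, because a field other than the two location fields changed.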

How the Mismatch Alarm Appears in the NSX-T UI

  • The Edge mismatch alarm is displayed on the System, Nodes, Edge Transport Nodes page.
  • The Edge mismatch alarm is also displayed on the NSX-T UI Home page.

Environment

VMware NSX

Resolution

This issue is resolved in VMware NSX 4.2.1, available at Broadcom downloads.

NSX 4.2.1 introduced an Auto Refresh feature: NSX Manager automatically updates the Edge intent configuration to match the realized (real-world) configuration, so no mismatch alarm is generated.

Note: Upgrades from NSX 4.2.0 to 4.2.1.x have a known issue where the Auto Refresh feature is not enabled, so mismatch alarms can still be generated.
In this case, the feature must be enabled via the API.

Check if the feature is enabled:
GET https://{manager-ip}/policy/api/v1/system-config?key=auto_refresh_edge_transport_nodes

Enable the feature:
PATCH https://{manager-ip}/policy/api/v1/system-config
{
    "keyValuePairs": [
        {
            "key": "auto_refresh_edge_transport_nodes",
            "value": "true"
        }
    ]
}
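
The two calls above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library; the helper names and the JSON content type header are assumptions, and authentication and TLS certificate handling are left out because they depend on the environment.

```python
import json
import urllib.request

# Key name taken from the API calls in this article.
KEY = "auto_refresh_edge_transport_nodes"


def build_check_request(manager):
    """Build the GET request that reads the current value of the Auto Refresh key."""
    url = f"https://{manager}/policy/api/v1/system-config?key={KEY}"
    return urllib.request.Request(url, method="GET")


def build_enable_request(manager):
    """Build the PATCH request that sets the Auto Refresh key to "true"."""
    body = {"keyValuePairs": [{"key": KEY, "value": "true"}]}
    return urllib.request.Request(
        f"https://{manager}/policy/api/v1/system-config",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="PATCH",
    )


def send(request):
    """Send a prepared request and return the response body as text.

    Credentials are not added here; attach them to the request first.
    """
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Building the requests separately from sending them makes it possible to inspect the URL, method, and body before anything is sent to the manager.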

For releases prior to NSX 4.2.1, action must be taken to resolve the alarm.

Option #1 (preferred)

  • Select Mismatch in the Edge node Configuration State.
  • A pop-up window opens.
  • Select "vSphere/Edge Appliance" as the Source and click Resolve. This resolves the mismatch alarm. The operation runs the Edge node refresh API internally, which updates the Edge node MP intent with the latest data from the Edge node.
  • With this option, the actual Edge node configuration (on the CLI or in vCenter) is copied to the Edge node MP intent.

Option #2 (less preferred)

  • Select Mismatch in the Edge node Configuration State.
  • A pop-up window opens.
  • Select "NSX" as the Source and click Resolve. This resolves the mismatch alarm. The operation runs the Edge node update API internally, which realizes the Edge node configuration held in the MP onto the Edge node.
    Warning: If the mismatch is in the Compute Id or Storage Id field, selecting "NSX" as the source redeploys the Edge node, which causes traffic disruption. A warning message about traffic disruption is displayed.
  • With this option, the Edge VM in vSphere is updated so that its configuration matches the intent configuration known to NSX, as defined at deployment time.

Certain Corner cases

In certain corner cases, the alarm might not be resolved by Option #1 or Option #2 above. In such cases, follow the steps below to resolve the alarm manually.

Case 1: Alarm is not resolved from Edge-MP vertical side

  • Check the output of the Edge transport node state API: GET https://<manager-ip>/api/v1/transport-nodes/<edge-uuid>/state
  • If "node_deployment_state" in the output shows a mismatch state, as in the example below, the mismatch is still present:
{
  "node_deployment_state": {
    "state": "EDGE_VM_VSPHERE_SETTINGS_MISMATCH_RESOLVE",
    "details": [
      {
        "sub_system_id": "EDGE_TRANSPORT_NODE_MISMATCH_ALARMS",
        "state": "EDGE_VM_VSPHERE_SETTINGS_MISMATCH_RESOLVE",
        "failure_message": " configuration on vSphere : {\"CPU Reservation in shares\":\"NORMAL_PRIORITY\",\"Storage Id\":\"datastore-14\"}, intent vSphere configuration :{\"CPU Reservation in shares\":\"LOW_PRIORITY\",\"Storage Id\":\"datastore-50\"}",
        "failure_code": 16087
      }
    ],
    "failure_message": "",
    "failure_code": 0
  }
}
  • To resolve this mismatch, call the refresh API (it does not need a request body): POST https://<manager-ip>/api/v1/transport-nodes/<edge-uuid>?action=refresh_node_configuration&resource_type=EdgeNode
  • Check the output of the Edge transport node state API again (GET https://<manager-ip>/api/v1/transport-nodes/<edge-uuid>/state). If "node_deployment_state" for the Edge transport node is now NODE_READY, the mismatch is resolved on the Edge-MP vertical side:
{
  "node_deployment_state": {
    "state": "NODE_READY",
    "details": []
  }
}
  • If "node_deployment_state" still shows a mismatch, there is a genuine mismatch between the Edge node MP intent and the realized Edge node configuration on the CLI or in vCenter.
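
When the state output still shows a mismatch, it can help to see exactly which fields differ. The sketch below parses a state response shaped like the sample in this case and compares the two JSON objects embedded in "failure_message". It assumes that the message always carries the realized configuration first and the intent configuration second, as in the sample, and that a mismatch state name contains the word MISMATCH; the function names are hypothetical.

```python
import json


def extract_json_objects(text):
    """Return every JSON object embedded in a free-text message, in order."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return objects
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON at this brace; keep scanning
            continue
        objects.append(obj)
        pos = end


def summarize_mismatch(state_response):
    """Return (is_mismatch, {field: (realized_value, intent_value)})."""
    deployment = state_response.get("node_deployment_state", {})
    # Assumption: mismatch states contain "MISMATCH", as in the sample output.
    is_mismatch = "MISMATCH" in deployment.get("state", "")
    differences = {}
    for detail in deployment.get("details", []):
        objects = extract_json_objects(detail.get("failure_message", ""))
        if len(objects) != 2:
            continue  # message is not in the format of the sample; skip it
        realized, intent = objects
        for field in sorted(set(realized) | set(intent)):
            if realized.get(field) != intent.get(field):
                differences[field] = (realized.get(field), intent.get(field))
    return is_mismatch, differences
```

Passing the parsed JSON of the state API response to `summarize_mismatch` gives a per-field view of what the refresh API or the UI resolve action would need to reconcile.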

Case 2: Alarm is resolved from Edge-MP vertical side, but not resolved from Alarm Framework side

  • Check the output of the Edge transport node state API: GET https://<manager-ip>/api/v1/transport-nodes/<edge-uuid>/state
  • In this output, "node_deployment_state" for the Edge transport node is NODE_READY, which means the Edge-MP vertical has resolved the mismatch:
{
  "node_deployment_state": {
    "state": "NODE_READY",
    "details": []
  }
}
  • Check whether the mismatch alarm is still OPEN using the alarm API: GET https://<manager-ip>/api/v1/alarms?status=OPEN
  • If the alarm API still shows the mismatch alarm as OPEN, the Edge-MP vertical has resolved the mismatch but the Alarm Framework failed to resolve the alarm, so the alarm must be resolved manually.
  • To resolve the alarm manually, select the mismatch alarm under "Open Alarms", then choose "Actions" and "Acknowledge". This can be done from the System, Fabric, Nodes page or from the Home, Alarms page.
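
Cases 1 and 2 together amount to a small decision flow, sketched below. The function name `triage` and its return strings are hypothetical. It takes the "state" value from "node_deployment_state" and a flag saying whether the mismatch alarm still appears in the open alarms list, and it assumes that mismatch state names contain the word MISMATCH, as in the sample output of Case 1.

```python
def triage(node_state, mismatch_alarm_open):
    """Suggest the next step from the node state and the alarm status (hypothetical helper)."""
    if "MISMATCH" in node_state:
        # Case 1: the Edge-MP vertical still reports a mismatch.
        return "case 1: call the refresh API, then re-check the state"
    if node_state == "NODE_READY":
        if mismatch_alarm_open:
            # Case 2: the mismatch is resolved, but the alarm was not closed.
            return "case 2: acknowledge the alarm manually"
        return "resolved: no action needed"
    return "not covered: state is neither a mismatch nor NODE_READY"
```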
 
 

Case 3: The NSX Edge VM was manually edited or migrated within vCenter, and the old cluster, datastore, or port groups (that NSX Manager is aware of) were then deleted

  • In this scenario, the NSX Edge must be redeployed.
    1. Deploy a new NSX Edge VM with the correct settings from the NSX UI: Deploy NSX Edge Nodes
    2. Replace the faulty NSX Edge VM with the new one in the Edge cluster: Replace an NSX Edge Transport Node Using the NSX Manager UI
    3. To keep the original FQDN and IP address, delete the faulty NSX Edge VM, create a new NSX Edge VM with the original settings as in step 1, and then use it to replace the temporary Edge VM as in step 2.

Additional Information

Note: The term "Edge Node MP intent" refers to the Edge Transport Node configuration data stored in the NSX-T Manager database. The same data is returned as the payload of a GET call for the Edge transport node, for example: GET https://<manager-ip>/api/v1/transport-nodes/<edge-tn-id>

From NSX 4.1.1, UPT mode may be edited even when Edge maintenance mode is enabled.

As part of UPT mode realization, maintenance mode is toggled on the Edge VM.

If a user edits UPT mode on an Edge with maintenance mode enabled, the user must disable maintenance mode before UPT realization completes. A mismatch alarm is raised after partial UPT realization while the Edge has maintenance mode enabled. The alarm resolves using the NSX value when the user disables maintenance mode on the Edge.
