When troubleshooting NSX Edge High Availability (HA) issues, whether an unexpected failover or a failure to fail over, a specific set of data must be gathered at the time of the event. This article describes what information is required and how to gather it before opening a support request with Broadcom.
VMware NSX
When it comes to NSX Edges, many troubleshooting sessions start with the question “Why did this Edge fail over?” or, more often, “Why did it NOT fail over when it was expected to?” The answer to both questions usually lies in how the NSX Edge reacted, or failed to react, to environmental changes such as configuration updates, workload increases, vMotion of the Edge, physical component failover testing, or other external factors beyond the Edge’s control.
NOTE: The NSX Edge virtual machines can benefit from vSphere HA. This article is not about troubleshooting that feature.
Edge High Availability (HA) Requirements:
Edge Failover Detection Mechanisms:
Documentation on how Edge HA works in various implementations can be found at the following links:
Log locations and keywords:
CLI commands to check/verify Edge status:
Known Issues
Log Line Analysis:
Because BGP is often used to determine the up/down status of an Edge’s service routers, it is frequently involved in failovers. As a result, some significant log lines are relevant to both BGP troubleshooting and HA troubleshooting. For example:
If you are contacting Broadcom support about this issue, please provide the following:
Handling Log Bundles for offline review with Broadcom support
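For offline review, the following is a minimal Python sketch (not a Broadcom-provided tool) that scans a downloaded log bundle for lines containing given keywords without extracting the bundle to disk. It assumes the bundle is a tar archive (gzip, bzip2, xz, or uncompressed) containing plain-text logs. Nested archives inside the bundle are not opened, and the default keywords `bgp` and `failover` are placeholders to be replaced with the keywords from the Log locations and keywords section.

```python
"""Search a log bundle for HA/BGP-related lines without extracting it.

Assumptions (illustrative only): the bundle is a tar archive readable by
tarfile, log members are plain text, and DEFAULT_KEYWORDS are placeholders.
"""
import sys
import tarfile

# Hypothetical placeholder keywords; replace with the ones relevant to the case.
DEFAULT_KEYWORDS = ("bgp", "failover")


def search_bundle(bundle_path, keywords=DEFAULT_KEYWORDS):
    """Yield (member_name, line_number, line) for each case-insensitive match."""
    lowered = [k.lower() for k in keywords]
    # "r:*" lets tarfile detect the compression type automatically.
    with tarfile.open(bundle_path, "r:*") as bundle:
        for member in bundle:
            if not member.isfile():
                continue  # skip directories, links, and special files
            handle = bundle.extractfile(member)
            if handle is None:
                continue
            with handle:
                for number, raw in enumerate(handle, start=1):
                    # Decode leniently, since bundles may also contain binary files.
                    line = raw.decode("utf-8", errors="replace").rstrip("\n")
                    if any(k in line.lower() for k in lowered):
                        yield member.name, number, line


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: search_bundle.py BUNDLE [KEYWORD ...]")
    else:
        keywords = sys.argv[2:] or DEFAULT_KEYWORDS
        for name, number, line in search_bundle(sys.argv[1], keywords):
            print(f"{name}:{number}: {line}")
```

For example, `python3 search_bundle.py edge-bundle.tgz bgp` (both file names hypothetical) prints each match as `member:line: text`. Reading members through `extractfile` leaves the original bundle untouched, so the same file can still be uploaded to Broadcom support unchanged.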