Edge VMs are critical for data center traffic. During an upgrade, vMotion of an Edge VM may cause more downtime than a graceful failover. In addition, edges deployed in a non-homogeneous network may have affinity with the underlying ESX host and need to be pinned to it. To address these problems, the edge host pinning feature pins edges to specific hosts. The Edge VM is associated with a host group through an affinity rule, and the Best Effort Restart (BER) compute policy is configured on it. This enables host remediation in the vSphere Lifecycle Manager (vLCM) workflow to gracefully shut down edges when maintenance mode is enabled.
The sequence of steps in the host remediation flow is as follows:
vLCM triggers the remediate host workflow, which runs the pre-checks and then puts the ESX host in maintenance mode. The pre-checks verify that the peer edges are ready for failover. However, if a user puts a host in maintenance mode manually, the pre-checks are not triggered. The edges shut down when the host enters maintenance mode because the BER policy is set.
The following scenarios illustrate the differences in behavior:
| Scenario | Edge Shutdown Behavior | Pre-check Before Shutdown Behavior | Remarks |
| --- | --- | --- | --- |
| Host remediation via vLCM orchestrated upgrade | N/A | | |
| Host bits upgrade using vSphere Update Manager (VUM) | Refer to the workarounds section and invoke the pre-checks API before putting ESX in maintenance mode | | |
| NSX-based host upgrade | N/A | | |
| Host is put in maintenance mode manually at vCenter | Refer to the workarounds section and invoke the pre-checks API before putting ESX in maintenance mode | | |
| Host crashes | Refer to the workarounds section and invoke the pre-checks API before putting ESX in maintenance mode | | |
| NSX-based edge upgrade | Same as earlier NSX versions | N/A | |
Table 1: Behavior in various maintenance scenarios
NSX Manager already exposes an API for these pre-checks. It must be called manually from a client, or the call can be automated. The API is documented under "Execute checks before entering host into maintenance mode".
1. Get the vCenter UUID from the URL bar of a browser connected to the vCenter Server. In this example, the vCenter UUID is 328cc7ad-####-####-####-984903321d2c.
2. Get the host MoRef value of the host that is to be put in maintenance mode. In this example, the selected host is 10.160.232.75 and its MoRef ID is host-##.
3. Use an API call similar to the following against the NSX Manager to trigger the pre-checks for host-##:
POST https://<NSX manager IP/FQDN>/api/v1/upgrade/pre-upgrade-checks/host/planned-maintenance
Request body (vcenter-uuid is the vCenter UUID from step 1; entity-id is the host MoRef from step 2):
{
  "vcenter-uuid": "328cc7ad-####-####-####-984903321d2c",
  "entity-id": "host-##"
}
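Steps 1 and 2 can also be scripted. The following Python sketch assumes that the vSphere Client URL of the selected host contains a segment of the form urn:vmomi:HostSystem:<host MoRef>:<vCenter UUID>; verify this against your own vCenter before relying on it. The function name precheck_body_from_url and the URL used to exercise it are hypothetical and not part of any NSX or vSphere tooling.

```python
import re

# Assumed URL shape (not confirmed by this article):
#   .../urn:vmomi:HostSystem:<host MoRef>:<vCenter UUID>/...
_HOST_URN = re.compile(
    r"urn:vmomi:HostSystem:(?P<moref>host-\d+):"
    r"(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def precheck_body_from_url(client_url: str) -> dict:
    """Build the pre-check request body from a vSphere Client host URL."""
    match = _HOST_URN.search(client_url)
    if match is None:
        raise ValueError("no HostSystem URN found in the URL")
    return {
        "vcenter-uuid": match.group("uuid"),  # step 1: vCenter UUID
        "entity-id": match.group("moref"),    # step 2: host MoRef
    }
```

If the URL in your environment has a different shape, copy the two values by hand as described in steps 1 and 2.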
Response:
{
  "check_statuses": [
    {
      "status": "OK",
      "issues": [],
      "info": {
        "check": "com.vmware.nsxt.PeerEdgesOnSameHostCheck",
        "name": {
          "default_message": "Peer edges on same host check",
          "id": "",
          "localized": "Peer edges on same host check"
        },
        "description": {
          "default_message": "This precheck ensures edges hosting peer LRs are not on same host under upgrade",
          "id": "",
          "localized": "This precheck ensures edges hosting peer LRs are not on same host under upgrade"
        }
      }
    },
    {
      "status": "OK",
      "issues": [],
      "info": {
        "check": "com.vmware.nsxt.PeerEdgeHealthCheckHostCheck",
        "name": {
          "default_message": "Peer edges health check task",
          "id": "",
          "localized": "Peer edges health check task"
        },
        "description": {
          "default_message": "This precheck ensures peers of edges which are on the host under upgrade are healthy",
          "id": "",
          "localized": "This precheck ensures peers of edges which are on the host under upgrade are healthy"
        }
      }
    },
    {
      "status": "OK",
      "issues": [],
      "info": {
        "check": "com.vmware.nsxt.PeerLRStatusHostCheck",
        "name": {
          "default_message": "LR status check on peer edges",
          "id": "",
          "localized": "LR status check on peer edges"
        },
        "description": {
          "default_message": "This precheck ensures LRs of edges on the host under upgrade have a healthy peer if fail over happens",
          "id": "",
          "localized": "This precheck ensures LRs of edges on the host under upgrade have a healthy peer if fail over happens"
        }
      }
    },
    {
      "status": "OK",
      "issues": [],
      "info": {
        "check": "com.vmware.nsxt.PeerLRBgpNeighbourHostCheck",
        "name": {
          "default_message": "BGP status check on peer edges",
          "id": "",
          "localized": "BGP status check on peer edges"
        },
        "description": {
          "default_message": "This precheck ensures BGP neighbours of edges on the host under upgrade have a healthy peer if fail over happens",
          "id": "",
          "localized": "This precheck ensures BGP neighbours of edges on the host under upgrade have a healthy peer if fail over happens"
        }
      }
    }
  ],
  "status": "OK"
}
The API call above triggers the pre-checks for the Edge VMs hosted on host-## and checks the health of their active/standby peers.
If the call returns the status "OK", the peers are healthy and the host can be put into maintenance mode.
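For automation, the call and the status evaluation can be wrapped in a small script. The following Python sketch uses only the standard library. It assumes HTTP basic authentication against the NSX Manager and conservatively treats any status other than "OK" as a reason not to proceed; both are assumptions, not statements from the API documentation. The function names are hypothetical.

```python
import base64
import json
import urllib.request

# Endpoint path as given in this article.
PRECHECK_PATH = "/api/v1/upgrade/pre-upgrade-checks/host/planned-maintenance"


def build_precheck_request(manager, username, password, vcenter_uuid, host_moref):
    """Prepare (but do not send) the POST request for the pre-checks."""
    body = json.dumps(
        {"vcenter-uuid": vcenter_uuid, "entity-id": host_moref}
    ).encode("utf-8")
    # Assumption: basic authentication is enabled on the NSX Manager.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{manager}{PRECHECK_PATH}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def failed_checks(response):
    """Return the identifiers of all checks whose status is not "OK"."""
    return [
        check.get("info", {}).get("check", "<unknown>")
        for check in response.get("check_statuses", [])
        if check.get("status") != "OK"
    ]


def safe_for_maintenance(response):
    """True only if the overall status and every individual check are "OK"."""
    return response.get("status") == "OK" and not failed_checks(response)
```

To send the request, pass the object returned by build_precheck_request to urllib.request.urlopen, decode the JSON response, and hand the result to safe_for_maintenance. Put the host into maintenance mode only if it returns True; otherwise, failed_checks lists the checks to investigate.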
NSX version-based behavior:
The peer checks described above are enabled by default for upgrades to NSX 9.0 and later. During host or edge upgrades, the pre-checks run by default for edges that are pinned through the edge host affinity configuration.