The Edge upgrade fails with the following error message:
Edge <Target upgrade version of edge>/Edge/nubVMware-NSX-edge-<Target upgrade version of edge>.nub power on task failed on edge TransportNode <TN UUID> clientType EDGE. target edge fabric node id <TN UUID>. return status null
A corresponding error message can be seen in vCenter for the specific Edge VM.
VMware NSX
An existing vSphere Distributed Resource Scheduler (DRS) VM-to-VM Anti-Affinity rule prevented the Edge VM from powering on.
The VM anti-affinity rule was in place to prevent the nodes of the NSX Edge cluster from running on the same ESXi host.
When the other ESXi hosts in the cluster encountered resource issues, no suitable host was available.
As a result, DRS blocked the power-on task, because powering on the Edge VM would have placed both edges on the same host in violation of the anti-affinity rule.
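The placement conflict described above can be illustrated with a small model. This is only a sketch under simplifying assumptions: it considers free memory as the sole resource, and the `Host` class, the `candidate_hosts` function, and the host and VM names are hypothetical. It is not how DRS is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical, simplified view of an ESXi host."""
    name: str
    free_memory_gb: float
    running_vms: set = field(default_factory=set)


def candidate_hosts(vm, required_memory_gb, hosts, anti_affinity_group,
                    rule_enabled=True):
    """Return the names of hosts where `vm` could power on in this model."""
    # Other members of the anti-affinity group that must stay on separate hosts.
    peers = set(anti_affinity_group) - {vm}
    candidates = []
    for host in hosts:
        if host.free_memory_gb < required_memory_gb:
            continue  # host does not have enough resources
        if rule_enabled and host.running_vms & peers:
            continue  # host already runs another VM from the same rule
        candidates.append(host.name)
    return candidates


# Assumed scenario: only esxi-01 has enough memory, but it already runs edge-02.
hosts = [
    Host("esxi-01", free_memory_gb=64, running_vms={"edge-02"}),
    Host("esxi-02", free_memory_gb=4),
    Host("esxi-03", free_memory_gb=2),
]
group = {"edge-01", "edge-02"}

blocked = candidate_hosts("edge-01", 32, hosts, group)
relaxed = candidate_hosts("edge-01", 32, hosts, group, rule_enabled=False)
print(blocked)  # no host satisfies both the resource check and the rule
print(relaxed)  # with the rule disabled, esxi-01 becomes a candidate
```

In this model the candidate list is empty while the rule is enabled, which corresponds to the blocked power-on task. Disabling the rule, or freeing resources on another host, makes a host available again.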
This specific failure is a known, transient issue that can occur during the Edge upgrade workflow, as outlined in the Edge upgrade flow-admin guide. During an upgrade, the NSX upgrade coordinator normally powers off the Edge VM and automatically powers it back on after the OS upgrade. Under certain circumstances, the VM fails to power on completely, the power-on task times out, and the Edge upgrade workflow is interrupted.
To resolve the power-on failure and allow the NSX Edge upgrade to proceed:
Temporarily disable the anti-affinity rule, or alternatively migrate the VM to a different ESXi host that has enough resources to power it on.
Verify that the Edge VM powers on.
Resume the upgrade.
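The verification step can be scripted as a simple polling loop before resuming the upgrade. The sketch below is illustrative only: `wait_for_power_on` and the `get_power_state` callable are hypothetical names, and the caller is assumed to supply a function that returns the VM's vSphere power state string (for example `"poweredOn"` or `"poweredOff"`) from whatever tooling is in use.

```python
import time


def wait_for_power_on(get_power_state, timeout_s=600.0, interval_s=10.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the VM reports 'poweredOn' or the timeout expires.

    `get_power_state` is a caller-supplied (hypothetical) function that
    returns the current power state of the Edge VM as a string.
    Returns True if the VM powered on in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if get_power_state() == "poweredOn":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demonstration with simulated states instead of a real vCenter query.
_states = iter(["poweredOff", "poweredOn"])
print(wait_for_power_on(lambda: next(_states), timeout_s=5, interval_s=0))
```

The `sleep` and `clock` parameters are injected so that the loop can be exercised without real waiting. In practice the defaults would be used and the upgrade resumed only after the function returns `True`.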
Note: After the upgrade is complete and the cluster resources are stable, evaluate whether the anti-affinity rule is still necessary. Decide whether it can be re-enabled without causing future power-on issues, or whether it needs to be modified for better flexibility.
If you are facing this issue and the above workarounds do not allow the upgrade to complete, open a ticket with Broadcom support and provide the following:
Handling Log Bundles for offline review with Broadcom support