During the host upgrade, one or more of the following messages may be reported:
"Waiting for MPA"
"Upgrade failed: Failed to execute ESXi post upgrade dataplane check. Error occurred while transferring the upgrade scripts to host, SFHC connectivity may be down"
"Heartbeating between NSX management node and host ##### is down"
"Unexpected error while upgrading upgrade unit. Command IsHostInMaintenanceMode failed on host(######)."

The Failure Reason CONTROLLER_REJECTED_HOST_CERT is shown when the command 'nsxcli -c get controllers' is run from the host CLI:
Controller IP Port SSL Status Is Physical Master Session State Controller FQDN Failure Reason
<Controller-IP> 1235 enabled disconnected true down NA CONTROLLER_REJECTED_HOST_CERT
<Controller-IP> 1235 enabled not used false null NA NA
<Controller-IP> 1235 enabled not used false null NA NA
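To spot this condition quickly, for example across several hosts, the controller table can be filtered with awk. This is an illustrative sketch, not an NSX tool: the function name controller_failures is hypothetical, it assumes awk is available in the ESXi shell, and it assumes the whitespace-separated layout shown above, where data rows have a numeric Port in the second column and the Failure Reason is the last column.

```shell
#!/bin/sh
# Hypothetical helper: reads `nsxcli -c get controllers` output on stdin and
# prints "<controller-ip> <failure-reason>" for each controller row whose
# last column (Failure Reason) is not NA. Data rows are recognised by a
# numeric Port in column 2, which also skips the header line.
controller_failures() {
    awk '$2 ~ /^[0-9]+$/ && $NF != "NA" { print $1, $NF }'
}

# Only run the real command where nsxcli exists (for example on the ESXi host).
if command -v nsxcli >/dev/null 2>&1; then
    nsxcli -c get controllers | controller_failures
fi
```

On an affected host this prints a line such as "<Controller-IP> CONTROLLER_REJECTED_HOST_CERT". Because it filters on any value other than NA, the same function also surfaces other failure reasons, such as MAINTAINANCE_MODE.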
Error messages similar to the following may be seen in the ESXi host and NSX Manager logs:
/var/run/log/esxupdate.log
esxupdate: 12251955: LiveImageInstaller: DEBUG: Output: nsx-proxy being upgraded /etc/init.d/nsx-proxy: line 1: can't open /tmp/host-cert.bak: no such file /etc/init.d/nsx-proxy: line 1: can't open /tmp/host-privkey.bak: no such file sh: 2: unknown operand backup proxy certificate not found, creating Copying CCP config from backup Copying host config file from backup Copying appliance info file from backup /etc/init.d/nsx-proxy: line 1: can't open /tmp/host-cert.bak: no such file /etc/init.d/nsx-proxy: line 1: can't open /tmp/host-privkey.bak: no such file sh: 2: unknown operand tnuuid = ########-####-####-####-############. Generating host certificate with TN uuid = ########-####-####-####-############. Generating certificate using make_cert.py Generating a RSA private key **************************************************************************************************************************************************************************************************+++++ ************************************************************************************************************************************************************************************************************************************************************************************************************************+++++ writing new private key to '/tmp/host-privkey.pem' ----- Entering make_cert.py Running ['openssl', 'req', '-days', '3650', '-new', '-nodes', '-x509', '-keyout', '/tmp/host-privkey.pem', '-out', '/tmp/host-cert.pem', '-config', '/tmp/tmp.######', '-extensions', 'req_ext'] Execution of openssl req returned 0 in 0.363 seconds. nsx-proxy starts
/var/run/log/nsx-syslog
INFO task-executor-9-1-workitem-HOST-### InspectionTask 1371044 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] [HUT] For host <ESXI-IP/FQDN>, error is Issue: Heartbeating between NSX management node and host <ESXI-IP/FQDN> is down.
nsx-proxy[12370596]: NSX 12370596 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="12370681" level="WARNING"] RpcConnection[10 Connecting to ssl://<ESXI-IP/FQDN>:1234 0] Couldn't connect to ssl://<ESXI-IP/FQDN>:1234 (error: 336151576-tlsv1 alert unknown ca (SSL routines, ssl3_read_bytes))
nsx-proxy[12370596]: NSX 12370596 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="12370681" level="WARNING"] StreamConnection[5 Connecting to ssl://<ESXI-IP/FQDN>:1235 sid:5] Couldn't connect to 'ssl://<ESXI-IP/FQDN>:1235' (error: 336151574-sslv3 alert certificate unknown (SSL routines, ssl3_read_bytes)
nsx-proxy[7014696]: NSX 7014696 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" tid="7014696" level="INFO"] Write ccp session message to nestdb ccp_id { 7caaxxxx-1cxx-46xx-a6xx-77c06exxxxxx } ip { ipv4: 214xxx601 } server_port: 1235 fqdn: "" state: DISCONNECTED master: false
nsx-proxy[7014696]: NSX 7014696 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" tid="7014696" level="INFO"] Write ccp session message to nestdb ccp_id { a0fdxxxx-c0xx-43xx-a7xx-8d946bxxxxxx } ip { ipv4: 214xxx602 } server_port: 1235 fqdn: "" state: DISCONNECTED master: false
nsx-proxy[7014696]: NSX 7014696 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" tid="7014696" level="INFO"] Write ccp session message to nestdb ccp_id { 0058xxxx-93xx-4axx-90b3-98d041xxxxxx } ip { ipv4: 214xxx600 } server_port: 1235 fqdn: "" state: DISCONNECTED master: true failure_reason: CONTROLLER_REJECTED_HOST_CERT
nsx-proxy[7014696]: NSX 7014696 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" tid="7014696" level="INFO"] CcpConnection: Connecting to new CCP a0fdxxxx-c0xx-43xx-a7a8-8d946bxxxxxx.
nsx-proxy[7014696]: NSX 7014696 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" tid="7014696" level="INFO"] CcpConnection: Disconnecting from ssl://128.x.x.16:1235
/var/log/syslog
NSX-MGR NSX 1391201 - [nsx@6876 audit="true" comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] UserName="<Username>", Src="<IP-address>", ModuleName="Upgrade", Operation="GetUpgradestatusSunmary", Operation status="success", New value=[{"selection_status": "ALL" }]
NSX-MGR NSX 120080 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="######" level="ERROR" errorCode="NET1111"] Certificate validation failed: 18-self-signed certificate#012Certificate: #012 Version: 3 (0x2) #012
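The log signatures above can be searched for in one pass with grep. This is a hedged sketch: scan_nsx_logs is a hypothetical helper name, the search strings are copied from the excerpts in this KB, and a match only suggests this issue rather than proving it.

```shell
#!/bin/sh
# Hypothetical helper: for each log file given as an argument, count the lines
# that contain one of the signatures quoted in this KB.
scan_nsx_logs() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "skip (not readable): $f"
            continue
        fi
        for pattern in \
            "backup proxy certificate not found, creating" \
            "tlsv1 alert unknown ca" \
            "sslv3 alert certificate unknown" \
            "CONTROLLER_REJECTED_HOST_CERT" \
            "Heartbeating between NSX management node and host" \
            "Certificate validation failed: 18-self-signed certificate"
        do
            # grep -c prints 0 (and exits 1) when nothing matches; || true keeps
            # the assignment from failing if the caller uses `set -e`.
            n=$(grep -cF "$pattern" "$f" || true)
            if [ "$n" -gt 0 ]; then
                echo "$f: $n line(s) match '$pattern'"
            fi
        done
    done
}
```

For example, on the host: scan_nsx_logs /var/run/log/esxupdate.log /var/run/log/nsx-syslog. The same function can be pointed at a copy of the NSX Manager syslog to look for the certificate validation error.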
"Connection between host [host-uuid] and NSX Controller is DOWN. Response : Client is responding to heartbeats"
"NSX service on the host are not at target version 4.#.#.#.###"
VMware NSX 4.2.1.x
VMware NSX 4.2.2.x
VMware ESXi 7.0.x
This is a known issue that prevents upgraded ESXi hosts from reconnecting to NSX Manager after the NSX VIB upgrade. During the VIB upgrade, the nsx-proxy startup (init) script does not find a backup of the existing host certificate and private key, so it generates a new host transport node certificate (see the esxupdate.log excerpt above). The NSX controller is not aware of this new certificate and rejects it, which breaks communication between the host transport node and the NSX controller.
This issue is resolved in VMware NSX 4.2.3, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
Workaround:
Option A
When the host appears stalled at 45% with "Waiting for MPA" on the upgrade page:
If Option A did not resolve the issue, proceed with Option B:
Option B
1. On the affected ESXi host, check the controller connection state:
ESXi> nsxcli -c get controllers
Controller IP Port SSL Status Is Physical Master Session State Controller FQDN Failure Reason
<Controller-IP> 1235 enabled disconnected true down NA CONTROLLER_REJECTED_HOST_CERT
<Controller-IP> 1235 enabled not used false null NA NA
<Controller-IP> 1235 enabled not used false null NA NA
2. On the NSX Manager CLI, obtain the API certificate thumbprint:
get certificate api thumbprint
3. On the ESXi host, push the host certificate to NSX Manager using the thumbprint from step 2:
ESXi> nsxcli -c push host-certificate <NSX Manager IP or FQDN> username admin thumbprint <thumbprint obtained in step #2>
4. Verify that the controller connection is restored:
ESXi> nsxcli -c get controllers
Note: Confirm the controller connection state is green on the UI for this host transport node.
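Optionally, the thumbprint from step 2 can be cross-checked from the ESXi host with openssl, which the host already uses to generate its certificate (see the esxupdate.log excerpt). This sketch assumes that 'get certificate api thumbprint' prints the SHA-256 fingerprint of the manager's API certificate as lowercase hex without colons, and that the manager presents that same certificate on port 443 of the address passed to push host-certificate. The function names are hypothetical, and the certificate is fetched without validation, so the result is only a comparison aid.

```shell
#!/bin/sh
# Hypothetical helper: read a PEM certificate on stdin and print its SHA-256
# fingerprint as lowercase hex without colons. openssl prints a line such as
# "SHA256 Fingerprint=AB:CD:..."; sed drops everything up to the "=".
pem_thumbprint() {
    openssl x509 -noout -fingerprint -sha256 \
        | sed 's/.*=//' | tr -d ':' | tr 'A-F' 'a-f'
}

# Hypothetical helper: fetch the certificate presented on <host>:<port>
# (port defaults to 443) and print its thumbprint. The chain is NOT
# validated; use the result only to compare with the value from step 2.
manager_cert_thumbprint() {
    echo | openssl s_client -connect "$1:${2:-443}" 2>/dev/null | pem_thumbprint
}
```

For example: manager_cert_thumbprint <NSX Manager IP or FQDN>. If the two values differ, for instance because a different certificate is served on that address, use the value printed by get certificate api thumbprint, as the procedure above does.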
Note: If the ESXi host displays the Failure Reason MAINTAINANCE_MODE as shown below, take the following steps:
nsxcli -c get controllers
Controller IP Port SSL Status Is Physical Master Session State Controller FQDN Failure Reason
<Controller-IP> 1235 enabled disconnected true down NA MAINTAINANCE_MODE
<Controller-IP> 1235 enabled not used false null NA MAINTAINANCE_MODE
<Controller-IP> 1235 enabled not used false null NA MAINTAINANCE_MODE
Note: If this issue continues, restart the following NSX services on the ESXi host:
ESXi> /etc/init.d/nsx-opsagent restart
ESXi> /etc/init.d/nsx-proxy restart
Reference:
Loss of Controller Connectivity after Host Upgrade
Proactive prevention:
The 'NSX transport node disconnected' problem can be prevented before the upgrade starts.
If an ESXi 7.0.x host currently has the VIBs from NSX 4.2.1.0 installed, host-cert.pem and host-privkey.pem are expected to have the following permissions:
File path: /etc/vmware/nsx
Expected permissions:
-rw-rw-rwT 1 root root 1610 Jan 22 10:01 host-cert.pem
-rw-rw-rwT 1 root root 1704 Jan 22 10:01 host-privkey.pem
If the permissions of host-cert.pem and host-privkey.pem differ from the above, the files have incorrect permissions and the host is expected to hit the 'NSX transport node disconnected' problem during the upgrade to 4.2.1.x or 4.2.2.x.
To avoid the issue, validate the permissions of these files on each host before the upgrade and correct them where needed:
chmod 1666 /etc/vmware/nsx/host-cert.pem /etc/vmware/nsx/host-privkey.pem
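The permission check can be scripted per host. In the expected listing, -rw-rw-rwT is what ls -l shows for mode 1666: 666 grants read and write to owner, group and others, and the leading 1 is the sticky bit, shown as a capital T because others have no execute permission. The sketch below compares the first ten characters of the ls -l mode field with that string. check_nsx_cert_perms is a hypothetical name, and the directory defaults to /etc/vmware/nsx as stated above.

```shell
#!/bin/sh
# Hypothetical helper: report whether host-cert.pem and host-privkey.pem in
# DIR (default /etc/vmware/nsx) have mode 1666, i.e. "-rw-rw-rwT" in ls -l.
# Returns 0 only if both files exist with the expected mode.
check_nsx_cert_perms() {
    dir="${1:-/etc/vmware/nsx}"
    rc=0
    for f in host-cert.pem host-privkey.pem; do
        path="$dir/$f"
        if [ ! -e "$path" ]; then
            echo "MISSING $path"
            rc=1
            continue
        fi
        # First 10 characters only, in case ls appends an ACL/context marker.
        mode=$(ls -l "$path" | awk '{ print substr($1, 1, 10) }')
        if [ "$mode" = "-rw-rw-rwT" ]; then
            echo "OK      $path"
        else
            echo "WRONG   $path ($mode); fix with: chmod 1666 $path"
            rc=1
        fi
    done
    return "$rc"
}

# Run the check only where the NSX directory exists (on the ESXi host).
if [ -d /etc/vmware/nsx ]; then
    check_nsx_cert_perms
fi
```

The function returns non-zero if either file is missing or has a different mode, so it can be used in a loop over hosts; apply the chmod above only to files it reports as WRONG.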
If this KB did not help resolve your issue, you can review the following KB for further troubleshooting steps: Troubleshooting NSX Host Upgrade Failures