Upgrade failed: Failed to execute ESXi post upgrade dataplane check. Error occurred while transferring the upgrade scripts to host, SFHC connectivity may be down
Controller IP    Port  SSL      Status        Is Physical Master  Session State  Controller FQDN  Failure Reason
<Controller-IP>  1235  enabled  disconnected  true                down           NA               CONTROLLER_REJECTED_HOST_CERT
<Controller-IP>  1235  enabled  not used      false               null           NA               NA
<Controller-IP>  1235  enabled  not used      false               null           NA               NA
esxupdate: 12251955: LiveImageInstaller: DEBUG: Output: nsx-proxy being upgraded /etc/init.d/nsx-proxy: line 1: can't open /tmp/host-cert.bak: no such file /etc/init.d/nsx-proxy: line 1: can't open /tmp/host-privkey.bak: no such file sh: 2: unknown operand backup proxy certificate not found, creating Copying CCP config from backup Copying host config file from backup Copying appliance info file from backup /etc/init.d/nsx-proxy: line 1: can't open /tmp/host-cert.bak: no such file /etc/init.d/nsx-proxy: line 1: can't open /tmp/host-privkey.bak: no such file sh: 2: unknown operand tnuuid = ########-####-####-####-############. Generating host certificate with TN uuid = ########-####-####-####-############. Generating certificate using make_cert.py Generating a RSA private key **************************************************************************************************************************************************************************************************+++++ ************************************************************************************************************************************************************************************************************************************************************************************************************************+++++ writing new private key to '/tmp/host-privkey.pem' ----- Entering make_cert.py Running ['openssl', 'req', '-days', '3650', '-new', '-nodes', '-x509', '-keyout', '/tmp/host-privkey.pem', '-out', '/tmp/host-cert.pem', '-config', '/tmp/tmp.######', '-extensions', 'req_ext'] Execution of openssl req returned 0 in 0.363 seconds. 
nsx-proxy starts
nsx-proxy[12370596]: NSX 12370596 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="12370681" level="WARNING"] RpcConnection[10 Connecting to ssl://<ESXI-IP/FQDN>:1234 0] Couldn't connect to ssl://<ESXI-IP/FQDN>:1234 (error: 336151576-tlsv1 alert unknown ca (SSL routines, ssl3_read_bytes))
nsx-proxy[12370596]: NSX 12370596 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="12370681" level="WARNING"] StreamConnection[5 Connecting to ssl://<ESXI-IP/FQDN>:1235 sid:5] Couldn't connect to 'ssl://<ESXI-IP/FQDN>:1235' (error: 336151574-sslv3 alert certificate unknown (SSL routines, ssl3_read_bytes)
NSX-MGR NSX 120080 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="######" level="ERROR" errorCode="NET1111"] Certificate validation failed: 18-self-signed certificate#012Certificate: #012 Version: 3 (0x2) #012
VMware NSX (upgrading on VMware ESXi 7.x only)
VMware NSX (upgrading from version >= 4.2.1.0 and < 4.2.3)
VMware NSX (upgrading to version > 4.2.1.0 and < 4.2.3)
The following points detail the cause of communication failure between the host transport node and the NSX controller after a VIB upgrade:
This issue is resolved in VMware NSX 4.2.3, available at Broadcom downloads.
Greenfield deployments with VMware NSX 4.2.3 and later versions do not have this issue.
Upgrades from VMware NSX versions greater than or equal to 4.2.1.0 to VMware NSX 4.2.3 or later also do not have this issue.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
Workaround and Preventive steps:
Proactive Prevention Steps
The 'NSX transport node disconnected' problem can be prevented before the upgrade starts.
If an ESXi 7.x host currently has the NSX VIBs installed, the host-cert.pem and host-privkey.pem files are expected to have the permissions shown below:
File path: /etc/vmware/nsx
Expected permissions for the files in question:
-rw-rw-rwT 1 root root 1610 Jan 22 10:01 host-cert.pem
-rw-rw-rwT 1 root root 1704 Jan 22 10:01 host-privkey.pem
If the permissions on host-cert.pem and host-privkey.pem differ from the above, they are wrong, and the host is expected to hit the 'NSX transport node disconnected' problem during an upgrade to 4.2.1.x or 4.2.2.x.
To avoid the issue, validate the permissions of these files on each host before upgrading and correct them where needed. To correct the permissions, run:
chmod 1666 /etc/vmware/nsx/host-cert.pem /etc/vmware/nsx/host-privkey.pem
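The permission check described above can also be scripted so that it is easy to repeat on every host. The following is a minimal sketch, not a supported tool: the function name check_nsx_cert_perms is hypothetical, and it assumes that `ls -l` and `awk` on the host behave as they do in a standard POSIX shell environment. It only reports the state of the files and changes nothing.

```shell
#!/bin/sh
# Sketch: report whether the NSX host certificate files carry the expected
# mode string (-rw-rw-rwT, i.e. octal 1666).
# check_nsx_cert_perms is a hypothetical helper name, not part of NSX or ESXi.

check_nsx_cert_perms() {
    dir="${1:-/etc/vmware/nsx}"   # defaults to the file path given in this KB
    rc=0
    for name in host-cert.pem host-privkey.pem; do
        file="$dir/$name"
        if [ ! -f "$file" ]; then
            echo "MISSING $file"
            rc=1
            continue
        fi
        # The first field of `ls -l` is the mode string, e.g. -rw-rw-rwT
        mode=$(ls -l "$file" | awk '{print $1}')
        case "$mode" in
            -rw-rw-rwT*) echo "OK      $file ($mode)" ;;
            *)           echo "WRONG   $file ($mode)"; rc=1 ;;
        esac
    done
    return "$rc"
}
```

Calling `check_nsx_cert_perms` with no argument inspects /etc/vmware/nsx. A non-zero return status means at least one file is missing or has a different mode, in which case the chmod command above applies.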
Recovery Steps
Controller IP    Port  SSL      Status        Is Physical Master  Session State  Controller FQDN  Failure Reason
<Controller-IP>  1235  enabled  disconnected  true                down           NA               CONTROLLER_REJECTED_HOST_CERT
<Controller-IP>  1235  enabled  not used      false               null           NA               NA
<Controller-IP>  1235  enabled  not used      false               null           NA               NA
nsxcli -c push host-certificate <NSX Manager IP or FQDN> username admin thumbprint <thumbprint obtained in step #2>
nsxcli -c get controllers
Note: Confirm the controller connection state is green on the UI for this host transport node.
Note: If the ESXi host displays Failure Reason MAINTAINANCE_MODE as below, take the following steps:
nsxcli -c get controllers
Controller IP    Port  SSL      Status        Is Physical Master  Session State  Controller FQDN  Failure Reason
<Controller-IP>  1235  enabled  disconnected  true                down           NA               MAINTAINANCE_MODE
<Controller-IP>  1235  enabled  not used      false               null           NA               MAINTAINANCE_MODE
<Controller-IP>  1235  enabled  not used      false               null           NA               MAINTAINANCE_MODE
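When many hosts need to be checked, the Failure Reason column of the `nsxcli -c get controllers` output can be read by a script instead of by eye. The sketch below is illustrative only: controller_failure_reason is a hypothetical helper name, and it assumes the Failure Reason is always the last whitespace-separated field of each row, as in the outputs shown in this article.

```shell
#!/bin/sh
# Sketch: read `nsxcli -c get controllers` output on stdin and print the
# failure reason covered by this KB, if any row reports one.
# controller_failure_reason is a hypothetical helper name.

controller_failure_reason() {
    # Assumption: Failure Reason is the last field ($NF) of every row.
    awk '
        $NF == "CONTROLLER_REJECTED_HOST_CERT" { found = $NF }
        $NF == "MAINTAINANCE_MODE" && found == "" { found = $NF }
        END {
            if (found != "") { print found; exit 1 }
            print "NONE"
        }
    '
}
```

A possible invocation on the host is `nsxcli -c get controllers | controller_failure_reason`. The function prints the matching reason and returns a non-zero status when one is found, and prints NONE otherwise.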
If this KB did not help resolve your issue, you can review the following KBs for further troubleshooting steps:
Loss of Controller Connectivity after Host Upgrade
Troubleshooting NSX Host Upgrade Failures