The NSX Manager certificates reached their expiration date and were subsequently replaced.
All ESXi Transport Nodes display an NSX Configuration status of 'Host Disconnected' in the NSX Manager UI.
Appliance Proxy Hub (APH) acts as the communication channel between NSX Manager and the transport nodes. APH runs as a service on NSX Manager. It uses port 1234 for communication between the management plane and the transport node, and port 1235 for communication between the central control plane (CCP) and the transport node.
The ESXi Transport Nodes show multiple TCP connections in the TIME_WAIT state on ports 1234 and 1235, as shown below:
[root@ESXIHOST] esxcli network ip connection list | grep -i <ipaddress_of_Host>
tcp 0 0 ##.##.##.51:50268 ##.##.##.17:1234 TIME_WAIT 0
tcp 0 0 ##.##.##.51:56995 ##.##.##.15:1234 TIME_WAIT 0
tcp 0 0 ##.##.##.51:54114 ##.##.##.17:1235 TIME_WAIT 0
tcp 0 0 ##.##.##.51:33423 ##.##.##.15:1235 TIME_WAIT 0
Alternatively, run the commands below to check connectivity between NSX Manager and the Transport Node on ports 1234 and 1235.
[root@ESXIHOST] localcli network ip connection list | grep 1234
[root@ESXIHOST] localcli network ip connection list | grep 1235
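The connection check above can be summarised per port with a small helper. This is a minimal sketch, not part of NSX or ESXi: the function name count_time_wait is hypothetical, and it assumes the column layout of the sample output above (remote address in the fifth field, connection state in the sixth).

```shell
# Hypothetical helper: count TIME_WAIT connections to one remote port.
# Reads connection-list output on stdin; $1 is the remote port (for example 1234).
count_time_wait() {
    awk -v port="$1" '
        # Assumed layout: field 5 is the remote ip:port, field 6 is the state.
        $6 == "TIME_WAIT" && $5 ~ (":" port "$") { n++ }
        END { print n + 0 }   # print 0 when nothing matched
    '
}
```

Usage on the host would be, for example, `esxcli network ip connection list | count_time_wait 1234`. Anchoring the port at the end of the remote address keeps port 12345 from being counted as 1234.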
Controller connectivity shows as down on the Transport Nodes for the respective master nodes, with the failure reason HOST_REJECTED_CONTROLLER_CERT, as shown below:
[root@esx:~] nsxcli -c get controllers
Controller IP  Port  SSL      Status        Is Physical Master  Session State  Controller FQDN  Failure Reason
##.##.##.16    1235  enabled  not used      false               null           NA               NA
##.##.##.17    1235  enabled  disconnected  true                down           NA               HOST_REJECTED_CONTROLLER_CERT
##.##.##.15    1235  enabled  not used      false               null           NA               NA
# Controller and manager connectivity is down due to certificate rejection between the host and the controller.
[root@esx:~] nsxcli -c verify controllers certificate
Controller IP  Port  CRL Status           Certificate Status
##.##.##.15    1235  CERTIFICATE_REVOKED  HOST_REJECTED_CONTROLLER_CERT and CONTROLLER_REJECTED_HOST_CERT
##.##.##.17    1235  CERTIFICATE_REVOKED  HOST_REJECTED_CONTROLLER_CERT and CONTROLLER_REJECTED_HOST_CERT
##.##.##.16    1235  NA                   CONNECTION_TIMED_OUT
/var/run/log/nsx-syslog.log
YYYY-MM-DDTHH:MM:SS.###Z Wa(180) nsx-proxy[6315252]: NSX 6315252 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="6315274" level="WARNING"] StreamConnection[2886 Connecting to ssl://##.##.##.17:1234 sid:2886] Couldn't connect to 'ssl://##.##.##.17:1234' (error: 336134278-certificate verify failed)
YYYY-MM-DDTHH:MM:SS.###Z Wa(180) nsx-proxy[6315252]: NSX 6315252 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="6315274" level="WARNING"] StreamConnection[2886 Error to ssl://##.##.##.17:1234 sid:-1] Error 336134278-certificate verify failed
YYYY-MM-DDTHH:MM:SS.###Z Wa(180) nsx-proxy[6315252]: NSX 6315252 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="6315274" level="WARNING"] RpcConnection[2886 Connecting to ssl://##.##.##.17:1234 0] Couldn't connect to ssl://##.##.##.17:1234 (error: 336134278-certificate verify failed)
YYYY-MM-DDTHH:MM:SS.###Z Er(179) nsx-proxy[6453519]: NSX 6453519 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="6453519" level="ERROR" errorCode="NET1109"] X509Certificate: PEM - failed to read X509: x906d06c-PEM routines:PEM_read_bio:no start line
YYYY-MM-DDTHH:MM:SS.###Z In(182) nsx-proxy[6453519]: NSX 6453519 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="mpa-proxy-lib" tid="6453519" level="INFO"] AphInfo: invalid certificate #CN=##,OU=##,O=##,L=##,ST=##,C=##-----BEGIN CERTIFICATE-----
YYYY-MM-DDTHH:MM:SS.###Z In(182)[+] nsx-proxy[6453519]: MIIGIzCCBAugAwIBAgIUDioC/bYg2v90vtPbM1jGKzwM6cgwDQYJKoZIhvcNAQE
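To see which manager endpoints are affected, the "certificate verify failed" entries above can be grouped by endpoint. The following is a rough sketch under stated assumptions: the function name cert_verify_failures is hypothetical, and it assumes each matching line carries an ssl://host:port token followed by a space, as in the excerpt above. It counts log lines, not individual connection attempts.

```shell
# Hypothetical helper: count "certificate verify failed" log lines per endpoint.
# $1 is the log file to scan (the article refers to /var/run/log/nsx-syslog.log).
cert_verify_failures() {
    awk '
        /certificate verify failed/ {
            # Take the first ssl://host:port token on the line.
            if (match($0, /ssl:\/\/[^ ]+/)) {
                count[substr($0, RSTART, RLENGTH)]++
            }
        }
        END { for (ep in count) printf "%s %d\n", ep, count[ep] }
    ' "$1" | sort
}
```

Usage would be, for example, `cert_verify_failures /var/run/log/nsx-syslog.log`, which prints one line per endpoint with the number of matching log lines.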
cat /etc/vmware/nsx/appliance-info.xml
VMware NSX
The certificates used to replace the expired ones contained extra characters that OpenSSL could not process, so the certificates were not loaded properly.
This broke communication between the NSX Managers and the Transport Nodes, leaving the Transport Nodes in a disconnected state in the UI.
Validate the content of each certificate before applying it to the NSX Managers; it must not contain extra characters.
Replace the certificates again, this time using certificates WITHOUT the extra characters that caused the problem.
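One way to pre-check a certificate file for stray content is sketched below. It is an illustration only, not the check NSX itself performs: the function name pem_stray_lines is hypothetical, and what counts as "extra characters" here is an assumption (anything sharing a line with a BEGIN/END marker, any non-base64 line inside a block, and any non-blank line outside a block). Under these rules a file with Windows line endings is also reported.

```shell
# Hypothetical helper: list lines of a PEM file that hold unexpected content.
# $1 is the certificate file. Prints "line-number: text" for each suspect line
# and returns 1 if any were found, 0 otherwise.
pem_stray_lines() {
    awk '
        function report() { printf "%d: %s\n", NR, $0; bad = 1 }
        /-----BEGIN [A-Z0-9 ]+-----/ {
            # The marker must be the only thing on its line.
            if ($0 !~ /^-----BEGIN [A-Z0-9 ]+-----$/) report()
            inside = 1; next
        }
        /-----END [A-Z0-9 ]+-----/ {
            if ($0 !~ /^-----END [A-Z0-9 ]+-----$/) report()
            inside = 0; next
        }
        inside && /^[A-Za-z0-9+\/=]+$/ { next }   # base64 payload line
        !inside && /^[ \t]*$/ { next }            # blank line between blocks (assumed harmless)
        { report() }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `pem_stray_lines new-cert.pem && echo "no stray content found"` (the file name is a placeholder). A certificate whose first line looks like the `#CN=...-----BEGIN CERTIFICATE-----` entry in the log above would be reported on line 1.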
Fix: Starting with VCF 9.0, logic has been added that discards the extra characters while loading the certificate.