NSX Edge nodes are in a "Down" state with no TEP connectivity.
Communication between the Management Plane Agent (MPA) and the controller cluster is completely disrupted.
Symptoms include:
Running get controllers on the affected Edge Node CLI displays a disconnected status with the explicit failure reason: CONTROLLER_REJECTED_HOST_CERT.
Running get managers shows managers in a Standby (NSX-RPC) state.
The /var/log/syslog file on the Edge nodes contains the following SSL handshake entries:
StreamConnection[<REDACTED_NUMERIC> Connecting to ssl://<IP>:1234 sid:<REDACTED_NUMERIC>] Couldn't connect to 'ssl://<IP>:1234' (error: 335544539-short read)
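To confirm the syslog symptom quickly, the matching entries can be counted rather than read one by one. The following is a minimal sketch run from the Edge root shell; the function name count_handshake_failures is hypothetical, and the pattern is only derived from the sample entry above, so other builds may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: count SSL "short read" connection failures in a log file.
# The default path matches the syslog location named in the symptoms.
count_handshake_failures() {
    log="${1:-/var/log/syslog}"
    # grep -c prints the number of matching lines (0 when none match).
    grep -c "Couldn't connect to 'ssl://.*short read" "$log"
}
```

A non-zero count together with the CONTROLLER_REJECTED_HOST_CERT status is consistent with the expired-certificate cause described here, but the count alone does not prove it.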
VMware NSX-T Data Center
VMware NSX
The local host certificate (host-cert.pem) residing on the Edge Node filesystem has expired.
Check the certificate expiry with the following command:
openssl x509 -startdate -enddate -noout -in /etc/vmware/nsx/host-cert.pem
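Reading the dates by eye works, but the same check can be scripted. The sketch below relies on the standard openssl x509 -checkend option, which exits non-zero when the certificate expires within the given number of seconds (0 means "already expired"). The function name check_cert_expiry is hypothetical. Note that a file openssl cannot parse is also reported as expired by this sketch.

```shell
#!/bin/sh
# Hypothetical helper: report whether a PEM certificate has already expired.
# Usage: check_cert_expiry /etc/vmware/nsx/host-cert.pem
check_cert_expiry() {
    cert="$1"
    if [ ! -r "$cert" ]; then
        echo "missing: $cert"
        return 2
    fi
    # -checkend 0 succeeds only if the certificate is still valid right now.
    if openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1; then
        echo "valid: $cert"
        return 0
    fi
    echo "expired: $cert"
    return 1
}
```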
Note: If the Transport Node certificates have already expired and the 24-hour grace period has elapsed, the Transport Nodes are disconnected. At that point, the CARR script can no longer be used to replace the Transport Node certificates.
cd /etc/vmware/nsx/
Sample output:
ls
appliance-info.xml host-cert.pem host-privkey.pem netopa.xml openssl-proxy.cnf
controller-info.xml host-cfg.xml mpa-txn nsx-proxy.xml
cp host-cert.pem host-cert.pem.bak
rm host-cert.pem
Restart the nsx-proxy service to regenerate the host-cert.pem file:
/etc/init.d/nsx-proxy restart
openssl x509 -startdate -enddate -noout -in /etc/vmware/nsx/host-cert.pem
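The root-shell steps above (back up, remove, restart, verify) can be sketched as a single function. This is an illustration only, not a supported tool: the name regenerate_host_cert and both of its parameters are hypothetical, and the sketch assumes the new certificate may take a moment to appear after the restart.

```shell
#!/bin/sh
# Sketch only: wraps the manual backup / remove / restart / verify steps.
# Usage: regenerate_host_cert [nsx_dir] [restart_command]
regenerate_host_cert() {
    nsx_dir="${1:-/etc/vmware/nsx}"
    restart_cmd="${2:-/etc/init.d/nsx-proxy restart}"
    cert="$nsx_dir/host-cert.pem"

    if [ ! -f "$cert" ]; then
        echo "no certificate found at $cert" >&2
        return 1
    fi

    # Keep a copy before deleting, as in the manual steps.
    cp "$cert" "$cert.bak" || return 1
    rm "$cert" || return 1

    # Intentionally unquoted so the command string splits into words.
    $restart_cmd || return 1

    # Assumption: the file may not exist immediately after the restart,
    # so the dates are printed only if the certificate is already there.
    if [ -f "$cert" ]; then
        openssl x509 -startdate -enddate -noout -in "$cert"
    else
        echo "host-cert.pem not present yet; re-check shortly"
    fi
}
```

Passing the directory and restart command as arguments makes it possible to rehearse the sequence against a scratch directory before touching /etc/vmware/nsx.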
push host-certificate <manager-IP-FQDN> username <username> thumbprint <cert-api-thumbprint-of-manager> password <password>
sync-aph-certificates <manager-IP-FQDN> username <username> thumbprint <cert-api-thumbprint-of-manager> password <password>
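Both commands above need the manager's API certificate thumbprint. On the NSX Manager CLI this is typically read with get certificate api thumbprint. As an alternative, the sketch below derives a value from any host with openssl, assuming the thumbprint placeholder expects the SHA-256 fingerprint as lowercase hex without colons; that format, the function name normalize_thumbprint, and the use of port 443 are assumptions to verify against the manager's own output.

```shell
#!/bin/sh
# Hypothetical helper: turn openssl fingerprint output into bare lowercase hex.
# Reads lines such as "SHA256 Fingerprint=AB:CD:..." on stdin.
normalize_thumbprint() {
    # Drop everything up to '=', remove the colons, lowercase the hex digits.
    sed -e 's/^.*=//' | tr -d ':' | tr 'A-F' 'a-f'
}

# Example pipeline (not executed here; <manager-IP-FQDN> is a placeholder):
#   openssl s_client -connect <manager-IP-FQDN>:443 </dev/null 2>/dev/null \
#     | openssl x509 -noout -fingerprint -sha256 \
#     | normalize_thumbprint
```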