ESXi Host's NSX Configuration status shows Host Disconnected under System->Fabric->Hosts
The NSX Manager log /var/log/vmware/appl-proxy.log displays messages similar to:
<Date>T<Time>Z nsx-mgr-2 NSX 789948 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-rpc" tid="789952" level="ERROR" errorCode="RPC503"] RpcTransport[1]::RemoteService[vmware.nsx.certificate.CertificateService] Failed to resolve service: 6-No such device or address
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
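To check whether a manager log contains this symptom, the lines can be filtered for the "Failed to resolve service" message. The following is a minimal sketch, not a VMware tool: the function name and the regular expression are illustrative and are based only on the layout of the sample line above, so they may need adjusting for other log formats.

```python
import re

# Pulls the sub-component, error code, and trailing message out of the
# structured-data part of a syslog line shaped like the sample above.
PATTERN = re.compile(
    r'subcomp="(?P<subcomp>[^"]+)".*?errorCode="(?P<code>[^"]+)"\]\s*(?P<message>.*)'
)

def find_resolve_failures(lines):
    """Return (error_code, service_name) for each 'Failed to resolve service' line."""
    hits = []
    for line in lines:
        m = PATTERN.search(line)
        if not m or "Failed to resolve service" not in m.group("message"):
            continue
        # The remote service name appears as RemoteService[<name>] in the message.
        svc = re.search(r"RemoteService\[([^\]]+)\]", m.group("message"))
        hits.append((m.group("code"), svc.group(1) if svc else None))
    return hits

# Sample line copied from this article (date and time are placeholders).
sample = (
    '<Date>T<Time>Z nsx-mgr-2 NSX 789948 - [nsx@6876 comp="nsx-manager" '
    'subcomp="appl-proxy" s2comp="nsx-rpc" tid="789952" level="ERROR" '
    'errorCode="RPC503"] RpcTransport[1]::RemoteService'
    '[vmware.nsx.certificate.CertificateService] '
    'Failed to resolve service: 6-No such device or address'
)

print(find_resolve_failures([sample]))
```

In practice the function would be fed the lines of the log file, for example by opening it and passing the file object.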
Versions where this is a known issue:
Versions where this is fixed:
Scenario-1: When certificates are not revoked
Only the revoked certificate needs to be flushed from the NestDB cache, followed by a restart of nsx-proxy. Because some TNs (transport nodes) also required their certificate to be pushed to the manager, the same procedure can be applied to all TNs. The steps are listed below:
Scenario-2: When the certificates are actually revoked
TN certificate replacement with manual intervention:
Note: Collect the NSX Manager thumbprint using the command "get certificate api thumbprint".
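As an optional cross-check of the collected value, the thumbprint can be recomputed from the certificate itself. This is a sketch under an assumption: it treats the thumbprint as the lowercase hex SHA-256 digest of the DER-encoded certificate, which should be confirmed against the output of the CLI command. The function names and the hostname in the usage note are hypothetical.

```python
import hashlib
import ssl

def sha256_thumbprint(pem_cert: str) -> str:
    """Return the lowercase hex SHA-256 digest of a PEM certificate's DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()

def fetch_manager_thumbprint(host: str, port: int = 443) -> str:
    """Fetch the certificate presented on host:port and hash it.

    No trust validation is performed here; the point is only to read
    whatever certificate the endpoint currently serves.
    """
    pem = ssl.get_server_certificate((host, port))
    return sha256_thumbprint(pem)
```

For example, fetch_manager_thumbprint("nsx-mgr.example.com") would return a 64-character hex string that can be compared with the value reported by the manager.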
APH certificate replacement
Prerequisites
Trust-store is the source of truth for APH certificates/keys.
Specification
APH certificate replacement with manual intervention
"api/v1/trust-management/certificates/<certificate-id>?action=apply_certificate&service_type=APH_TN&node_id=<mp-id-corresponding-to-aph>" API.
Impact/Risks:
Hosts cannot connect to MP because of "certificate verification failed".
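For the manual APH certificate replacement described earlier, the apply_certificate request URL can be assembled as sketched below. Only the path and the query parameters come from this article. The manager hostname and the two IDs are hypothetical placeholders, and the HTTP method and authentication are not stated here, so they should be taken from the NSX API guide for the version in use.

```python
from urllib.parse import quote, urlencode

def apply_certificate_url(manager: str, certificate_id: str, node_id: str) -> str:
    """Build the apply_certificate URL for the APH_TN service type."""
    query = urlencode({
        "action": "apply_certificate",
        "service_type": "APH_TN",
        "node_id": node_id,  # manager (MP) node ID corresponding to the APH
    })
    cert = quote(certificate_id, safe="")
    return f"https://{manager}/api/v1/trust-management/certificates/{cert}?{query}"

# Hypothetical values for illustration only.
url = apply_certificate_url("nsx-mgr.example.com", "1111-aaaa", "2222-bbbb")
print(url)
```

Building the query with urlencode keeps the parameter values correctly escaped if an ID contains reserved characters.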
Other helpful KBs related to certificate issue:
Alarm For Transport Node Certificate Has Expired.
NSX Configuration in Host Transport Node shows failed after certificate replacement