ESXi Host's NSX Configuration status shows Host Disconnected in NSX UI

Article ID: 329221

Products

VMware NSX

Issue/Introduction

The NSX Configuration status of an ESXi host shows Host Disconnected under System->Fabric->Hosts in the NSX UI.

The NSX Manager log /var/log/vmware/appl-proxy.log displays messages similar to:

<Date>T<Time>Z nsx-mgr-2 NSX 789948 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-rpc" tid="789952" level="ERROR" errorCode="RPC503"] RpcTransport[1]::RemoteService[vmware.nsx.certificate.CertificateService] Failed to resolve service: 6-No such device or address

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
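To check whether a manager log contains this error, a small helper along the following lines may be useful. This is a minimal sketch: the function name count_rpc503 is hypothetical, and the match pattern is derived only from the sample message above, so it may need adjusting if the message differs in your environment.

```shell
#!/bin/sh
# Count log lines that report RPC503 "Failed to resolve service" for the
# CertificateService stub. The function name is illustrative, not an NSX tool.
# Usage: count_rpc503 [logfile]
# The default path is the log file named in this article.
count_rpc503() {
    logfile="${1:-/var/log/vmware/appl-proxy.log}"
    # grep -c prints the number of matching lines (0 if none match).
    grep -c 'errorCode="RPC503".*vmware\.nsx\.certificate\.CertificateService.*Failed to resolve service' "$logfile"
}
```

Calling count_rpc503 with no argument on the NSX Manager reads the default log path; a non-zero count suggests the symptom described here is present.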

Environment

VMware NSX 4.1.x

Cause

CrlMgr cannot create the Trust-Store service stub (vmware.nsx.certificate.CertificateService) hosted on the managers. As a result, CrlMgr marks the certificate as revoked and updates the cache in NestDB.

Resolution

Versions where this is a known issue:

  • For self-signed certificates:
    • This is a known issue in NSX 4.1.0.
  • For CA-signed certificates:
    • This is a known issue in all versions from 4.1.0 onward.

Versions where this is fixed:

  • For self-signed certificates:
    • The issue is fixed in 4.1.0.2.
  • For CA-signed certificates:
    • This is still a known issue.



Workaround

 

Scenario-1: When certificates are not revoked

In this scenario it is sufficient to flush the revoked certificate from the NestDB cache and restart nsx-proxy. However, some transport nodes (TNs) also required their certificate to be pushed to the manager, so the push step can be performed on all TNs. The steps are:

  • Enable SSH on the TN, if not already enabled. Connect to the TN with SSH and perform the following steps.
  • Flush the NestDB cache using the command: /opt/vmware/nsx-nestdb/bin/nestdb-cli --json --cmd flush vmware.nsx.nestdb.CrlCertificatesCacheMsg
  • Restart nsx-proxy using the command: /etc/init.d/nsx-proxy restart
  • Push the host certificate to the MP using the command: nsxcli -c "push host-certificate <hostname-or-ip-address[:port]> username <username> thumbprint <thumbprint>"
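The Scenario-1 commands can be collected into one function so that the sequence is reviewed before it is run. The following is a sketch only: the run helper, the DRY_RUN variable, and the function name scenario1_flush_and_push are hypothetical conveniences, not part of NSX. The three commands themselves are the ones listed above, and the manager address, username, and thumbprint must be supplied by the operator.

```shell
#!/bin/sh
# Sketch of the Scenario-1 sequence, intended to run on the TN itself.
# DRY_RUN=1 (the default in this sketch) only prints each command;
# set DRY_RUN=0 to actually execute them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print the command words separated by spaces (quoting is not shown).
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: scenario1_flush_and_push <manager[:port]> <username> <thumbprint>
scenario1_flush_and_push() {
    manager="$1"; user="$2"; thumbprint="$3"
    # Flush the cached revocation entries from NestDB.
    run /opt/vmware/nsx-nestdb/bin/nestdb-cli --json --cmd flush vmware.nsx.nestdb.CrlCertificatesCacheMsg
    # Restart nsx-proxy.
    run /etc/init.d/nsx-proxy restart
    # Push the host certificate to the MP.
    run nsxcli -c "push host-certificate $manager username $user thumbprint $thumbprint"
}
```

With the default DRY_RUN=1, the function prints the three commands in order. In the printed form the nsxcli argument appears without its surrounding quotes; when executed with DRY_RUN=0 it is passed to nsxcli as a single argument.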

Scenario-2: When the certificates are actually revoked

TN certificate replacement with manual intervention:

  • Enable SSH on the TN, if not already enabled. Connect to the TN with SSH and perform the following steps.
  • Delete the TN private key and TN certificate using the command: rm /etc/vmware/nsx/host-privkey.pem /etc/vmware/nsx/host-cert.pem
  • Restart nsx-proxy using the command: /etc/init.d/nsx-proxy restart
  • Push the updated host certificate to the MP using the command: nsxcli -c "push host-certificate <hostname-or-ip-address[:port]> username <username> thumbprint <thumbprint>"
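The deletion step in Scenario-2 can be wrapped so that it reports which files were actually present. This is a hedged sketch: the function name remove_host_cert_pair is hypothetical, and the directory is a parameter only so the function can be tried against a scratch directory first. The default directory and the two file names are the ones given in the steps above.

```shell
#!/bin/sh
# Remove the TN private key and certificate (the deletion step of Scenario-2).
# Usage: remove_host_cert_pair [directory]
# The directory defaults to /etc/vmware/nsx, as given in this article.
remove_host_cert_pair() {
    dir="${1:-/etc/vmware/nsx}"
    for f in host-privkey.pem host-cert.pem; do
        if [ -f "$dir/$f" ]; then
            rm "$dir/$f" && echo "removed $dir/$f"
        else
            echo "not found $dir/$f"
        fi
    done
}
```

After the files are removed, the remaining steps are unchanged: restart nsx-proxy and push the updated host certificate to the MP.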

 

APH certificate replacement

Prerequisites

Trust-store is the source of truth for APH certificates/keys.

Specification

  • The APH certificate and key update workflow from the MP is user-initiated.
  • Trust-store updates Messaging with the new APH cert chain using a REST API call.
  • Messaging manager sends a config update to the attached TNs which includes the new APH certificate and completes the REST API call invoked by Trust-store.
  • On receiving a successful result from Messaging manager, Trust-store replaces the APH key and cert files.
  • APH consumes the new cert chain and key and refreshes its security context. 

APH certificate replacement with manual intervention

  • Replace the APH certificate and key using the API: POST api/v1/trust-management/certificates/<certificate-id>?action=apply_certificate&service_type=APH_TN&node_id=<mp-id-corresponding-to-aph>
  • For TNs that remain disconnected for an extended period (five minutes) after the above step:
    • Enable SSH on the TN, if not already enabled, and SSH to the TN.
    • Stop the nsx-proxy service on the TN: /etc/init.d/nsx-proxy stop
    • On the TN, if the APH certificate in /etc/vmware/nsx/appliance-info.xml does not match the certificate corresponding to <certificate-id> in the first step, fetch the latest APH certificates from the MP using the nsxcli command: sync-aph-certificates <manager-hostname-or-ip-address[:port]> username <username> thumbprint <thumbprint> [password <password>]
    • Start the nsx-proxy service on the TN: /etc/init.d/nsx-proxy start
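The apply_certificate request in the first step can be assembled with a small helper, which keeps the query string free of typing errors. This is a sketch under stated assumptions: the function name aph_apply_url is hypothetical, the https scheme and the example host name are assumptions, and the certificate ID and node ID must come from your own environment. The path and query parameters are the ones given above.

```shell
#!/bin/sh
# Build the URL for the APH apply_certificate call.
# Usage: aph_apply_url <manager-host[:port]> <certificate-id> <mp-node-id>
aph_apply_url() {
    manager="$1"; cert_id="$2"; node_id="$3"
    printf 'https://%s/api/v1/trust-management/certificates/%s?action=apply_certificate&service_type=APH_TN&node_id=%s\n' \
        "$manager" "$cert_id" "$node_id"
}

# Illustrative invocation (left commented out; curl prompts for the password).
# Here -k skips TLS verification of the manager certificate, -u sets the user,
# and -X POST selects the HTTP method:
#   curl -k -u admin -X POST "$(aph_apply_url nsx-mgr.example.com <certificate-id> <mp-id-corresponding-to-aph>)"
```

Printing the URL first and checking it against the API path above is a cheap safeguard before sending the request.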

Additional Information

Impact/Risks:

Hosts cannot connect to the MP because of a "certificate verification failed" error.