ESXi Host's NSX Configuration status shows Host Disconnected in NSX UI

Article ID: 329221

Products

VMware NSX

Issue/Introduction

The ESXi host's NSX Configuration status shows Host Disconnected under System > Fabric > Hosts in the NSX UI.

The NSX Manager log /var/log/vmware/appl-proxy.log displays messages similar to:

<Date>T<Time>Z nsx-mgr-2 NSX 789948 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-rpc" tid="789952" level="ERROR" errorCode="RPC503"] RpcTransport[1]::RemoteService[vmware.nsx.certificate.CertificateService] Failed to resolve service: 6-No such device or address

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
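To check quickly whether a manager log contains this signature, the log can be searched for the RPC503 error against the CertificateService stub. The following is a minimal sketch, not an official tool: the function name count_cert_service_errors is hypothetical, and the default path is simply the one quoted above.

```shell
#!/bin/sh
# Sketch: count log lines that carry the RPC503 "Failed to resolve service"
# error for vmware.nsx.certificate.CertificateService.
# The function name is hypothetical; the default path is the one quoted in this article.
count_cert_service_errors() {
    log_file="${1:-/var/log/vmware/appl-proxy.log}"
    # grep -c prints the number of matching lines (0 if none match).
    grep -c 'errorCode="RPC503".*vmware\.nsx\.certificate\.CertificateService.*Failed to resolve service' "$log_file"
}
```

Calling count_cert_service_errors with no argument reads the default log path. A non-zero count is consistent with the issue described here, but it does not confirm it on its own.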

Environment

VMware NSX 4.1.x

Cause

The Control Manager cannot create the Trust-Store service stub (vmware.nsx.certificate.CertificateService) hosted on the managers. As a result, the Control Manager marks the certificate as revoked and updates the cache in NestDB.

Resolution

Versions where this is a known issue:

  • For self-signed certificates:
    • NSX 4.1.0
  • For CA-signed certificates:
    • All versions from NSX 4.1.0 onward

Versions where this is fixed:

  • For self-signed certificates:
    • Fixed in NSX 4.1.0.2
  • For CA-signed certificates:
    • Not fixed; this remains a known issue


Workaround

Scenario-1: When certificates are not revoked

In this scenario, it is normally enough to flush the revoked certificate from the NestDB cache and restart nsx-proxy. However, some transport nodes (TNs) also needed their certificate pushed to the manager, so the push step can be performed on all TNs. The steps are:

  • Enable SSH on the TN if it is not already enabled, then connect to the TN over SSH and perform the following steps.
  • Flush the NestDB cache: /opt/vmware/nsx-nestdb/bin/nestdb-cli --json --cmd flush vmware.nsx.nestdb.CrlCertificatesCacheMsg
  • Restart nsx-proxy: /etc/init.d/nsx-proxy restart
  • Push the host certificate to the MP: nsxcli -c "push host-certificate <hostname-or-ip-address[:port]> username <username> thumbprint <Manager thumbprint>"
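The Scenario-1 steps can be collected into a small script to run on the TN. This is a sketch under stated assumptions: the helper names run and scenario1 and the DRY_RUN variable are hypothetical, while the three commands are the ones listed above. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch of Scenario-1, intended to be run on the transport node.
# DRY_RUN (hypothetical variable) defaults to 1: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Prints the words of the command; shell quoting is not preserved.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

scenario1() {
    if [ "$#" -ne 3 ]; then
        echo "usage: scenario1 <manager[:port]> <username> <manager-thumbprint>" >&2
        return 2
    fi
    manager="$1"; user="$2"; thumbprint="$3"
    # 1. Flush the revoked-certificate cache from NestDB.
    run /opt/vmware/nsx-nestdb/bin/nestdb-cli --json --cmd flush vmware.nsx.nestdb.CrlCertificatesCacheMsg || return 1
    # 2. Restart nsx-proxy.
    run /etc/init.d/nsx-proxy restart || return 1
    # 3. Push the host certificate to the manager.
    run nsxcli -c "push host-certificate $manager username $user thumbprint $thumbprint"
}
```

Reviewing the dry-run output first, then repeating the call with DRY_RUN=0, keeps the restart of nsx-proxy a deliberate action.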

Scenario-2: When the certificates are actually revoked

TN certificate replacement with manual intervention:

  • Enable SSH on the TN if it is not already enabled, then connect to the TN over SSH and perform the following steps.
  • Delete the TN private key and TN certificate: rm /etc/vmware/nsx/host-privkey.pem /etc/vmware/nsx/host-cert.pem
  • Restart nsx-proxy: /etc/init.d/nsx-proxy restart
  • Push the updated host certificate to the MP: nsxcli -c "push host-certificate <hostname-or-ip-address[:port]> username <username> thumbprint <Manager thumbprint>"

Note: To obtain the NSX Manager thumbprint, run the command "get certificate api thumbprint" on the NSX Manager.
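Because Scenario-2 deletes files, it may help to wrap the steps in a function that stops at the first failure. The sketch below assumes the paths and commands listed above. The function name replace_tn_cert and the override variables NSX_DIR, NSX_PROXY and NSXCLI are hypothetical and exist only so the sequence can be rehearsed against stand-ins.

```shell
#!/bin/sh
# Sketch of Scenario-2, intended to be run on the transport node.
# NSX_DIR, NSX_PROXY and NSXCLI are hypothetical overrides for rehearsal;
# their defaults are the paths and commands given in this article.
replace_tn_cert() {
    if [ "$#" -ne 3 ]; then
        echo "usage: replace_tn_cert <manager[:port]> <username> <manager-thumbprint>" >&2
        return 2
    fi
    manager="$1"; user="$2"; thumbprint="$3"
    nsx_dir="${NSX_DIR:-/etc/vmware/nsx}"
    proxy="${NSX_PROXY:-/etc/init.d/nsx-proxy}"
    cli="${NSXCLI:-nsxcli}"

    # 1. Delete the TN private key and certificate; stop if that fails.
    rm "$nsx_dir/host-privkey.pem" "$nsx_dir/host-cert.pem" || return 1
    # 2. Restart nsx-proxy.
    "$proxy" restart || return 1
    # 3. Push the updated host certificate to the manager.
    "$cli" -c "push host-certificate $manager username $user thumbprint $thumbprint"
}
```

On a real TN the function would be called with only the three arguments, leaving the overrides unset.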

 

APH certificate replacement

Prerequisites

Trust-store is the source of truth for APH certificates/keys.

Specification

  • The APH certificate and key update workflow from MP is user initiated.
  • Trust-store updates Messaging with the new APH cert chain using a REST API call.
  • Messaging manager sends a config update to the attached TNs which includes the new APH certificate and completes the REST API call invoked by Trust-store.
  • On receiving a successful result from Messaging manager, Trust-store replaces the APH key and cert files.
  • APH consumes the new cert chain and key and refreshes its security context. 

APH certificate replacement with manual intervention

  • Replace the APH certificate and key using the API: POST "api/v1/trust-management/certificates/<certificate-id>?action=apply_certificate&service_type=APH_TN&node_id=<mp-id-corresponding-to-aph>"
  • For TNs that remain disconnected for an extended period (five minutes) after the above step:
    • Enable SSH on the TN if it is not already enabled, and connect to the TN over SSH.
    • Stop the nsx-proxy service on the TN: /etc/init.d/nsx-proxy stop
    • On the TN, if the APH certificate in /etc/vmware/nsx/appliance-info.xml does not match the certificate corresponding to "certificate-id" in the first step, fetch the latest APH certificates from the MP using the nsxcli command: "sync-aph-certificates <manager-hostname-or-ip-address[:port]> username <username> thumbprint <thumbprint> [password <password>]"
    • Start the nsx-proxy service on the TN: /etc/init.d/nsx-proxy start
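The apply_certificate call above can be issued with any REST client. The following is a minimal sketch using curl. The function names aph_apply_url and apply_aph_certificate are hypothetical, the https:// scheme and basic authentication are assumptions about the environment, and -k (skip TLS verification) is included only because manager certificates are often not trusted by the client.

```shell
#!/bin/sh
# Sketch: build and send the apply_certificate request for the APH service.
# Function names are hypothetical; the path and query string are the ones given in this article.
aph_apply_url() {
    manager="$1"; cert_id="$2"; node_id="$3"
    printf 'https://%s/api/v1/trust-management/certificates/%s?action=apply_certificate&service_type=APH_TN&node_id=%s\n' \
        "$manager" "$cert_id" "$node_id"
}

apply_aph_certificate() {
    if [ "$#" -ne 4 ]; then
        echo "usage: apply_aph_certificate <manager> <certificate-id> <node-id> <username>" >&2
        return 2
    fi
    # -u with a bare user name makes curl prompt for the password.
    # -k disables TLS verification (assumption; remove it if the manager certificate is trusted).
    curl -k -u "$4" -X POST "$(aph_apply_url "$1" "$2" "$3")"
}
```

Printing the URL with aph_apply_url first gives a chance to confirm the certificate ID and node ID before the request is sent.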

Additional Information

Impact/Risks:

Hosts cannot connect to the MP because certificate verification fails ("certificate verification failed").

Other helpful KBs related to certificate issue:

NSX Edge transport Node shows MPA disconnected on NSX GUI after replacing internal certificates in NSX 4.#

Alarm For Transport Node Certificate Has Expired.

NSX Configuration in Host Transport Node shows failed after certificate replacement