NSX 4.1.0 and 4.1.1, ESX Host TN logic in Disconnected state

Article ID: 329221

Products

VMware NSX

Issue/Introduction

 

  • The ESX host's NSX Configuration status shows "Host Disconnected" under System -> Fabric -> Hosts.
  • The NSX Manager log /var/log/vmware/appl-proxy.log displays messages similar to:

<Date>T<Time>Z nsx-mgr-2 NSX 789948 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-rpc" tid="789952" level="ERROR" errorCode="RPC503"] RpcTransport[1]::RemoteService[vmware.nsx.certificate.CertificateService] Failed to resolve service: 6-No such device or address

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
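To check whether a manager is logging this error, one option is to count the matching lines in the proxy log. The following is a minimal sketch rather than an official diagnostic: the function name count_cert_service_errors is hypothetical, and the match strings are copied from the example above, so they may need adjusting if the log lines in your environment differ.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that report the CertificateService
# resolution failure shown in the example above.
# Usage: count_cert_service_errors [logfile]
count_cert_service_errors() {
    logfile="${1:-/var/log/vmware/appl-proxy.log}"
    # -F treats the patterns as fixed strings; both must appear on the same line.
    grep -F 'vmware.nsx.certificate.CertificateService' "$logfile" \
        | grep -cF 'Failed to resolve service'
}
```

Running count_cert_service_errors with no argument reads the default log path named in this article; a non-zero count suggests the manager is hitting the failure described here.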

Environment

VMware NSX 4.1.0

Cause

CrlMgr cannot create the Trust-Store service stub (vmware.nsx.certificate.CertificateService) hosted on the managers. As a result, CrlMgr marks the certificate as revoked and updates the cache in NestDB.

Resolution

Versions where this is a known issue:
  • For self-signed certificates:
    • This is a known issue in NSX 4.1.0.
  • For CA-signed certificates:
    • This is a known issue in all versions from 4.1.0 onward.

Versions where this is fixed:

  • For self-signed certificates:
    • The issue is fixed in NSX 4.1.0.2.
  • For CA-signed certificates:
    • No fixed version is listed; this remains a known issue.



Workaround:

Scenario-1: When certificates are not revoked

The workaround is to flush the revoked certificate from the NestDB cache and restart nsx-proxy.
Some TNs additionally required their certificate to be pushed to the manager, so this step can be performed on all TNs.

The steps are as follows.

  1. Enable SSH on the TN, if it is not already enabled. Connect to the TN over SSH and perform the following steps.
  2. Flush the NestDB cache using the command: /opt/vmware/nsx-nestdb/bin/nestdb-cli --json --cmd flush vmware.nsx.nestdb.CrlCertificatesCacheMsg
  3. Restart nsx-proxy using the command: /etc/init.d/nsx-proxy restart
  4. Push the host certificate to the MP using the command: nsxcli -c "push host-certificate <hostname-or-ip-address[:port]> username <username> thumbprint <thumbprint>"
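
Steps 2 to 4 above can be wrapped in a small script when several TNs need the same treatment. This is a sketch under stated assumptions, not a supported tool: the function names run_step and scenario1_workaround and the DRY_RUN variable are hypothetical, and only the three commands themselves come from the steps above.

```shell
#!/bin/sh
# Sketch of Scenario-1 steps 2-4, meant to be run on the TN over SSH.
# run_step, scenario1_workaround and DRY_RUN are hypothetical names.

# Run a command, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

scenario1_workaround() {
    mgr="$1"; user="$2"; thumbprint="$3"
    if [ -z "$mgr" ] || [ -z "$user" ] || [ -z "$thumbprint" ]; then
        echo "usage: scenario1_workaround <manager[:port]> <username> <thumbprint>" >&2
        return 2
    fi
    # Step 2: flush the cached revoked-certificate entries from NestDB.
    run_step /opt/vmware/nsx-nestdb/bin/nestdb-cli --json --cmd flush \
        vmware.nsx.nestdb.CrlCertificatesCacheMsg || return 1
    # Step 3: restart nsx-proxy.
    run_step /etc/init.d/nsx-proxy restart || return 1
    # Step 4: push the host certificate to the manager.
    run_step nsxcli -c "push host-certificate $mgr username $user thumbprint $thumbprint"
}
```

With DRY_RUN=1 set, scenario1_workaround prints the three commands instead of running them, which is a cheap way to review the values before touching a host. The printed form drops the quotes around the nsxcli argument; the real invocation passes it as a single argument.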

Scenario-2: When the certificates are actually revoked

TN certificate replacement with manual intervention

  1. Enable SSH on the TN, if it is not already enabled. Connect to the TN over SSH and perform the following steps.
  2. Delete the TN private key and TN certificate using the command: rm /etc/vmware/nsx/host-privkey.pem /etc/vmware/nsx/host-cert.pem
  3. Restart nsx-proxy using the command: /etc/init.d/nsx-proxy restart
  4. Push the updated host certificate to the MP using the command: nsxcli -c "push host-certificate <hostname-or-ip-address[:port]> username <username> thumbprint <thumbprint>"
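
Because step 2 deletes files, it can help to preview the exact commands before running them. The sketch below wraps steps 2 to 4 with a dry-run switch. The names run_step, scenario2_replace_tn_cert and DRY_RUN are hypothetical; the file paths and commands are the ones listed in the steps above.

```shell
#!/bin/sh
# Sketch of Scenario-2 steps 2-4, meant to be run on the TN over SSH.
# run_step, scenario2_replace_tn_cert and DRY_RUN are hypothetical names.
KEY_FILE=/etc/vmware/nsx/host-privkey.pem
CERT_FILE=/etc/vmware/nsx/host-cert.pem

# Run a command, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

scenario2_replace_tn_cert() {
    mgr="$1"; user="$2"; thumbprint="$3"
    if [ -z "$mgr" ] || [ -z "$user" ] || [ -z "$thumbprint" ]; then
        echo "usage: scenario2_replace_tn_cert <manager[:port]> <username> <thumbprint>" >&2
        return 2
    fi
    # Step 2: delete the TN private key and certificate.
    run_step rm "$KEY_FILE" "$CERT_FILE" || return 1
    # Step 3: restart nsx-proxy.
    run_step /etc/init.d/nsx-proxy restart || return 1
    # Step 4: push the updated host certificate to the manager.
    run_step nsxcli -c "push host-certificate $mgr username $user thumbprint $thumbprint"
}
```

The function stops at the first failing step, so the certificate push is not attempted if the delete or the restart fails.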

APH certificate replacement

Prerequisites

TrustStore is the source of truth for APH certificates and keys.

Specification

  1. The APH certificate and key update workflow from the MP is user-initiated.
  2. TrustStore updates Messaging with the new APH cert chain using a REST API call.
  3. Messaging manager sends a config update to the attached TNs which includes the new APH certificate and completes the REST API call invoked by TrustStore.
  4. On getting a successful result from Messaging manager, TrustStore replaces the APH key and cert files.
  5. APH consumes the new cert chain and key and refreshes its security context. 

APH certificate replacement with manual intervention

  1. Replace the APH certificate and key using the POST "api/v1/trust-management/certificates/<certificate-id>?action=apply_certificate&service_type=APH_TN&node_id=<mp-id-corresponding-to-aph>" API.
  2. For TNs that remain disconnected for an extended period (five minutes) after the above step:
     a. Enable SSH on the TN, if it is not already enabled, and connect to the TN over SSH.
     b. Stop the nsx-proxy service on the TN: /etc/init.d/nsx-proxy stop
     c. On the TN, if the APH certificate in /etc/vmware/nsx/appliance-info.xml does not match the certificate corresponding to "certificate-id" in step 1:
        i. Fetch the latest APH certificates from the MP using the nsxcli command: "sync-aph-certificates <manager-hostname-or-ip-address[:port]> username <username> thumbprint <thumbprint> [password <password>]"
     d. Start the nsx-proxy service on the TN: /etc/init.d/nsx-proxy start
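
The API call in step 1 can be issued with any HTTP client. The following sketch uses curl and makes several assumptions that are not stated in this article: the manager is reached over https, basic authentication is used, and the helper names aph_apply_cert_url and apply_aph_cert are hypothetical. TLS trust options (for example --cacert) are left out and may be needed in your environment.

```shell
#!/bin/sh
# Hypothetical helpers for step 1 of the manual APH certificate replacement.

# Build the apply_certificate URL from the manager address, the certificate ID,
# and the ID of the MP node that corresponds to the APH.
aph_apply_cert_url() {
    mgr="$1"; cert_id="$2"; node_id="$3"
    printf 'https://%s/api/v1/trust-management/certificates/%s?action=apply_certificate&service_type=APH_TN&node_id=%s\n' \
        "$mgr" "$cert_id" "$node_id"
}

# Send the POST request. Assumes basic authentication; when only a user name
# is passed to -u, curl prompts for the password.
# Usage: apply_aph_cert <manager> <certificate-id> <node-id> <username>
apply_aph_cert() {
    curl -u "$4" -X POST "$(aph_apply_cert_url "$1" "$2" "$3")"
}
```

Keeping the URL construction in its own function makes it easy to print and check the request target before sending anything to the manager.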



Additional Information