Host Disconnected after certificate expiration and replacement, Transport Nodes connection status is timing out at 48%. ("Waiting for Connection to Managers")

Article ID: 417130

Products

VMware NSX

Issue/Introduction

  • The NSX Manager certificates expired and were subsequently replaced.
  • The NSX Manager UI reports that the Transport Nodes are in a disconnected state.
  • The Transport Node status, retrieved via the API endpoint GET https://<nsx-mgr>/api/v1/transport-nodes/<transport-node-id>/state, indicates that MPA connectivity is down and the heartbeat has been missed. 

    {
        "node_uuid": "########-####-####-####-############",
        "node_display_name": "<Transport_Node_Name>",
        "status": "UNKNOWN",
        "mgmt_connection_status": "DOWN",
        "node_status": {
            "last_heartbeat_timestamp": ########,
            "last_sync_time": ########,
            "mpa_connectivity_status": "DOWN",
            "mpa_connectivity_status_details": "Client has not responded to {2} consecutive heartbeats. Port {1234} between Host to NSX Manager must be open, Please check underlay physical firewalls and host hypervisor firewalls for troubleshooting.",
            "lcp_connectivity_status": "UNKNOWN",
            "lcp_connectivity_status_details": [],
            "host_node_deployment_status": "HOST_DISCONNECTED",
            "inventory_sync_paused": false,
            "software_version": "########",
    

     

  • A network test with "nc -zvv" confirms successful connectivity on TCP ports 1234 and 1235 from the Transport Node to the NSX Manager.
  • The controller session status is reported as "down" with the failure reason: CONTROLLER_REJECTED_HOST_CERT.

    [root@esxi:~] nsxcli -c get controllers
    ## MM DD YYYY utc HH:MM:SS.###
    Controller IP    Port    SSL        Status          Is Physical Master    Session State    Controller FQDN    Failure Reason
    ##.##.##.243     1235    enabled    not used        false                 null             NA                 NA
    ##.##.##.241     1235    enabled    disconnected    true                  down             NA                 CONTROLLER_REJECTED_HOST_CERT
    ##.##.##.242     1235    enabled    not used        false                 null             NA                 NA

     

  • The host's connections to the NSX Managers on port 1234 are in the TIME_WAIT state.

    [root@esxi:~] esxcli network ip connection list | grep 1234
    tcp 0 0 ##.##.##.134:35746 ##.##.##.242:1234 TIME_WAIT 0
    tcp 0 0 ##.##.##.134:45735 ##.##.##.243:1234 TIME_WAIT 0
    tcp 0 0 ##.##.##.134:19181 ##.##.##.241:1234 TIME_WAIT 0
    

     

  • The /var/run/log/nsx-syslog.log file on the host contains "certificate verify failed" errors and "certificate unknown" alerts.

    YYYY-MM-DDTHH:MM:SS.###Z nsx-proxy[2357689]: NSX 2357689 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2357712" level="WARNING"] RpcTransport[0] Unable to connect to ssl://##.##.##.243:1234: ######-certificate verify failed (SSL routines, ssl3_get_server_certificate)
    YYYY-MM-DDTHH:MM:SS.###Z nsx-proxy[2357689]: NSX 2357689 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2357712" level="INFO"] StreamSocket[697 Open f:49 i:0 ? -> ssl://##.##.##.243:1235] on_connect ######-sslv3 alert certificate unknown (SSL routines, ssl3_read_bytes)
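
The host-side symptoms above can be checked with a few small filters. The following is a minimal sketch rather than an official tool: the function names are hypothetical, it assumes awk is available in the host shell, and it assumes the command output keeps the column order shown in this article.

```shell
#!/bin/sh
# Hypothetical helpers for spotting the symptoms described in this article.
# They only read text; nothing on the host is modified.

# Reads "nsxcli -c get controllers" output on stdin and prints the IP of
# every controller row whose last column is CONTROLLER_REJECTED_HOST_CERT.
rejected_controllers() {
    awk '$NF == "CONTROLLER_REJECTED_HOST_CERT" { print $1 }'
}

# Reads "esxcli network ip connection list" output on stdin and counts
# connections to remote port 1234 that are in the TIME_WAIT state.
# Assumes the foreign address is column 5 and the state is column 6.
count_time_wait_1234() {
    awk '$5 ~ /:1234$/ && $6 == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# Counts lines in the given log file that contain either TLS error string.
count_cert_errors() {
    awk '/certificate verify failed|certificate unknown/ { n++ } END { print n + 0 }' "$1"
}
```

On a host, these might be used as "nsxcli -c get controllers | rejected_controllers", "esxcli network ip connection list | count_time_wait_1234", and "count_cert_errors /var/run/log/nsx-syslog.log". A non-empty controller list together with non-zero counts would match the pattern described in this article.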

Environment

VMware NSX

Cause

The NSX Manager is rejecting the Transport Node's certificate (failure reason CONTROLLER_REJECTED_HOST_CERT), which suggests that the host certificate is no longer valid.

Resolution

  1. On the Transport Node, rename the appliance-info.xml file so that a new one can be created: "mv /etc/vmware/nsx/appliance-info.xml /etc/vmware/nsx/appliance-info.xml.bak"
  2. Restart the following NSX services on the Transport Node:
    /etc/init.d/nsx-proxy restart
    /etc/init.d/nsx-opsagent restart
    /etc/init.d/nsx-cfgagent restart

  3. In the NSX UI, select the disconnected host and choose to resolve the issue, or push the newly generated Transport Node certificate to the NSX Manager using its thumbprint (refer to KB 345825 for specific instructions).
  4. Once the host is successfully connected, a new appliance-info.xml file will have been pushed back to the host. After validating the connection, you may remove the backup file by running: "rm /etc/vmware/nsx/appliance-info.xml.bak"
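
Steps 1 and 2 above can be sketched as a small script for the Transport Node shell. This is an illustration under stated assumptions, not a supported procedure: the function names and the NSX_DIR and INIT_DIR variables are hypothetical conveniences, and the third service is assumed to be named nsx-cfgagent. The backup function refuses to overwrite an existing .bak file so that an earlier backup is not lost.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 1 and 2 of the resolution.
# The defaults are the paths used in this article; override them for a dry run.
NSX_DIR="${NSX_DIR:-/etc/vmware/nsx}"
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# Step 1: move appliance-info.xml aside so that a new file can be created.
backup_appliance_info() {
    src="$NSX_DIR/appliance-info.xml"
    bak="$src.bak"
    if [ ! -f "$src" ]; then
        echo "missing: $src" >&2
        return 1
    fi
    if [ -e "$bak" ]; then
        echo "backup already exists: $bak" >&2
        return 1
    fi
    mv "$src" "$bak"
}

# Step 2: restart the NSX services in the order listed in the article,
# stopping at the first restart that reports a failure.
restart_nsx_services() {
    for svc in nsx-proxy nsx-opsagent nsx-cfgagent; do
        "$INIT_DIR/$svc" restart || return 1
    done
}
```

A typical invocation would be "backup_appliance_info && restart_nsx_services", followed by step 3 in the NSX UI. The backup file should only be removed after the host is confirmed as connected, as described in step 4.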

Additional Information

Alarm For Transport Node Certificate Has Expired.

Connection between host and NSX Controller is UNKNOWN due to connection between host and NSX Manager is DOWN.