Connecting the Transport Nodes following the NSX Manager's expired certificate replacement is timing out at 48%. ("Waiting for Connection to Managers")
Article ID: 417130

Products

VMware NSX

Issue/Introduction

  • The NSX Manager certificates expired and were subsequently replaced.
  • The NSX Manager UI reports that the Transport Nodes are in a disconnected state.
  • The Transport Node status, retrieved via the API endpoint GET https://<nsx-mgr>/api/v1/transport-nodes/<transport-node-id>/state, indicates that MPA connectivity is down and the heartbeat has been missed. 

    {
        "node_uuid": "########-####-####-####-############",
        "node_display_name": "<Transport_Node_Name>",
        "status": "UNKNOWN",
        "mgmt_connection_status": "DOWN",
        "node_status": {
            "last_heartbeat_timestamp": ########,
            "last_sync_time": ########,
            "mpa_connectivity_status": "DOWN",
            "mpa_connectivity_status_details": "Client has not responded to {2} consecutive heartbeats. Port {1234} between Host to NSX Manager must be open, Please check underlay physical firewalls and host hypervisor firewalls for troubleshooting.",
            "lcp_connectivity_status": "UNKNOWN",
            "lcp_connectivity_status_details": [],
            "host_node_deployment_status": "HOST_DISCONNECTED",
            "inventory_sync_paused": false,
            "software_version": "########",
    

     

  • A connectivity test using "nc -zvv" from the Transport Node to the NSX Manager succeeds on TCP ports 1234 and 1235.
  • The controller session status is reported as "down" with the failure reason: CONTROLLER_REJECTED_HOST_CERT.

    [root@esxi:~] nsxcli -c get controllers
    ## MM DD YYYY utc HH:MM:SS.###
    Controller IP    Port    SSL        Status          Is Physical Master    Session State    Controller FQDN    Failure Reason
    ##.##.##.243     1235    enabled    not used        false                 null             NA                 NA
    ##.##.##.241     1235    enabled    disconnected    true                  down             NA                 CONTROLLER_REJECTED_HOST_CERT
    ##.##.##.242     1235    enabled    not used        false                 null             NA                 NA

     

  • Connections from the Transport Node to the NSX Managers on TCP port 1234 are in the TIME_WAIT state.

    [root@esxi:~] esxcli network ip connection list | grep 1234
    tcp 0 0 ##.##.##.134:35746 ##.##.##.242:1234 TIME_WAIT 0
    tcp 0 0 ##.##.##.134:45735 ##.##.##.243:1234 TIME_WAIT 0
    tcp 0 0 ##.##.##.134:19181 ##.##.##.241:1234 TIME_WAIT 0
    

     

  • The /var/run/log/nsx-syslog.log file on the Transport Node contains "certificate verify failed" errors and "certificate unknown" alerts.

    YYYY-MM-DDTHH:MM:SS.###Z nsx-proxy[2357689]: NSX 2357689 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2357712" level="WARNING"] RpcTransport[0] Unable to connect to ssl://##.##.##.243:1234: ######-certificate verify failed (SSL routines, ssl3_get_server_certificate)
    YYYY-MM-DDTHH:MM:SS.###Z nsx-proxy[2357689]: NSX 2357689 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2357712" level="INFO"] StreamSocket[697 Open f:49 i:0 ? -> ssl://##.##.##.243:1235] on_connect ######-sslv3 alert certificate unknown (SSL routines, ssl3_read_bytes)
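
The symptom checks above can be scripted against captured command output. The following is a minimal sketch, not an official VMware tool: the function names (mpa_status, controller_failure_reasons, count_cert_errors) are hypothetical, the parsing assumes output laid out as in the samples above with IPv4 controller addresses, and it relies only on sed, awk, and grep.

```shell
#!/bin/sh
# Hypothetical triage helpers; each one reads captured text on stdin.

# Print the value of "mpa_connectivity_status" from the transport-node
# state JSON. Assumes the key and its string value are on one line,
# as in the sample API response.
mpa_status() {
    sed -n 's/.*"mpa_connectivity_status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print "<controller-ip> <failure-reason>" for every controller row of
# "nsxcli -c get controllers" whose last column is not NA.
# Rows are recognized by an IPv4 address in the first column (assumption).
controller_failure_reasons() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $NF != "NA" { print $1, $NF }'
}

# Count log lines that contain either of the two TLS errors.
# "|| true" keeps the exit status 0 when the count is 0.
count_cert_errors() {
    grep -c -E 'certificate verify failed|certificate unknown' || true
}
```

On a Transport Node these helpers might be fed live data, for example "nsxcli -c get controllers | controller_failure_reasons" or "count_cert_errors < /var/run/log/nsx-syslog.log". A non-empty result from the first and a non-zero count from the second would be consistent with the symptoms described in this article.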

Environment

VMware NSX

Cause

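The NSX Manager is rejecting the Transport Node's certificate, as indicated by the CONTROLLER_REJECTED_HOST_CERT failure reason, which suggests that the host certificate may be invalid.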

Resolution

  1. Back up the appliance-info.xml file by moving it aside: "mv /etc/vmware/nsx/appliance-info.xml /etc/vmware/nsx/appliance-info.xml.bak".
  2. Clean up the existing Transport Node certificate and private key, and then regenerate a new certificate. (Refer to KB 345825 for detailed instructions).
  3. Restart the following NSX services on the Transport Node:
    /etc/init.d/nsx-proxy restart
    /etc/init.d/nsx-opsagent restart
    /etc/init.d/nsx-cfgagent restart
    /etc/init.d/nsx-nestdb restart

    NOTE: If possible, it is highly recommended to place the host into Maintenance Mode before restarting the nsx-cfgagent and nsx-nestdb services.

  4. Push the newly generated Transport Node certificate to the NSX Manager using its thumbprint. (Refer to KB 345825 for specific instructions).
  5. In the NSX Manager UI, click "Resolve" on the error reporting the host as disconnected.
  6. Once the host is successfully connected, a new appliance-info.xml file is pushed back to the host. After validating the connection, you may remove the backup file by running: "rm /etc/vmware/nsx/appliance-info.xml.bak"
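
Steps 1, 3, and 6 above can be wrapped in a small shell helper. This is an illustrative sketch only: the function names and the DRY_RUN variable are hypothetical, and by default it prints each command instead of running it. Steps 2 and 4 are deliberately left out because their procedure is documented in KB 345825.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands from steps 1, 3, and 6.
APPLIANCE_INFO=/etc/vmware/nsx/appliance-info.xml

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: move the current appliance-info.xml aside as a backup.
backup_appliance_info() {
    run mv "$APPLIANCE_INFO" "$APPLIANCE_INFO.bak"
}

# Step 3: restart the NSX services in the order listed in the article,
# stopping at the first restart that fails.
restart_nsx_services() {
    for svc in nsx-proxy nsx-opsagent nsx-cfgagent nsx-nestdb; do
        run "/etc/init.d/$svc" restart || return 1
    done
}

# Step 6: remove the backup, only after the connection has been validated.
remove_backup() {
    run rm "$APPLIANCE_INFO.bak"
}
```

With the default dry-run setting, calling backup_appliance_info only prints the mv command, which allows the sequence to be reviewed first; setting DRY_RUN=0 executes the commands. The certificate regeneration from KB 345825 still has to be performed between backup_appliance_info and restart_nsx_services.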

Additional Information

Alarm For Transport Node Certificate Has Expired.

Connection between host and NSX Controller is UNKNOWN due to connection between host and NSX Manager is DOWN.