Host Transport Nodes in Install Failed state with error "Failed to send HostConfig RPC to MPA" and "Certificate validation failed: 18-self signed certificate"
Article ID: 400883

Products

VMware NSX

Issue/Introduction

  • ESXi hosts are in the "Failed/Host Disconnected" state with the following error:

    "Host configuration: Failed to send the HostConfig message.
     [TN=TransportNode/<uuid>]. Reason: Failed to send HostConfig RPC to MPA TN:<uuid>. Error: Unable to reach client <tn-uuid>, application SwitchingVertical."

  • In some cases, the NSX Transport Nodes disconnect from the NSX Managers after the replace_certs.py script is run.
  • The certificates were replaced successfully and none are expired.
  • On the host, the get managers command shows the connection to one or more of the other Manager nodes as Standby:
    ESXi01> get managers
    Thu Jun 12 2025 UTC 16:12:06.357
    - <NSX-MGR1>     Connected (NSX-RPC) *
    - <NSX-MGR2>     Standby (NSX-RPC)
    - <NSX-MGR3>     Connected (NSX-RPC)
  • Unpreparing and re-preparing the Transport Node (TN), or rebooting it, does not fix the issue.
  • You may see similar entries in the following logs:

/var/log/syslog* 

2025-06-08T14:47:47.602Z In(182) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2100175" level="INFO"] ConnectionKeeper[6 ssl://NSX-MGR2:1234] attempting connection from timer callback
2025-06-08T14:47:47.602Z In(182) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2100175" level="INFO"] StreamSocket[157416 Init f:-1 i:-1 ? -> ssl://NSX-MGR2:1234] Created
2025-06-08T14:47:47.603Z In(182) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2100175" level="INFO"] RpcConnection[157416 Init to ssl://NSX-MGR2:1234 0] Queue threshold size 0
2025-06-08T14:47:47.603Z In(182) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2100175" level="INFO"] StreamSocket[157416 Open f:54 i:0 ? -> ssl://NSX-MGR2:1234] async_connect
2025-06-08T14:47:47.617Z Wa(180) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2100175" level="WARNING"] Certificate validation: couldn't find SHA256 digest <SHA256 digest of the APH-TN certificate of another Manager node> in local trust store

2025-06-08T14:47:47.617Z In(182) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2100175" level="INFO"] StreamSocket[157416 Open f:54 i:0 ? -> ssl://NSX-MGR2:1234] on_connect 336134278-certificate verify failed
2025-06-08T14:47:47.617Z Wa(180) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2100175" level="WARNING"] StreamConnection[157416 Connecting to ssl://NSX-MGR2:1234 sid:157416] Couldn't connect to 'ssl://NSX-MGR2:1234' (error: 336134278-certificate verify failed)
2025-06-08T14:47:47.617Z Wa(180) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2100175" level="WARNING"] StreamConnection[157416 Error to ssl://NSX-MGR2:1234 sid:-1] Error 336134278-certificate verify failed
2025-06-08T14:47:47.617Z Wa(180) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2100175" level="WARNING"] RpcConnection[157416 Connecting to ssl://NSX-MGR2:1234 0] Couldn't connect to ssl://NSX-MGR2:1234 (error: 336134278-certificate verify failed)
2025-06-08T14:47:47.618Z Wa(180) nsx-proxy[2100147]: NSX 2100147 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2100175" level="WARNING"] RpcTransport[0] Unable to connect to ssl://NSX-MGR2:1234: 336134278-certificate verify failed


2025-06-10T19:33:52.794Z Wa(180) nsx-proxy[8827652]: NSX 8827652 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="8827674" level="WARNING"] Certificate validation: couldn't find SHA256 digest <SHA256 digest of the APH-TN certificate of another Manager node> in local trust store
2025-06-10T19:33:52.794Z Er(179) nsx-proxy[8827652]: NSX 8827652 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="8827674" level="ERROR" errorCode="NET1111"] Certificate validation failed: 18-self signed certificate
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]: Certificate:
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:     Data:
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:         Version: 3 (0x2)
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:         Serial Number:
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:             <Serial number of the certificate>
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:     Signature Algorithm: sha256WithRSAEncryption
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:         Issuer: C=US; ST=California; L=Palo Alto; O=VMware, Inc.; [email protected]; CN=VMware-NSX-ApplProxyHub; UID=4b######-####-4142-####-########cec1
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:         Validity
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:             Not Before: Mar 28 22:03:32 2024 GMT
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:             Not After : Mar  4 22:03:32 2124 GMT
2025-06-10T19:33:52.794Z Er(179)[+] nsx-proxy[8827652]:         Subject: C=US; ST=California; L=Palo Alto; O=VMware, Inc.; [email protected]; CN=VMware-NSX-ApplProxyHub; UID=4b######-####-4142-####-########cec1
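
To find which Manager node the host is rejecting, the log entries above can be filtered for the "certificate verify failed" connection warnings. The following is a minimal sketch, not a supported tool: the function name and the regular expression are illustrative assumptions derived only from the sample lines in this article, and real log lines may differ (for example, IPv6 endpoints are not handled).

```python
import re

# Matches nsx-proxy warnings shaped like the samples in this article, e.g.
#   Couldn't connect to 'ssl://NSX-MGR2:1234' (error: 336134278-certificate verify failed)
# The quotes around the endpoint are optional because both forms appear in the logs.
FAILED_CONNECT = re.compile(
    r"Couldn't connect to '?ssl://(?P<host>[^:'\s]+):(?P<port>\d+)'?"
    r" \(error: \d+-certificate verify failed\)"
)


def managers_failing_cert_validation(lines):
    """Return the sorted, de-duplicated Manager hosts whose TLS handshake failed."""
    hosts = set()
    for line in lines:
        match = FAILED_CONNECT.search(line)
        if match:
            hosts.add(match.group("host"))
    return sorted(hosts)


# Message portions of log lines taken from the excerpt above.
sample_lines = [
    "StreamSocket[157416 Open f:54 i:0 ? -> ssl://NSX-MGR2:1234] async_connect",
    "StreamConnection[157416 Connecting to ssl://NSX-MGR2:1234 sid:157416] "
    "Couldn't connect to 'ssl://NSX-MGR2:1234' (error: 336134278-certificate verify failed)",
    "RpcConnection[157416 Connecting to ssl://NSX-MGR2:1234 0] "
    "Couldn't connect to ssl://NSX-MGR2:1234 (error: 336134278-certificate verify failed)",
]

print(managers_failing_cert_validation(sample_lines))
```

In practice the input would be the lines of the syslog file copied from the host; the endpoint reported here should match the Manager shown as Standby in the get managers output.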

 

Environment

VMware NSX 

Resolution

As a workaround, generate a new APH-TN certificate and apply it to the affected Manager node.

Note: The IP address of the affected Manager node appears in the logs and in the get managers output, where the affected node is shown as Standby.

1. Generate a new self-signed certificate from the NSX UI.
2. Note the UUID of the affected Manager node and the ID of the new certificate, then run the following API call to apply the certificate:

    POST /api/v1/trust-management/certificates/<certificate_id>?action=apply_certificate&service_type=APH_TN&node_id=<manager_node_id>

3. In the NSX UI, re-sync the failed Transport Node, or click the failed node and select Resolve. This should fix the issue.
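
The API call in step 2 can be sketched in Python as follows. This only builds the request object so the URL can be checked before anything is sent. The helper name, the example host name, and the example IDs are hypothetical. Authentication and TLS settings are deliberately left out and must be supplied according to your environment.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_apply_certificate_request(manager_host, certificate_id, node_id,
                                    service_type="APH_TN"):
    """Build (but do not send) the POST request that applies a certificate.

    manager_host   -- NSX Manager FQDN or IP (assumption: reachable over HTTPS)
    certificate_id -- ID of the newly generated certificate
    node_id        -- UUID of the affected Manager node
    """
    query = urlencode({
        "action": "apply_certificate",
        "service_type": service_type,
        "node_id": node_id,
    })
    url = (
        f"https://{manager_host}/api/v1/trust-management/certificates/"
        f"{quote(certificate_id, safe='')}?{query}"
    )
    return Request(url, method="POST")


# Hypothetical values, for illustration only.
request = build_apply_certificate_request(
    "nsx-mgr.example.com", "cert-1234", "node-5678"
)
print(request.get_method(), request.full_url)
```

Keeping the URL construction separate from the send step makes it easy to confirm the certificate ID and node UUID before the certificate is actually applied.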

After this, running get managers on the Transport Node should show all three Managers in the Connected state:

ESXi01> get managers
Thu Jun 12 2025 UTC 16:23:13.256
- <NSX-MGR1>     Connected (NSX-RPC) *
- <NSX-MGR2>     Connected (NSX-RPC)
- <NSX-MGR3>     Connected (NSX-RPC)
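
The before and after states can also be checked programmatically. The sketch below parses get managers output in the format shown in this article and lists any Manager that is not in the Connected state. The function names and the regular expression are assumptions based only on the sample output here; the sample text would normally come from the host's CLI.

```python
import re

# One manager entry per line, e.g. "- <NSX-MGR2>     Standby (NSX-RPC)".
MANAGER_LINE = re.compile(
    r"^\s*-\s+(?P<name>\S+)\s+(?P<state>\w+)\s+\((?P<channel>[^)]+)\)"
)


def parse_get_managers(output):
    """Map each Manager name to its reported state; other lines are ignored."""
    states = {}
    for line in output.splitlines():
        match = MANAGER_LINE.match(line)
        if match:
            states[match.group("name")] = match.group("state")
    return states


def managers_not_connected(output):
    """Return the Managers whose state is anything other than Connected."""
    return [
        name
        for name, state in parse_get_managers(output).items()
        if state != "Connected"
    ]


before_fix = """Thu Jun 12 2025 UTC 16:12:06.357
- <NSX-MGR1>     Connected (NSX-RPC) *
- <NSX-MGR2>     Standby (NSX-RPC)
- <NSX-MGR3>     Connected (NSX-RPC)
"""

after_fix = """Thu Jun 12 2025 UTC 16:23:13.256
- <NSX-MGR1>     Connected (NSX-RPC) *
- <NSX-MGR2>     Connected (NSX-RPC)
- <NSX-MGR3>     Connected (NSX-RPC)
"""

print(managers_not_connected(before_fix))
print(managers_not_connected(after_fix))
```

An empty result for the second call corresponds to the expected healthy output, with all three Managers connected.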

Additional Information