NSX Configuration in Edge or Host Transport Node is seen in Failed state after Certificate expiry

Article ID: 411692

Products

VMware NSX

Issue/Introduction

  • The Edge or ESXi Host Transport Node is shown in a Failed state in the NSX UI.

  • Manager and controller connectivity is not established even though the Transport Node can reach the Managers (for Edge nodes, the reason may also be reported as OTHER_ERROR):
    nsxcli -c get managers
    Wed Sep 24 2025 UTC 03:05:44.948
    - 10.#.#.21    Standby (NSX-RPC)
    - 10.#.#.22    Standby (NSX-RPC)
    - 10.#.#.23    Standby (NSX-RPC) *

    nsxcli -c get controllers
    Wed Sep 24 2025 UTC 03:05:55.941
     Controller IP    Port     SSL         Status       Is Physical Master   Session State  Controller FQDN           Failure Reason
     10.#.#.23    1235   enabled    disconnected           true              down              NA                       NA
     10.#.#.22    1235   enabled      not used            false              null              NA                       NA
     10.#.#.21    1235   enabled      not used            false              null              NA                       NA
  • Messages similar to the following are seen in the ESXi /var/run/log/nsx-syslog.log file:
    Wa(180) nsx-proxy[2101991]: NSX 21####1 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2102015" level="WARNING"] Certificate validation: couldn't find SHA256 digest '####################################' in local trust store
    Er(179) nsx-proxy[2101991]: NSX 21####1 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2102015" level="ERROR" errorCode="NET1111"] Certificate validation failed: 18-self signed certificate

    In(182) nsx-opsagent[2102367]: NSX 21###67 - [nsx@6876 comp="nsx-esx" subcomp="mpa-client" tid="21###30" level="INFO"] [AlarmsProvider] MsgHandler : Invalid stub for Master APH
    In(182) nsx-opsagent[2102367]: NSX 21###67 - [nsx@6876 comp="nsx-esx" subcomp="mpa-client" tid="21###30" level="INFO"] [AlarmsProvider] SendRequest: Failed to send msg Master APH, Publish, type (com.vmware.nsx.monitoring.CollectorMpMsg), correlationId (), trackingIdStr (#######-####-####-3fa3-########a1e0), ret (-1)
  • Messages similar to the following are seen in the NSX Manager /var/log/syslog file:
    Manager01 NSX 99086 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] getStub: client ########-####-####-####-############, application HealthCheck, java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Channel ClientChannel(vmware.nsx.healthcheck.HealthCheckHostService, ########-####-####-####-###########) is closed before stream was opened due to Status(code=UNKNOWN, msg=Closed by remote service)
    2025-09-24T02:50:00.327Z Manager01 NSX 99086 MONITORING [nsx@6876 comp="nsx-manager" errorCode="MP150008" level="ERROR" subcomp="manager"] Error in sending requestMsg to transportNode:########-####-####-####-#############, requestId(roundId): left: #######################right: #####################, errInfo:Unable to reach client ########-####-####-####-###########, application HealthCheck
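
To confirm the symptom on a host, the nsx-proxy certificate validation messages can be filtered out of the log. The following is a minimal sketch, assuming a POSIX shell with grep on the Transport Node. The function name `find_cert_errors` is hypothetical, and the match strings are taken from the sample messages above.

```shell
#!/bin/sh
# find_cert_errors is a hypothetical helper, not part of NSX.
# It prints the certificate validation lines found in the given log file.
find_cert_errors() {
    log_file="${1:-/var/run/log/nsx-syslog.log}"
    # Match the WARNING and ERROR (NET1111) messages shown in the samples.
    grep -E 'Certificate validation|errorCode="NET1111"' "$log_file"
}
```

For example, `find_cert_errors /var/run/log/nsx-syslog.log | tail -n 5` would show the most recent matches, if any.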

Environment

VMware NSX

Cause

Certificate validation between the Transport Node and the NSX Managers fails because of expired certificates.

Resolution

  1. Locate the certificate file host-cert.pem on the Transport Node.
    cd /etc/vmware/nsx/
    ls
    Sample output:
    appliance-info.xml   host-cert.pem  host-privkey.pem  netopa.xml     openssl-proxy.cnf
    controller-info.xml  host-cfg.xml   mpa-txn           nsx-proxy.xml
  2. Back up the original file.
    cp host-cert.pem host-cert.pem.bak
  3. Delete the original pem file.
    rm host-cert.pem
  4. Restart the nsx-proxy service; this should generate a new host-cert.pem file.
    /etc/init.d/nsx-proxy restart
  5. Verify that the new pem file exists and check its validity dates.
    openssl x509 -startdate -enddate -noout -in /etc/vmware/nsx/host-cert.pem
  6. Manually resync the certificates by running the commands below for each of the three Manager nodes (or refer to the workaround in KB 389595):
    push host-certificate <manager-IP-FQDN> username <username> thumbprint <cert-api-thumbprint-of-manager> password <password>
    sync-aph-certificates <manager-IP-FQDN> username <username> thumbprint <cert-api-thumbprint-of-manager> password <password>
  7. Check the Transport Node status in the NSX UI.
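
In addition to the UI check, the controller session can be re-examined from the Transport Node CLI. The sketch below parses the output of `nsxcli -c get controllers` in the column layout shown in the Issue section. The helper name `has_connected_controller` is hypothetical, and it assumes that a healthy row reports the status `connected` in the fourth column.

```shell
#!/bin/sh
# has_connected_controller is a hypothetical helper, not an NSX command.
# It reads `get controllers` output on stdin and succeeds (exit 0) if at
# least one row has the status "connected" in the fourth column.
# Assumption: a healthy controller session is reported as "connected".
has_connected_controller() {
    awk '$4 == "connected" { found = 1 } END { exit !found }'
}
```

A possible invocation is `nsxcli -c get controllers | has_connected_controller && echo "controller session established"`.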

Additional Information

To prevent this condition, act promptly on Transport Node (TN) certificate expiry alarms: KB 345825
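
The remaining validity of host-cert.pem can also be checked directly on the node, for instance to confirm that the certificate has expired before replacing it, or to spot an upcoming expiry. This is a minimal sketch using the standard `openssl x509 -checkend` option. The helper name `cert_expires_within` and the 30-day threshold in the usage note are illustrative assumptions, not values from this article.

```shell
#!/bin/sh
# cert_expires_within is a hypothetical helper, not part of NSX.
# Exit 0: the certificate expires within the given number of seconds
#         (0 seconds means it has already expired).
# Exit 1: the certificate is still valid after that many seconds.
# Exit 2: the certificate file is missing or unreadable.
cert_expires_within() {
    cert_file="$1"
    seconds="${2:-0}"
    [ -r "$cert_file" ] || return 2
    # `openssl x509 -checkend N` exits 0 when the certificate is still
    # valid N seconds from now, and non-zero otherwise.
    if openssl x509 -checkend "$seconds" -noout -in "$cert_file" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example, `cert_expires_within /etc/vmware/nsx/host-cert.pem 2592000 && echo "host-cert.pem expires within 30 days"` (2592000 seconds is 30 days).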

A similar condition can also be caused by an incorrect FQDN configured on the TN for the NSX Managers: KB 404627