Installing NSX on a transport Node fails at 48% with "Failed to install software on host. Failed to install software on host. Time out waiting for host to join NSX Manager."

Article ID: 323539

Products

VMware NSX

Issue/Introduction

Symptoms:

  • NSX-T 4.0.0 to 4.0.1
  • Adding a new Manager or Host transport node fails.
  • On the manager node failing to join the cluster:
/var/log/syslog
2022-11-23T03:43:08.545Z <nsx-manager-hostname> NSX 12182 - [nsx@6876 comp="nsx-manager" subcomp="cli" username="admin" level="INFO"] {10000} CMD: join <manager-IP> cluster-id ee82b2c5-2d91-45d3-9789-95494624c774 thumbprint 216XXXXXXXXXXXXXX token <token-obfuscated> force
2022-11-23T03:43:08.546Z <nsx-manager-hostname> NSX 12182 - [nsx@6876 comp="nsx-manager" subcomp="cli" username="admin" level="WARNING"] Unable to determine terminal size: [OSError] [Errno 25] Inappropriate ioctl for device
2022-11-23T03:43:08.571Z <nsx-manager-hostname> NSX 11662 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Received event: CoordinationEvent[type=MEMBER_ADDED, source=5ecb4de3-03b9-4a3a-ace7-1a66fff7948d]
2022-11-23T03:43:08.572Z <nsx-manager-hostname> NSX 11662 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Ignoring event MEMBER_ADDED from source CoordinationEvent[type=MEMBER_ADDED, source=5ecb4de3-03b9-4a3a-ace7-1a66fff7948d].

o: [CBM125] SSL exception when making an attach call to the destination. Please check the thumbprint.
2022-11-23T03:43:13.618Z <nsx-manager-hostname> NSX 12182 - [nsx@6876 comp="nsx-manager" subcomp="cli" username="admin" level="ERROR" errorCode="('CLI110',)"] Error processing join request, status: 500, obj: {'error_code': 36752, 'error_message': 'Operation failed. Reason: [CBM125] SSL exception when making an attach call to the destination. Please check the thumbprint.', 'module_name': 'node-services'}, err: None
2022-11-23T03:43:13.620Z <nsx-manager-hostname> NSX 12182 - [nsx@6876 comp="nsx-manager" subcomp="cli" username="admin" level="WARNING"] An error occurred while joining the specified cluster. Reason: [CBM125] SSL exception when making an attach call to the destination. Please check the thumbprint.
2022-11-23T03:43:13.621Z <nsx-manager-hostname> NSX 12182 - [nsx@6876 comp="nsx-manager" subcomp="cli" username="admin" level="INFO" audit="true"] CMD: join <manager-IP> cluster-id ee8XXX-XXX-45d3-9789-954XXXX thumbprint 216XXXXXXXXXXXXXXXXX token <token-obfuscated> force (duration: 5.074s), Operation status: CMD_EXECUTED_WITH_ERROR_RESULT
2022-11-23T03:43:13.621Z <nsx-manager-hostname> NSX 12182 - [nsx@6876 comp="nsx-manager" subcomp="cli" username="admin" level="INFO"] NSX CLI stopped for user: admin
 
  • On the NSX Manager node, the following error messages are also logged:
 
/var/log/syslog
           

2025-03-10T16:43:14.090Z nsxtmgr01.indra.es NSX 90782 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] getClientHeartbeatStatus: client 3ad1b46d-3b73-40f6-965f-82005ea82af6, no record for heartbeat found.
2025-03-10T16:43:14.091Z nsxtmgr01.com NSX 90782 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] getClientHeartbeatStatus: client 3ad1b46d-3b73-40f6-965f-82005ea82af6, no record for heartbeat found.
2025-03-10T16:43:14.293Z nsxtmgr01.com NSX 90782 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] getClientHeartbeatStatus: client cc2739b9-XXX-4ec4-XXX-XXXXX, no record for heartbeat found.
2025-03-10T16:43:14.293Z nsxtmgr01.com NSX 90782 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP26019" level="ERROR" subcomp="manager"] Time out waiting for host to join NSX Manager.
2025-03-10T16:43:14.294Z nsxtmgr01.com NSX 90782 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP26050" level="ERROR" subcomp="manager"] Host prep failed for cc2739b9-XXX-4ec4-XXX-XXXXX.
2025-03-10T16:43:14.306Z nsxtmgr01.com NSX 90782 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP26050" level="ERROR" subcomp="manager"] Failed to execute a deploy unit instance: DeploymentUnitInstance/66cf2033-03b7-4269-b5b3-94006a2afcf3. ErrorType: [null]. Details: null
2025-03-10T16:43:14.309Z nsxtmgr01.com NSX 90782 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Updating the deploymentProgressState for deploymentUnitInstance: DeploymentUnitInstance [ id=DeploymentUnitInstance/66cf2033-03b7-4269-b5b3-94006a2afcf3, deploymentUnitId=DeploymentUnit/c17f86e0-10f7-48f8-90ff-b1b7cf493f2a, hostId=HostTransportNode/cc2739b9-XXX-4ec4-XXX-XXXXX, entityId=null, prevEntityId=null, runningVersion=null, deploymentProgressState=INSTALL_FAILED, deploymentGoalState=ENABLED, internalLastKnownOSVersion=8.0.3, agentId=null, errorId=26050, errorMessage=Failed to install software on host. Time out waiting for host to join NSX Manager.] to INSTALL_FAILED:Failed to install software o host. Time out waiting for host to join NSX Manager

 
 
             
  • On the ESXi host failing to join NSX-T cluster:
    • In the UI: "Failed to install software on host. Failed to install software on host. Time out waiting for host to join NSX Manager."
      /var/log/nsxcli.log
      2023-03-03T11:16:25.360Z 2186011 cli.server.cli_command_service INFO {0} CMD: join management-plane <manager-IP> thumbprint 489487579181ca59a583eb65b6eae14f92c2308bf6f877544a98f80cef5d1229 token <token-obfuscated> node-uuid e4dac940-6afc-46bc-9135-21f2676737d85
      2023-03-03T11:16:25.362Z 2186011 cli.utils.render_utils WARNING Unable to determine terminal size: [OSError] [Errno 25] Inappropriate ioctl for device
      2023-03-03T11:16:26.017Z 2186011 cli.commands.host_shared.register INFO version 7.0.3 buildnum 19193900
      2023-03-03T11:16:26.019Z 2186011 cli.commands.host_shared.register INFO Tokenfile is not given
      2023-03-03T11:16:26.021Z 2186011 vmware.runcommand INFO runcommand called with: args = '['/usr/bin/openssl', 'x509', '-in', '/etc/vmware/nsx/host-cert.pem', '-subject', '-noout']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
      2023-03-03T11:16:26.040Z 2186011 vmware.runcommand INFO runcommand called with: args = '['/usr/bin/openssl', 'req', '-new', '-newkey', 'rsa:2048', '-days', '3650', '-nodes', '-x509', '-keyout', '/tmp/tmpki5pqek1', '-out', '/tmp/tmpyr9x091b', '-config', '/tmp/tmpwaoivm0c', '-extensions', 'req_ext']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
      2023-03-03T11:16:29.099Z 2186011 cli.utils.apiclient ERROR POST /api/v1/fabric/nodes/e4dac940-6afc-46bc-9135-21f267626d85?action=register_node raised exception: <class 'ssl.SSLEOFError'>
      Traceback (most recent call last):
        File "/opt/vmware/nsx-cli/bin/python/cli/utils/apiclient.py", line 87, in request
          conn.connect()
        File "/lib64/python3.8/http/client.py", line 1427, in connect
        File "/lib64/python3.8/ssl.py", line 500, in wrap_socket
        File "/lib64/python3.8/ssl.py", line 1040, in _create
        File "/lib64/python3.8/ssl.py", line 1309, in do_handshake
      ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1125)
      2023-03-03T11:16:29.103Z 2186011 cli.commands.host_shared.register INFO Stopping nsx-proxy
      2023-03-03T11:16:29.104Z 2186011 vmware.runcommand INFO runcommand called with: args = '['/etc/init.d/nsx-proxy', 'stop']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
      2023-03-03T11:16:31.109Z 2186011 cli.commands.host_shared.register INFO Starting nsx-proxy
      2023-03-03T11:16:31.111Z 2186011 vmware.runcommand INFO runcommand called with: args = '['/etc/init.d/nsx-proxy', 'start']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
      2023-03-03T11:16:36.248Z 2186011 cli.server.cli_command_service WARNING Exception when registering host: 'Unable to connect to the API service'
      2023-03-03T11:16:36.252Z 2186011 cli.audit INFO CMD: join management-plane <manager-IP> thumbprint 489487579181ca59a583eb65b6eae14f92c2308bf6f877544a98f80cef5d1229 token <token-obfuscated> node-uuid e4dac940-6afc-46bc-9135-21f267737d85 (duration: 10.890s), Operation status: CMD_EXECUTED_WITH_ERROR_RESULT
      2023-03-03T11:16:36.253Z 2186011 cli INFO NSX CLI stopped for user: root
          
  • The thumbprint used to join the cluster differs from the thumbprint returned by the existing manager node:
On existing node:
<nsx-manager-hostname>> get certificate api thumbprint
b7006bb89be6bcb1b3f6a66816235ff24d06109fabcde590e40b6347d9fec9d4

On Manager/Host node trying to join cluster:
# /var/log/syslog
2022-11-23T03:43:08.545Z <nsx-manager-hostname> NSX 12182 - [nsx@6876 comp="nsx-manager" subcomp="cli" username="admin" level="INFO"] {10000} CMD: join <manager-IP> cluster-id ee82b2c5-2d91-45d3-9789-95494624c774 thumbprint 2165e3cfb8d593b25986653dac372fc18489ab746a5f57eac6b8d61d0b719fc1 token <token-obfuscated> force
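
The thumbprint comparison above can also be scripted. The following is a minimal Python sketch, not an official VMware tool. It assumes the API thumbprint is the SHA-256 digest of the DER-encoded certificate (the 64-hex-digit values above suggest this, but it is an assumption). The function names are hypothetical, and fetch_api_thumbprint assumes the NSX Manager API listens on port 443.

```python
import hashlib
import ssl


def sha256_thumbprint(pem_text: str) -> str:
    """Return the lowercase hex SHA-256 digest of a PEM certificate's DER bytes."""
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der_bytes).hexdigest()


def normalize_thumbprint(value: str) -> str:
    """Lowercase a thumbprint and drop colons or whitespace so formats compare equal."""
    return "".join(ch for ch in value.lower() if ch in "0123456789abcdef")


def thumbprints_match(expected: str, used: str) -> bool:
    """Compare two thumbprints after normalization."""
    return normalize_thumbprint(expected) == normalize_thumbprint(used)


def fetch_api_thumbprint(host: str, port: int = 443) -> str:
    """Fetch the certificate presented by host:port and return its thumbprint.

    Assumption: the manager API is reachable on this port. The certificate is
    retrieved without validation, which is acceptable only for fingerprinting.
    """
    return sha256_thumbprint(ssl.get_server_certificate((host, port)))


if __name__ == "__main__":
    # Values taken from the example output in this article.
    existing_node = "b7006bb89be6bcb1b3f6a66816235ff24d06109fabcde590e40b6347d9fec9d4"
    join_command = "2165e3cfb8d593b25986653dac372fc18489ab746a5f57eac6b8d61d0b719fc1"
    print("match" if thumbprints_match(existing_node, join_command) else "MISMATCH")
```

With the two values from this article the script reports a mismatch, which is the symptom described above.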



Environment

VMware NSX 4.0.0.1
VMware NSX 4.2.X.X

Cause

  • This is caused by the certificate's PEM encoding containing '\r\n' line breaks (as used in DOS/Windows), which makes the certificate unreadable to certain NSX services.
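
To check whether a given PEM file uses DOS/Windows line breaks, a small Python sketch such as the one below may help. It only inspects the bytes and reports what it finds. It does not modify the certificate and is not the vendor remedy. The function names and the command-line usage are hypothetical.

```python
import sys


def describe_line_endings(pem_bytes: bytes) -> dict:
    """Count CRLF, LF-only, and bare CR line breaks in raw PEM bytes."""
    crlf = pem_bytes.count(b"\r\n")
    lf_only = pem_bytes.count(b"\n") - crlf
    bare_cr = pem_bytes.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf_only, "cr": bare_cr}


def has_dos_line_breaks(pem_bytes: bytes) -> bool:
    """Return True if the PEM data contains at least one '\\r\\n' sequence."""
    return b"\r\n" in pem_bytes


if __name__ == "__main__" and len(sys.argv) == 2:
    # Read in binary mode: text mode would silently translate '\r\n' to '\n'.
    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    print(describe_line_endings(data))
    print("DOS/Windows line breaks found" if has_dos_line_breaks(data) else "No CRLF found")
```

Reading the file in binary mode is the important design choice here, because Python's default text mode hides the very characters being looked for.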

Resolution

  • This is resolved in NSX-T 4.1.1.
    1. Test connectivity to the NSX Manager on TCP ports 1234 and 1235 using the commands below:

      nc -zv <manager> 1234
      nc -zv <manager> 1235
    2. Check for established connections on these ports with the following command:

      esxcli network ip connection list | grep 123[4-5]

      Make sure ports 1234 and 1235 are open to the virtual IP of the Manager and to each of the 3 individual NSX Managers.
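
When several addresses need checking (the virtual IP plus each individual NSX Manager), the port test above can be looped. The Python sketch below is an illustration only: the function names are hypothetical, the addresses must be supplied by you, and a result obtained from a workstation does not prove reachability from the ESXi host itself, so the nc and esxcli commands above remain the reference check.

```python
import socket

# Ports named in the steps above.
NSX_HOST_PORTS = (1234, 1235)


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_managers(managers, ports=NSX_HOST_PORTS, timeout: float = 3.0) -> dict:
    """Map every (manager, port) pair to its reachability result."""
    return {
        (manager, port): port_is_open(manager, port, timeout)
        for manager in managers
        for port in ports
    }


if __name__ == "__main__":
    import sys

    # Usage: pass the virtual IP and the individual manager addresses as arguments.
    for (manager, port), ok in check_managers(sys.argv[1:]).items():
        print(f"{manager}:{port} {'open' if ok else 'unreachable'}")
```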


Workaround:

 

If you believe you have encountered this issue, please open a support case with Broadcom Support and refer to this KB article.

For more information, see Creating and managing Broadcom support cases.