Deployment of ESXi host transport node fails

Article ID: 372580

Updated On:

Products

VMware NSX

Issue/Introduction

  • Adding a new ESXi host transport node fails at 48% progress.
  • The nsxcli.log on the ESXi host under /var/log has the following entries while attempting to install:

    2024-06-28T13:19:39.053Z 2276326 cli INFO NSX CLI started (ESX) for user: root
    2024-06-28T13:19:39.157Z 2276326 cli.server.cli_command_service INFO {0} CMD: join management-plane ##.##.##.## thumbprint ########################8ef146cd8c59032bb02708f8126e2abcd token <token-obfuscated> node-uuid ########-####-####-####-############
    2024-06-28T13:19:39.158Z 2276326 cli.utils.render_utils WARNING Unable to determine terminal size: [OSError] [Errno 25] Inappropriate ioctl for device
    2024-06-28T13:19:39.763Z 2276326 cli.commands.host_shared.register INFO version 7.0.3 buildnum 23794027
    2024-06-28T13:19:39.765Z 2276326 cli.commands.host_shared.register INFO Tokenfile is not given
    2024-06-28T13:19:39.767Z 2276326 vmware.runcommand INFO runcommand called with: args = '['/usr/bin/openssl', 'x509', '-in', '/etc/vmware/nsx/host-cert.pem', '-subject', '-noout']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
    2024-06-28T13:19:39.786Z 2276326 vmware.runcommand INFO runcommand called with: args = '['/usr/bin/openssl', 'req', '-new', '-newkey', 'rsa:2048', '-days', '3650', '-nodes', '-x509', '-keyout', '/tmp/tmpr3irh9h0', '-out', '/tmp/tmpykxrxcxe', '-config', '/tmp/tmpbc7k86qw', '-extensions', 'req_ext']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
    2024-06-28T13:19:41.628Z 2276326 cli.utils.apiclient ERROR POST /api/v1/fabric/nodes/########-####-####-####-############?action=register_node raised exception: <class 'ssl.SSLEOFError'>
    Traceback (most recent call last):
      File "/opt/vmware/nsx-cli/bin/python/cli/utils/apiclient.py", line 87, in request
        conn.connect()
      File "/lib64/python3.8/http/client.py", line 1428, in connect
      File "/lib64/python3.8/ssl.py", line 500, in wrap_socket
      File "/lib64/python3.8/ssl.py", line 1073, in _create
      File "/lib64/python3.8/ssl.py", line 1342, in do_handshake
    ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1131)   
  • The NSX API certificate thumbprint cannot be retrieved from the NSX Manager. On an existing node:

    > get certificate api thumbprint
    % An error occurred while reading the API server certificate
 
  • The NSX Manager log /var/log/syslog contains messages similar to the following:

2024-07-08T16:26:31.558Z  INFO task-executor-2 DeploymentProgressServiceImpl 4812 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Updating the DeploymentProgress from: DeploymentProgress [ id=########-####-####-####-############, deploymentType=HOST_TN, operationType=INSTALL, progress=48, stateDescription=deployment.progress.fn.wait_for_mp_mpa_conn, removeNsxFlag=false] to DeploymentProgress [ id=########-####-####-####-############, deploymentType=HOST_TN, operationType=INSTALL, progress=0, stateDescription=deployment.progress.fn.start, removeNsxFlag=false]

...
2024-07-08T16:26:31.431Z  INFO http-nio-127.0.0.1-7440-exec-117 TransportNodeCollectionServiceImpl 4812 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="afc6ea06-####-####-####-bb2fbd0281db" subcomp="manager" username="admin"] TN status is pending and FN status is INSTALL_FAILED for TN ########-####-####-####-############
2024-07-08T16:26:31.431Z  INFO http-nio-127.0.0.1-7440-exec-117 TransportNodeCollectionServiceImpl 4812 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="afc6ea06-aa85-4ec7-920d-bb2fbd0281db" subcomp="manager" username="admin"] FNState is INSTALL_FAILED for TnId ########-####-####-####-############ so returning TNC state as failed for TNC
2024-07-08T16:30:30.763Z  INFO ActivityWorkerPool-1-8 DeploymentUnitInstanceServiceImpl 4812 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Updating the deploymentProgressState for deploymentUnitInstance: DeploymentUnitInstance [ id=DeploymentUnitInstance/25de23ca-####-####-####-2076537c9599, deploymentUnitId=DeploymentUnit/4e8d47f4-####-####-####-8fa4224d64c3, hostId=HostTransportNode/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee, entityId=null, prevEntityId=null, runningVersion=null, deploymentProgressState=INSTALL_FAILED, deploymentGoalState=ENABLED, internalLastKnownOSVersion=7.0.3, agentId=null, errorId=26050, errorMessage=Failed to install software on host. Time out waiting for host to join NSX Manager.] to INSTALL_FAILED:Failed to install software on host. Time out waiting for host to join NSX Manager.
duInstance DeploymentUnitInstance [ id=DeploymentUnitInstance/25de23ca-####-####-####-2076537c9599, deploymentUnitId=DeploymentUnit/4e8d47f4-####-####-####-8fa4224d64c3, hostId=HostTransportNode/########-####-####-####-############, entityId=null, prevEntityId=null, runningVersion=null, deploymentProgressState=INSTALL_FAILED, deploymentGoalState=ENABLED, internalLastKnownOSVersion=7.0.3, agentId=null, errorId=26050, errorMessage=Failed to install software on host. Time out waiting for host to join NSX Manager.]
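
The log signatures listed above can be checked mechanically. The following is a minimal sketch, not an official tool: it assumes the logs have been copied to a machine with Python 3, and the dictionary keys and function names (`find_signatures`, `scan_file`) are hypothetical labels chosen for illustration. The patterns themselves are taken from the excerpts in this article.

```python
import re
from typing import Dict, Iterable, List

# Patterns copied from the log excerpts in this article. The key names are
# illustrative labels only; they are not NSX terminology.
SIGNATURES: Dict[str, "re.Pattern"] = {
    # nsxcli.log on the ESXi host: TLS handshake dropped during registration
    "host_register_ssl_eof": re.compile(
        r"action=register_node raised exception: <class 'ssl\.SSLEOFError'>"
    ),
    # NSX Manager syslog: install timed out waiting for the host to join
    "manager_install_timeout": re.compile(r"errorId=26050"),
    # NSX Manager syslog: deployment progress reset from the 48% step
    "manager_stuck_at_48": re.compile(
        r"progress=48, stateDescription=deployment\.progress\.fn\.wait_for_mp_mpa_conn"
    ),
}


def find_signatures(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return the matching lines for each known signature."""
    hits: Dict[str, List[str]] = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_file(path: str) -> Dict[str, List[str]]:
    """Scan a log file, tolerating bytes that do not decode cleanly."""
    with open(path, "r", errors="replace") as handle:
        return find_signatures(handle)
```

For example, `scan_file("nsxcli.log")` on a copy of the host log returns a dictionary whose `host_register_ssl_eof` entry is non-empty when the registration error shown above is present. A match only indicates that the symptoms are similar; it does not by itself confirm the cause.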

Environment

VMware NSX 4.0.0.x
VMware NSX 4.1.0.x

Resolution

This issue is resolved in VMware NSX 4.1.1.
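
As a rough aid for triage, the version ranges in this article can be expressed as a small check. This is a sketch under stated assumptions: the function names are hypothetical, the version string is assumed to be dot-separated with at least three numeric leading components, and treating every release after 4.1.1 as fixed is an assumption rather than something this article states.

```python
from typing import Tuple

# Release lines listed under "Environment" in this article.
AFFECTED_RELEASES = ((4, 0, 0), (4, 1, 0))
# Release named under "Resolution" in this article.
FIXED_IN = (4, 1, 1)


def release_triplet(version: str) -> Tuple[int, int, int]:
    """Parse the first three dot-separated components as integers."""
    parts = version.strip().split(".")
    if len(parts) < 3:
        raise ValueError(f"expected at least three components: {version!r}")
    major, minor, patch = (int(part) for part in parts[:3])
    return (major, minor, patch)


def classify(version: str) -> str:
    """Return 'affected', 'fixed', or 'not listed' for an NSX version."""
    head = release_triplet(version)
    if head in AFFECTED_RELEASES:
        return "affected"
    # Assumption: releases at or after 4.1.1 carry the fix.
    if head >= FIXED_IN:
        return "fixed"
    # Versions this article does not mention either way.
    return "not listed"
```

A result of `"not listed"` means only that the article does not cover that release; it should not be read as confirmation that the release is unaffected.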

Additional Information

If you believe you have encountered this issue and cannot upgrade to VMware NSX 4.1.1, please open a support case with Broadcom Support and refer to this KB article.
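
Before opening a case, it may help to record whether a plain TLS handshake to the NSX Manager API succeeds from another machine. The sketch below uses only the Python standard library. The function names, the default port 443, and the use of a SHA-256 digest as the thumbprint are assumptions made for illustration; certificate verification is deliberately disabled because the goal is only to observe the handshake, so the result must not be used to establish trust.

```python
import hashlib
import socket
import ssl
import sys


def sha256_thumbprint(der_cert: bytes) -> str:
    """Hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def probe_manager(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Complete a TLS handshake and return the server certificate digest.

    Raises ssl.SSLError (including ssl.SSLEOFError) if the handshake fails.
    """
    context = ssl.create_default_context()
    # Verification is off on purpose: this is a diagnostic probe only.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    if der is None:
        raise RuntimeError("server did not present a certificate")
    return sha256_thumbprint(der)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: probe.py <nsx-manager-address>")
    try:
        print(probe_manager(sys.argv[1]))
    except ssl.SSLError as exc:
        # An SSLEOFError here resembles the error in nsxcli.log above.
        print(f"TLS handshake failed: {exc!r}")
```

A handshake failure from this probe resembles the `ssl.SSLEOFError` in the host log, but a success does not rule the issue out, since the host-side registration call may behave differently.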

For more information, see Creating and managing Broadcom support cases.