ESXi hosts marked as Not Responding following upgrade of vCenter Server to 8.0U2

Article ID: 318860

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

You may encounter this issue when the following symptoms and conditions are present:
  • ESXi hosts are marked as Not Responding and/or Disconnected following the upgrade of vCenter Server to 8.0U2
  • ESXi host version is earlier than 8.0U2
  • vCenter Server TRUSTED_ROOTS store contains non-CA certificate(s)
  • In the /var/log/vmware/vpxd/vpxd.log on the vCenter Server you see entries similar to the following:
    YYYY-MM-DDTHH:MM:SS.424-04:00 info vpxd[06067] [Originator@6876 sub=certmgrLogger opID=HB-host-12345@555-1234567a-WorkQueue-373fc90c] Will update root certificates on host; [vim.HostSystem:<Host FQDN>], on vc: (string) [ 
    YYYY-MM-DDTHH:MM:SS.457-04:00 info vpxd[07207] [Originator@6876 sub=vpxLro opID=HB-host-12345@555-1234567a-04] [VpxLRO] -- BEGIN lro-119514 -- -- AddClusterStoreMember --
    YYYY-MM-DDTHH:MM:SS.586-04:00 warning vpxd[06103] [Originator@6876 sub=vmomi.soapStub[1107] opID=HB-host-xxxx5@555-1234567a-DvsHandleHostReconnect-4b604375] SOAP request returned HTTP failure; <<io_obj p:0x00007fcc2c34f628, h:130, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/hostxxx/vpxa>, method: commitTransaction; code: 500(Internal Server Error)
    YYYY-MM-DDTHH:MM:SS.586-04:00 error vpxd[06103] [Originator@6876 sub=hostMethod opID=HB-host-xxxx5@555-1234567a-DvsHandleHostReconnect-4b604375] Commit call for method [applyDvs] transaction Id [159] failed on host [[vim.HostSystem:host-xxxx,<Host FQDN>]] with exception:[(vmodl.fault.HostCommunication) {
    --> faultCause = (vmodl.MethodFault) null,
    --> faultMessage = <unset>
    --> msg = "Received SOAP response fault from [<<io_obj p:0x00007fcc2c34f628, h:130, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-xxxx/vpxa>]: commitTransaction
    --> "
    --> }]
  • In the /var/run/log/vpxa/vpxa.log on the ESXi host you see entries similar to the following:
    YYYY-MM-DDTHH:MM:SS.584Z Er(163) Vpxa[6122473]: [Originator@6876 sub=Default opID=37f668e7] [VpxLRO] -- ERROR lro-80 -- 52839d49-b2ee-8cf2-70fc-3dfdc4ce24da -- networkSystem -- vim.host.NetworkSystem.commitTransaction: :vmodl.fault.HostCommunication 
    YYYY-MM-DDTHH:MM:SS.584Z Er(163) Vpxa[6122370]: --> Result:
    YYYY-MM-DDTHH:MM:SS.584Z Er(163) Vpxa[6122370]: --> (vmodl.fault.HostCommunication) {
    YYYY-MM-DDTHH:MM:SS.550Z In(166) Vpxa[6128218]: [Originator@6876 sub=vpxaInvtHost opID=WFU-3252723e] ServerId has been changed from 805152 to 0
    YYYY-MM-DDTHH:MM:SS.550Z Er(163) Vpxa[6128218]: [Originator@6876 sub=vpxaInvtHostCnx opID=WFU-3252723e] Can't connect to hostd. Shutting down...
    YYYY-MM-DDTHH:MM:SS.550Z In(166) Vpxa[6128218]: [Originator@6876 sub=Default opID=WFU-3252723e] [Vpxa] Shutting down now
  • In the /var/run/log/hostd.log on the ESXi host you see entries similar to the following:
    YYYY-MM-DDTHH:MM:SS.435Z In(166) Hostd[6040281]: [Originator@6876 sub=Libs opID=HB-host-12345@555-1234567a-02-18-df27 sid=52b461fa user=vpxuser:<no user>] info [ConfigStore:ee32fc6700] [cs:4:1947917405]Transaction committed,level = 1
    YYYY-MM-DDTHH:MM:SS.435Z In(166) Hostd[6040278]: [Originator@6876 sub=Vimsvc.CertMgr opID=HB-host-12345@555-1234567a-WorkQueue-373fc90c-df29 sid=5216f1ca user=vpxuser] Discarding non-CA certificate: -----BEGIN CERTIFICATE-----

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
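
The log signatures above can be checked quickly from a shell. The following is a minimal sketch, not an official diagnostic: the helper name count_signature is hypothetical, and the search strings are taken verbatim from the example entries above, so they may differ in your environment.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that contain a literal string.
# $1: text to look for, $2: path to the log file.
count_signature() {
    # -F treats the text as a fixed string, -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # function usable in scripts that run under "set -e".
    grep -c -F -- "$1" "$2" || true
}

# Example calls (paths as given in this article):
#   On the ESXi host:
#     count_signature "Discarding non-CA certificate" /var/run/log/hostd.log
#     count_signature "Can't connect to hostd. Shutting down" /var/run/log/vpxa.log
#   On the vCenter Server:
#     count_signature "Will update root certificates on host" /var/log/vmware/vpxd/vpxd.log
```

A count that keeps growing between runs suggests that the reconnect cycle is still repeating.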

Environment

VMware vSphere ESXi 7.x
VMware vCenter Server 8.0.2
VMware vSphere ESXi 8.0.0
VMware vSphere ESXi 8.0.1
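
To check a list of hosts against the "earlier than 8.0U2" condition, a version comparison along the following lines may help. This is only a sketch: the function name is hypothetical, and it assumes the version is supplied as a dotted string such as 8.0.1, with 8.0U2 written as 8.0.2 as in the list above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the given ESXi version is earlier than 8.0.2.
# $1: dotted version string, for example 7.0.3 or 8.0.1 (assumed format).
is_affected_esxi_version() {
    printf '%s\n' "$1" | awk -F. '{
        major = $1 + 0; minor = $2 + 0; update = $3 + 0
        if (major < 8) exit 0                                 # 7.x and older
        if (major == 8 && minor == 0 && update < 2) exit 0    # 8.0.0 and 8.0.1
        exit 1                                                # 8.0.2 and later
    }'
}

# Example call:
#   if is_affected_esxi_version "8.0.1"; then echo "host version is affected"; fi
```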

Cause

  • vCenter Server pushes certificate updates to the ESXi host on reconnect after the upgrade.
  • If the TRUSTED_ROOTS certificate store contains one or more non-CA certificates, hostd discards them and issues an ssl_reset, which causes vpxa to restart on ESXi hosts running a version earlier than 8.0U2.
  • After vpxa restarts, vCenter Server pushes the certificate updates to the ESXi host again on reconnect. The same behavior repeats, and the host disconnects from vCenter Server.

Resolution

  • This issue is resolved in vCenter Server 8.0 Update 2a, Build 22617221.
  • Release Notes 


Workaround:
Workaround 1 - preferred approach

  • Check if there are any non-CA certificates in the TRUSTED_ROOTS certificate store and remove those if they are no longer required.
  • Run the following command on the vCenter Server Appliance to list each certificate alias and its key usage in the TRUSTED_ROOTS store. For a CA certificate, the key usage should include Certificate Sign:
    # /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text | egrep 'Alias|Key Usage' -A 1 | egrep -v 'Entry type|--'
  • To remove any non-CA certificates, follow the steps outlined in the KB article Removing Expired CA Certificates from the TRUSTED_ROOTS store in the VMware Endpoint Certificate Store (VECS)
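
For stores with many entries, the output of the command above can be filtered so that only the entries needing review are printed. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and it assumes the filtered output consists of "Alias :" lines followed by an "X509v3 Key Usage" line whose value appears on the next line. Verify the result against the full certificate text before removing anything.

```shell
#!/bin/sh
# Hypothetical helper: read the filtered vecs-cli output on stdin and print
# the aliases whose Key Usage does not include "Certificate Sign".
flag_non_ca_aliases() {
    awk '
        # Print a finding for the entry collected so far, if it needs review.
        function report() {
            if (alias == "") return
            if (!seen_usage) {
                print alias ": no Key Usage found, review manually"
            } else if (!is_ca) {
                print alias ": Key Usage lacks Certificate Sign"
            }
        }
        # A new alias line closes the previous entry and starts the next one.
        /^[ \t]*Alias[ \t]*:/ {
            report()
            alias = $0
            sub(/^[ \t]*Alias[ \t]*:[ \t]*/, "", alias)
            seen_usage = 0; is_ca = 0; want_value = 0
            next
        }
        # The Key Usage header; its value is on the following line.
        /Key Usage/ && !/Extended Key Usage/ {
            seen_usage = 1; want_value = 1
            next
        }
        want_value {
            if ($0 ~ /Certificate Sign/) is_ca = 1
            want_value = 0
        }
        END { report() }
    '
}

# Example call on the vCenter Server Appliance, reusing the command from this article:
#   /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text \
#     | egrep 'Alias|Key Usage' -A 1 | egrep -v 'Entry type|--' | flag_non_ca_aliases
```

An entry reported as having no Key Usage is not necessarily a non-CA certificate, which is why the sketch flags it for manual review instead of classifying it.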

Workaround 2 - Follow this approach only if there is a requirement that the certificate not be removed

  • Log in to the ESXi host directly through the Host Client (using its IP address or FQDN)
  • Click Manage --> System --> Advanced Settings
  • Under Advanced Settings, filter for the option "Config.HostAgent.ssl.keyStore.allowAny"
  • Click Edit Option and change the value to "true"
  • Click Save to save the setting
  • Reconnect the ESXi host to vCenter Server; this should resolve the disconnect issue
 



Additional Information

Reverting the vCenter Server upgrade from this state is not recommended, because it introduces the possibility of encountering the vpxd crash described in the following KB article: vpxd crash with error "duplicate key value violates unique constraint "pk_vpx_entity"" after reverting vCenter Server to a snapshot

Impact/Risks:

  • ESXi hosts are marked as disconnected or not responding and are unmanageable in vCenter Server