NSX-T upgrade fails at the NSX-T host transport node precheck in SDDC Manager

Article ID: 389618

Products

VMware SDDC Manager

Issue/Introduction

When you upgrade NSX-T from SDDC Manager, the host transport node upgrade for a specific cluster might fail at the Enter Maintenance Mode (EMM) dry run precheck.

  • The following error appears in the LCM debug log, /var/log/vmware/vcf/lcm/lcm-debug.log:

2025-03-02T05:53:05.650+0000 ERROR [vcf_lcm,0000000000000000,0000,upgradeId=466a5347-b807-4039-901b-cf0cbc3c111c,resourceType=NSX_T_PARALLEL_CLUSTER,resourceId=nsxmanager_fqdn:_ParallelClusterUpgradeElement,bundleElementId=26a35a95-8ea8-4442-aca2-113674b0d783] [c.v.e.s.l.c.v.vsphere.VsphereUtils,Upgrade-2] Error during enter MAINTENANCE check. Generic error : RuntimeFault.summary
2025-03-02T05:53:05.656+0000 ERROR [vcf_lcm,0000000000000000,0000,upgradeId=466a5347-b807-4039-901b-cf0cbc3c111c,resourceType=NSX_T_PARALLEL_CLUSTER,resourceId=nsxmanager_fqdn:_ParallelClusterUpgradeElement,bundleElementId=26a35a95-8ea8-4442-aca2-113674b0d783] [c.v.e.s.l.p.i.n.s.NsxtHostClusterParallelUpgradeStageRunner,Upgrade-2] Unable to connect to the vsphere Client when doing the EMM dry run precheck for hostUpgradeGroupId 1874bbfe-3ed5-47e4-a69b-4b72f4c8c2b0:domain-cx
java.lang.ClassCastException: class com.vmware.vim.binding.vmodl.RuntimeFault cannot be cast to class com.vmware.vim.binding.vmodl.MethodFault (com.vmware.vim.binding.vmodl.RuntimeFault and com.vmware.vim.binding.vmodl.MethodFault are in unnamed module of loader org.springframework.boot.loader.LaunchedURLClassLoader @1de0aca6)
        at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.NsxtUpgradeUtil.getErrorMessagesForEnterMaintenanceDryrunExceptions(NsxtUpgradeUtil.java:406)
        at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.NsxtUpgradeUtil.checkDryRunEmmForCluster(NsxtUpgradeUtil.java:380)
        at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.service.NsxtHostClusterParallelUpgradeStageRunner.performDryRunCheck(NsxtHostClusterParallelUpgradeStageRunner.java:526)
        at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.service.NsxtHostClusterParallelUpgradeStageRunner.performHostTypeUpgradePrecheck(NsxtHostClusterParallelUpgradeStageR
        at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.service.NsxtHostClusterParallelUpgradeStageRunner.doUpgradeStage(NsxtHostClusterParallelUpgradeStageRunner.java:113)
        at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.NsxtParallelClusterPrimitiveImpl.runUpgrade(NsxtParallelClusterPrimitiveImpl.java:571)
        at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.NsxtParallelClusterPrimitiveImpl.postUpgrade(NsxtParallelClusterPrimitiveImpl.java:215)
        at com.vmware.evo.sddc.lcm.orch.PrimitiveServiceImpl.postUpgradeAsync(PrimitiveServiceImpl.java:313)
        at com.vmware.evo.sddc.lcm.orch.PrimitiveServiceImpl.lambda$postUpgrade$0(PrimitiveServiceImpl.java:165)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
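
To check quickly whether an LCM log shows this failure, and for which host upgrade group and domain, the error line can be matched with a regular expression. The following is a minimal Python sketch that assumes the message wording in the sample entry above (it may differ between VCF releases); find_emm_dryrun_failures is a hypothetical helper, not part of SDDC Manager:

```python
import re
from typing import Iterable, List, Tuple

# Message text copied from the sample lcm-debug.log entry; the exact wording
# may differ between VCF releases (assumption).
EMM_DRYRUN_RE = re.compile(
    r"EMM dry run precheck for hostUpgradeGroupId "
    r"(?P<group>[0-9a-fA-F-]{36}):(?P<domain>\S+)"
)


def find_emm_dryrun_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (hostUpgradeGroupId, domain) pairs in order of first appearance."""
    found: List[Tuple[str, str]] = []
    for line in lines:
        if " ERROR " not in line:
            continue  # the failure is logged at ERROR level
        for match in EMM_DRYRUN_RE.finditer(line):
            pair = (match.group("group"), match.group("domain"))
            if pair not in found:  # the message can repeat within one line
                found.append(pair)
    return found
```

Pass it an open handle to lcm-debug.log (or to a copy of it). Each returned pair identifies a host upgrade group whose EMM dry run failed.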
        

 

Environment

VCF 4.x 

VCF 5.x 

 

Cause

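This issue occurs when the host transport node upgrade precheck initiated from SDDC Manager runs an EMM dry run and vCenter Server reports that one or more hosts in the cluster cannot enter maintenance mode. Because the LCM service fails to process the fault returned by vCenter Server (the ClassCastException in the log above), lcm-debug.log shows only a generic RuntimeFault error instead of the actual reason.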

A host can fail the enter maintenance mode (MM) check for reasons such as the following:

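  • Insufficient resources on the remaining hosts (the resource availability check fails).
  • vCLS VM health issues, or the cluster is in Retreat Mode.
  • VMs that depend on, or are pinned to, specific hosts.
  • Conflicting VM/Host rules.
  • VM devices or encryption keys that are not available on the other hosts.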
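
The resource availability check can be reasoned about, very roughly, as a bin-packing question: can every VM on the host entering MM be placed on one of the remaining hosts? The sketch below is a hypothetical, memory-only first-fit-decreasing model with illustrative names and inputs. vCenter Server's actual check also weighs CPU, reservations, DRS, and HA admission control, so treat this only as a back-of-the-envelope aid:

```python
from typing import Dict, Optional


def place_vms_first_fit(
    vm_mem_mb: Dict[str, int], host_free_mb: Dict[str, int]
) -> Optional[Dict[str, str]]:
    """Hypothetical memory-only model of evacuating one host.

    vm_mem_mb:    memory each VM on the host entering MM needs (MB)
    host_free_mb: memory still free on each remaining host (MB)
    Returns a VM -> host placement, or None if some VM fits nowhere.
    """
    free = dict(host_free_mb)  # copy so the caller's data is not modified
    placement: Dict[str, str] = {}
    # First-fit decreasing: place the largest VMs first.
    for vm, need in sorted(vm_mem_mb.items(), key=lambda kv: kv[1], reverse=True):
        target = next((host for host, avail in free.items() if avail >= need), None)
        if target is None:
            return None  # this VM would block the host from entering MM
        free[target] -= need
        placement[vm] = target
    return placement
```

In this simplified model, a None result corresponds to the remaining hosts being unable to absorb the evacuated VMs.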

 

Resolution

To identify and fix the underlying cause of the generic EMM dry run failure:

  • During the precheck, SDDC Manager asks vCenter Server (vpxd) to evaluate each host with the vim.HostSystem.queryWhatIfEnterMaintenance call. Search the vpxd log for this method name to find the relevant tasks.
  • When a host cannot enter MM, the vpxd log, /var/log/vmware/vpxd/vpxd.log, reports the blocking VM and the fault in CompatCheck results that carry the same opID as the queryWhatIfEnterMaintenance task, for example:

2025-03-02T07:12:16.476Z info vpxd[06588] [Originator@6876 sub=vpxLro opID=58123ce9] [VpxLRO] -- FINISH lro-3594258
2025-03-02T07:12:16.485Z info vpxd[07795] [Originator@6876 sub=vpxLro opID=1153d002] [VpxLRO] -- BEGIN lro-3594260 -- host-xx0 -- vim.HostSystem.queryWhatIfEnterMaintenance -- 523aba52-7a57-15cf-37b9-dfdc7486377d(5292dc1f-ea5f-86c9-a2fe-f92a9543f1d9)
2025-03-02T07:12:16.724Z info vpxd[07795] [Originator@6876 sub=VmCheck opID=1153d002] CompatCheck results: (vim.vm.check.Result) [
-->    (vim.vm.check.Result) {
-->       vm = 'vim.VirtualMachine:e237d04c-e8a0-47fe-ab24-d8a09f64170a:vm-xx0',
-->       host = 'vim.HostSystem:e237d04c-e8a0-47fe-ab24-d8a09f64170a:host-xx0',
-->       error = (vmodl.MethodFault) [
-->          (vmodl.RuntimeFault) {
-->             faultMessage = (vmodl.LocalizableMessage) [
-->                (vmodl.LocalizableMessage) {
-->                   key = "com.vmware.vim.vpxd.encryption.kmsClusterNotFound",
-->                   arg = (vmodl.KeyAnyValue) [
-->                      (vmodl.KeyAnyValue) {
-->                         key = "clusterName",
-->                         value = "TPM-Key"
-->                      }
-->                   ],
-->                }
-->             ],
-->             msg = ""
-->          }
-->       ],
-->    }
--> ]
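
Because the CompatCheck results share the opID of the BEGIN line for the queryWhatIfEnterMaintenance task, the two can be correlated. The following is a minimal Python sketch that assumes the line layout shown in the sample above (it may vary between vCenter Server versions); collect_emm_checks and EmmCheck are hypothetical names:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# Patterns based on the sample vpxd.log lines above (assumed layout).
BEGIN_RE = re.compile(
    r"opID=(?P<op>[^\]\s]+)\].*?-- BEGIN \S+ -- (?P<host>\S+) -- "
    r"vim\.HostSystem\.queryWhatIfEnterMaintenance"
)
COMPAT_RE = re.compile(r"opID=(?P<op>[^\]\s]+)\] CompatCheck results:")


@dataclass
class EmmCheck:
    host: str                                          # host MOID, e.g. host-xx0
    results: List[str] = field(default_factory=list)  # raw "-->" lines


def collect_emm_checks(lines: Iterable[str]) -> Dict[str, EmmCheck]:
    """Map each queryWhatIfEnterMaintenance opID to its host and CompatCheck lines."""
    checks: Dict[str, EmmCheck] = {}
    capturing: Optional[str] = None  # opID whose CompatCheck block is being read
    for raw in lines:
        line = raw.rstrip("\n")
        if capturing is not None:
            if line.startswith("-->"):
                checks[capturing].results.append(line)
                continue
            capturing = None  # first non-continuation line ends the block
        begin = BEGIN_RE.search(line)
        if begin:
            checks[begin.group("op")] = EmmCheck(host=begin.group("host"))
            continue
        compat = COMPAT_RE.search(line)
        if compat and compat.group("op") in checks:
            capturing = compat.group("op")
    return checks
```

Run it against a copy of vpxd.log. Each entry whose results list is not empty points to a host for which vCenter Server returned CompatCheck findings.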

  • Example CompatCheck result for another host, with a different set of faults:

-->    (vim.vm.check.Result) {
-->       vm = 'vim.VirtualMachine:e237d04c-e8a0-47fe-ab24-d8a09f64170a:vm-xx1',
-->       host = 'vim.HostSystem:e237d04c-e8a0-47fe-ab24-d8a09f64170a:host-xx1',
-->       warning = (vmodl.MethodFault) [
-->          (vim.fault.CannotAccessVmDevice) {
-->             device = "CD/DVD drive 1",
-->             backing = "[] /vmfs/volumes/vsan:0xxxxxxxxxxxxxxx-axxxxxxxxxxxxxx2/1xxxxxxxxxxxxxx-1yyy-5zzz-1xxxxxxx/DATASTORE.ISO",
-->             connected = false,
-->             msg = "",
-->          }
-->       ],
-->       error = (vmodl.MethodFault) [
-->          (vim.fault.CannotAccessNetwork) {
-->             device = "Network adapter 1",
-->             backing = "DVSwitch: 5x 3y 3z 84 da 0b 41 6x-4x 9x 3x 5x 28 7a ab dx",
-->             connected = false,
-->             msg = "",
-->          }
-->       ]
-->    }
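
To pull the blocking VM, its host, and the fault types out of a CompatCheck block such as the two examples above, a small parser can track bracket depth: the direct children of the error and warning lists are the faults. This is a heuristic sketch based only on the dump layout shown here. parse_check_result and CheckResult are hypothetical names, and message keys are collected only when they start with com.vmware.:

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

FAULT_RE = re.compile(r"^\((?P<type>[\w.]+)\) \{$")
MOREF_RE = re.compile(r"^(?P<kind>vm|host) = '(?P<ref>[^']+)',?$")
SECTION_RE = re.compile(r"^(?P<sec>error|warning) = \(vmodl\.MethodFault\) \[$")
MSGKEY_RE = re.compile(r'^key = "(?P<key>com\.vmware\.[^"]+)",?$')
QUOTED_RE = re.compile(r"\"[^\"]*\"|'[^']*'")


@dataclass
class CheckResult:
    vm: Optional[str] = None      # VM MOID, e.g. vm-xx0
    host: Optional[str] = None    # host MOID, e.g. host-xx0
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    message_keys: List[str] = field(default_factory=list)


def _depth_delta(text: str) -> int:
    # Ignore brackets inside quoted values such as backing = "[] /vmfs/...".
    bare = QUOTED_RE.sub("", text)
    return bare.count("[") + bare.count("{") - bare.count("]") - bare.count("}")


def parse_check_result(lines: Iterable[str]) -> CheckResult:
    result = CheckResult()
    section: Optional[str] = None  # "error" or "warning" while inside that list
    depth = 0
    for raw in lines:
        text = raw.strip()
        if text.startswith("-->"):
            text = text[3:].strip()
        if section is None:
            m = SECTION_RE.match(text)
            if m:
                section, depth = m.group("sec"), 1
                continue
            m = MOREF_RE.match(text)
            if m:
                # The MOID is the last colon-separated field of the reference.
                setattr(result, m.group("kind"), m.group("ref").rsplit(":", 1)[-1])
            continue
        m = FAULT_RE.match(text)
        if m and depth == 1:  # direct child of the error/warning list
            target = result.errors if section == "error" else result.warnings
            target.append(m.group("type"))
        m = MSGKEY_RE.match(text)
        if m:
            result.message_keys.append(m.group("key"))
        depth += _depth_delta(text)
        if depth <= 0:
            section = None  # the error/warning list has been closed
    return result
```

For the first example this yields vm-xx0 on host-xx0 with a vmodl.RuntimeFault error and the kmsClusterNotFound message key. For the second example it yields a CannotAccessVmDevice warning and a CannotAccessNetwork error.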

 

  • In these examples, host-xx0 and host-xx1 are the managed object IDs (MOIDs) of the hosts, and vm-xx0 and vm-xx1 are the MOIDs of the VMs that prevent those hosts from entering MM. Use the MOIDs to identify the corresponding host and VM names in vCenter Server.
  • Resolve the reported issue on each of these VMs, then retry the upgrade workflow from SDDC Manager.
    • In this example, one VM had an invalid distributed port group (CannotAccessNetwork) and a mounted CD/DVD ISO (CannotAccessVmDevice, reported as a warning), which prevented its host from entering MM. On the other host, a VM had vTPM enabled with a key provider ("TPM-Key") that was not available to the other hosts; this was worked around by shutting down that VM.
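
Once the MOIDs and fault identifiers are known, the remediation hints from this example can be collected into a short report. This is an illustrative sketch. REMEDIATION_HINTS, mob_url, and triage_report are hypothetical, the hint table covers only the three faults seen in this article, and the links assume that the vCenter Server Managed Object Browser (MOB) is enabled and reachable at https://<vcenter>/mob/?moid=<moid>:

```python
from typing import Dict, Iterable, List
from urllib.parse import urlencode

# Hints drawn only from the examples in this article; illustrative, not exhaustive.
REMEDIATION_HINTS: Dict[str, str] = {
    "com.vmware.vim.vpxd.encryption.kmsClusterNotFound": (
        "vTPM is enabled but the key provider is not available to the other hosts; "
        "in this example the VM was shut down"
    ),
    "vim.fault.CannotAccessNetwork": (
        "a network adapter uses a port group the other hosts cannot access; "
        "assign a valid distributed port group"
    ),
    "vim.fault.CannotAccessVmDevice": (
        "a device such as a CD/DVD drive is backed by an inaccessible file; "
        "disconnect or remove it"
    ),
}
UNKNOWN_HINT = "no hint recorded here; inspect the VM manually"


def mob_url(vcenter_fqdn: str, moid: str) -> str:
    """Hypothetical helper: Managed Object Browser link for a MOID."""
    return f"https://{vcenter_fqdn}/mob/?{urlencode({'moid': moid})}"


def triage_report(
    vcenter_fqdn: str, vm_moid: str, host_moid: str, fault_ids: Iterable[str]
) -> List[str]:
    """Summarize one blocking VM: MOB links plus a hint per fault identifier."""
    report = [
        f"{vm_moid} blocks {host_moid} from entering maintenance mode",
        f"  VM:   {mob_url(vcenter_fqdn, vm_moid)}",
        f"  host: {mob_url(vcenter_fqdn, host_moid)}",
    ]
    for fault_id in fault_ids:
        report.append(f"  {fault_id}: {REMEDIATION_HINTS.get(fault_id, UNKNOWN_HINT)}")
    return report
```

The fault identifiers can be the message keys and fault class names read from the CompatCheck output, for example vim.fault.CannotAccessNetwork for vm-xx1 on host-xx1.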