An internal error occurred while generating the compliance for 'domain-cxxxx'. Retry compliance check

Article ID: 412536

Products

VMware vSphere ESXi

Issue/Introduction

  • When navigating to a vSAN cluster, the Updates tab displays the error: An internal error occurred while generating the compliance for 'domain-cxxxx'. Retry compliance check

  • Triggering a compliance check manually fails with the same error.

  • In the vCenter Server logs, the following error is seen:

    /var/log/vmware/vmware-updatemgr/vum-server/vmware-vum-server.log:

    yyyy-mm-ddThh:mm:ss info vmware-vum-server[2569116] [Originator@6876 sub=vmomi.soapStub[12] opID=1c941e96-7a7b-4f48-9c90-a6e7f8eac458] SOAP request returned HTTP failure; <SSL(<io_obj p:0x00007fa2dc026c60, h:49, <TCP '127.0.0.1 : 54400'>, <TCP '127.0.0.1 : 443'>>), /vsanHealth>, method: runLifecycleCheck; code: 500(Internal Server Error); fault: (vim.fault.VsanFault) 

    -->    faultCause = (vmodl.MethodFault) null, 
    -->    faultMessage = (vmodl.LocalizableMessage) [ 
    -->       (vmodl.LocalizableMessage) { 
    -->          key = "com.vmware.vsan.clustermgmt.lifecycle.msg.internalerror", 
    -->          arg = <unset>, 
    -->          message = "Lifecycle management operation failed due to unexpected error." 
    -->       } 
    -->    ] 
    -->    msg = "Received SOAP response fault from [<SSL(<io_obj p:0x00007fa2dc026c60, h:49, <TCP '127.0.0.1 : 54400'>, <TCP '127.0.0.1 : 443'>>), /vsanHealth>]: runLifecycleCheck 
    --> General vSAN error." 
    --> } 
     yyyy-mm-ddThh:mm:ss error vmware-vum-server[2569116] [Originator@6876 sub=VumVapi::Lib::Utils opID=1c941e96-7a7b-4f48-9c90-a6e7f8eac458] [Utils 1105] VsanFault exception hit while running LifecycleCheck: Fault cause: vim.fault.VsanFault

  • The fault cause is vim.fault.VsanFault. Looking further at the vSAN health service logs, the following error is seen (grep examples for locating these entries follow the excerpts):

    /var/log/vmware/vsan-health/vmware-vsan-health-service.log:

    yyyy-mm-ddThh:mm:ss  INFO vsan-mgmt[36593] [VsanEventUtil::_collectClustersEventsFromCache opID=noOpId] skip cluster 'vim.ClusterComputeResource:domain-cxxxx' without updated timestamp : 
    yyyy-mm-ddThh:mm:ss  INFO vsan-mgmt[36072] [VsanVcClusterConfigSystemImpl::RunLifecycleCheck opID=184c9593] Running lifecycle check on cluster 'vim.ClusterComputeResource:domain-cxxxx' with spec: (vim.cluster.VsanVcLifecycleCheckSpec) { operation = 'noChecks' } 

    yyyy-mm-ddThh:mm:ss  ERROR vsan-mgmt[36072] [VsanVcClusterConfigSystemImpl::RunLifecycleCheck opID=184c9593] Lifecycle checks internal error (vmodl.fault.ManagedObjectNotFound) { 
      msg = "Received SOAP response fault from [<<cs p:00007fc5ec278c40, TCP:localhost:8085>, /sdk>]: GetHardware\nThe object 'vim.HostSystem:host-xxx' has already been deleted or has not been completely created"
      obj = 'vim.HostSystem:host-xxx' 

    Traceback (most recent call last): 
      File "bora/vsan/clusterconfig/vpxd/pyMoVsan/VsanVcClusterConfigSystemImpl.py", line 11222, in RunLifecycleCheck 
      File "bora/vsan/clusterconfig/vpxd/pyMoVsan/VsanVcClusterConfigSystemImpl.py", line 11164, in IsWitnessVirtualAppliance 
      File "/usr/lib/vmware/site-packages/pyVmomi/VmomiSupport.py", line 612, in __call__ 
      File "/usr/lib/vmware/site-packages/pyVmomi/VmomiSupport.py", line 400, in _InvokeAccessor 
    PyCppVmomi.vmodl.fault.ManagedObjectNotFound: (vmodl.fault.ManagedObjectNotFound) { 
      msg = "Received SOAP response fault from: GetHardware\nThe object 'vim.HostSystem:host-xxx' has already been deleted or has not been completely created", 
      obj = 'vim.HostSystem:host-xxx' 
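
  • To locate these entries on the vCenter Server Appliance, grep commands similar to the following can be used (the log paths shown assume a default appliance layout and may differ in your environment):

    grep -i "runLifecycleCheck" /var/log/vmware/vmware-updatemgr/vum-server/vmware-vum-server.log
    grep -i "ManagedObjectNotFound" /var/log/vmware/vsan-health/vmware-vsan-health-service.log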

Environment

VMware vCenter Server 8.x

Cause

A stale entry of a vSAN witness host exists on the vCenter Server.
The compliance check fails in the vSAN health service because it attempts to look up host-xxx, which is reported as already deleted or not completely created.

Resolution

Follow the steps below to manually remove the stale vSAN witness host entry from the vCenter Server using the vSAN Managed Object Browser (MOB).

Take a snapshot of the vCenter Server before making any changes.

1. Open an SSH session to the vCenter Server Appliance.
2. Log into RVC with "rvc localhost" and navigate to the affected vSAN cluster in your environment.
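
A typical navigation sequence at the RVC prompt looks like the following (the datacenter and cluster names are placeholders for your environment):

cd /localhost/<Datacenter-Name>/computers/<Cluster-Name>
ls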


3. Enable the vSAN MOB (disabled by default) by running the command:
vsan.debug.mob --start /localhost

4. Access the URL from the output. Log in with administrator@vsphere.local (or the account for the configured SSO domain) and its password. The link will be similar to:
https://vcenter_fqdn/vsan/mob 


5. Click on (more...) and then select vsan-stretched-cluster-system.

A new window is launched for vsan-stretched-cluster-system.


6. Click on VSANVcGetWitnessHosts.
The witness nodes are listed in VimClusterVSANWitnessHostInfo[]. Find the host with the ID referenced in the logs, host-xxx (The object 'vim.HostSystem:host-xxx' has already been deleted or has not been completely created). This is the stale host entry that will be removed.
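
An entry for the stale host would appear similar to the following (illustrative only; property names and formatting can vary by vCenter build):

(VimClusterVSANWitnessHostInfo) {
   host = 'vim.HostSystem:host-xxx',
   ...
}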

7. Before removing the stale host, validate the currently configured vSAN witness node in the vSphere UI.
Navigate to Cluster > Configure > vSAN > Fault Domains.

 

8. Validate that the host entry is not present in the vCenter Server database by running the following query from the appliance shell (where xxx is the numeric portion of the host ID host-xxx from the logs):

psql -d VCDB -U postgres -c "select id, dns_name, ip_address from vpx_host where id='xxx';" 
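
If the stale entry is absent from the database, the query returns no rows, similar to the following output:

 id | dns_name | ip_address
----+----------+------------
(0 rows)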

9. Go back to the vsan-stretched-cluster-system page and click on VSANVcRemoveWitnessHost. Enter the <domain-c> and <host-> values from the previous steps into their respective fields and click Invoke Method.
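
As an illustration (the exact field labels can differ between vCenter builds), the values entered would look similar to:

cluster     = domain-cxxxx   (the vSAN cluster from the error message)
witnessHost = host-xxx       (the stale witness host identified in step 6)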


10. To confirm that the host entry was successfully removed, navigate back to VSANVcGetWitnessHosts (step 6). Once confirmed, return to the vSphere UI, refresh the browser, and retry the compliance check.
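
Optionally, the vSAN MOB enabled in step 3 can be disabled again from RVC. Assuming the RVC build in use supports the --stop option, the command would be:

vsan.debug.mob --stop /localhost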

 

 

Additional Information

The steps above are adapted from KB: Out of inventory witness alerts seen in vSAN cluster Fault Domains (steps 2-7 and step 11). Note: Step 1 of that article, "Place the witness appliance into maintenance mode", can be skipped because a stale entry is being removed in this scenario.