SDDC Manager Pre-checks are failing with an ERROR

Article ID: 411245

Products

VMware SDDC Manager, VMware Cloud Foundation

Issue/Introduction

  • In SDDC Manager, the workload domain, cluster, and hosts are in an ERROR state, which causes the upgrade pre-check to fail.
  • Error in /var/log/vmware/vcf/lcm/lcm-debug.log
    DEBUG [vcf_lcm,68cc###################396,e9cb] [c.v.e.s.l.a.r.c.i.u.InventoryUpgradeController,http-nio-127.0.0.1-7400-exec-8] In InventoryUpgradeController, get all VMware Software inventory upgrades
    ####-##-##T##:##:##77+0000 DEBUG [vcf_lcm,68cc#####################0396,e9cb] [c.v.e.s.l.a.i.i.InventoryClientImpl,http-nio-127.0.0.1-7400-exec-8] Failed Resources Map: {ESX_HOST:03c###29-####-####-####-8a1#####1ba=17########6982, ESX_HOST:2#####e7-####-#####-#####-ff0#######3=175##########2, ESX_HOST:55####07-####-####-####-a########53=17#######82, ESX_HOST:b4#####98-##2b-####-####-3c#######35b=17######982, ESX_HOST:21#####b-####-####-####-31#######dd=1#######982, VCENTER:d9f7376c-1e35-4979-####-b#########ad=17###############6982}
    
    DEBUG [vcf_lcm,68cc#################96,e9cb] [c.v.e.s.l.a.i.i.InventoryClientImpl,http-nio-127.0.0.1-7400-exec-8] Failed Resources Map: {ESX_HOST:03c######9-####-####-####-8#######1ba=175#######982, ESX_HOST:2#####6e7-ebf6-####-####-ff########3563=175###########82, ESX_HOST:55513d07--####-####-a5######953=175###########982, ESX_HOST:b4####98-##2b-####-ab####-#####5b=175#######6982, ESX_HOST:2#####b-####-####-####-31######dd=17#######82, VCENTER:d9f7376c-####-####-####-b#####ad=175#########82}
    
    WARN  [vcf_lcm,68cc####################396,e9cb] [c.v.e.s.l.s.i.InventoryUpgradeServiceImpl,http-nio-127.0.0.1-7400-exec-8] Failed domain type: VI id: 5f#####4-####-####-####-90e#######6 failed items: UpgradeItem(id=d9#####6c-####-####-####-b#######d, type=VCENTER, parentId=null, parentType=null, sequenceNumber=1, isUserInputRequired=false)

     

     

  • Error in /var/log/vmware/vcf/domainmanager/domainmanager.log

    DEBUG [vcf_dm,68cc##############################5222,e62c] [c.v.v.c.service.ResourceCacheService,dm-exec-2]  Domain 5#######64-####-####-####-9###########166 is in ERROR status and it's not ACTIVE. Skip adding this resource to the inventory.
    DEBUG [vcf_dm,68cc#########################5222,e62c] [c.v.v.c.service.ResourceCacheService,dm-exec-2]  Parent domain 5#####4-####-####-####-90e#######66 of vcenter {} doesn't exist in the inventory cache. Skip adding this vcenter to the cache.
    DEBUG [vcf_dm,0000000000000000,0000] [c.v.v.c.service.ResourceCacheService,ForkJoinPool.commonPool-worker-3433]  Cluster 8######e-####-####-####-d############1b4 is in ERROR status and it's not ACTIVE. Skip adding this resource to the inventory.
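
To check whether an environment shows the same signature, the messages quoted above can be searched for directly. The following is a minimal sketch, assuming the default log paths given above. The function name find_inventory_errors is hypothetical and is not part of any VCF tooling.

```shell
#!/bin/sh
# Sketch: search SDDC Manager logs for the messages quoted in this article.
# find_inventory_errors is a hypothetical helper name, not a VCF command.

find_inventory_errors() {
    # Default to the log paths named in this article; both can be overridden.
    lcm_log="${1:-/var/log/vmware/vcf/lcm/lcm-debug.log}"
    dm_log="${2:-/var/log/vmware/vcf/domainmanager/domainmanager.log}"

    # LCM log: resources that failed to load and the domain they belong to
    grep -E 'Failed Resources Map|Failed domain type' "$lcm_log"

    # Domain manager log: inventory objects skipped because they are in ERROR
    grep -F "is in ERROR status and it's not ACTIVE" "$dm_log"
}
```

Calling find_inventory_errors with no arguments on the SDDC Manager VM reads the two default logs. Passing two file paths instead lets it run against copies of the logs, for example from a support bundle.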

Environment

VMware Cloud Foundation 5.x

Cause

SDDC Manager does not load the workload domain into its inventory cache because the domain is in an ERROR state. As a result, the vCenter, cluster, and hosts that belong to that domain are skipped as well.

Resolution

  1. Take a snapshot of the SDDC Manager VM
  2. Download the VCF Diagnostic Tool and copy it to the SDDC Manager VM - Refer to Using the VCF Diagnostic Tool for vSphere (VDT)
  3. SSH to the SDDC Manager VM as the vcf user and elevate to root with su
  4. Run the VCF Diagnostic Tool on the SDDC Manager VM to identify the inventory objects in an ERROR state - Refer to Using the VCF Diagnostic Tool for SDDC Manager

    Sample output:
       INVENTORY STATUS
    
            [FAIL]    Host Status Check
                        b8a8####-####-####-####-########e94b | wldesx01.example.com | ERROR
    
            [FAIL]    Domain Status Check
                        f152####-####-####-####-########1136 | WLD-1 | ERROR
    
            [PASS]    vCenter Status Check
                        All vCenters are in an ACTIVE state.
    
            [PASS]    PSCs Status Check
                        All PSCs are in an ACTIVE state.
    
            [PASS]    Cluster Status Check
                        All Clusters are in an ACTIVE state.
    
            [PASS]    NSX Manager Status Check
                        All NSX Managers are in an ACTIVE state.
    
            [PASS]    NSX Edge Status Check
                        All NSX Edges are in an ACTIVE state.

     

    Note: In this sample, domain WLD-1 and host wldesx01.example.com are in an ERROR state.

  5. Connect to the platform database
    psql -h localhost -U postgres -d platform

     

  6. Confirm the domain status in the platform database
    select id,name,status from domain where status!='ACTIVE';

     

    Sample output

                      id                  |    name     | status
    --------------------------------------+-------------+--------
     f152####-####-####-####-########1136 | SAMPLE-WLD  | ERROR

     

  7. Confirm the host status in the platform database
    select id,hostname from host where status!='ACTIVE';

     

    Sample output

                      id                  |       hostname
    --------------------------------------+-----------------------
     b8a8####-####-####-####-########e94b | wldesx01.example.com

     

  8. Update the domain status to ACTIVE
    update domain set status='ACTIVE' where id='<id of domain in ERROR from step 6>';


    Sample

    update domain set status='ACTIVE' where id='f152####-####-####-####-########1136';

     

  9. Update the host status to ACTIVE
    update host set status='ACTIVE' where id='<id of host in ERROR from step 7>';


    Sample

    update host set status='ACTIVE' where id='b8a8####-####-####-####-########e94b';

     

  10. Restart SDDC Manager services
    /opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh

     

  11. Re-run the pre-checks on SDDC Manager
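
The database steps above (5 to 9) can also be sketched as small shell helpers, which may reduce copy-and-paste mistakes when several objects are affected. This is only an illustration under stated assumptions: the function names run_psql, list_not_active, and build_update are hypothetical, the connection options are the ones from step 5, and printing each UPDATE statement for review before running it is a design choice of this sketch rather than part of the procedure.

```shell
#!/bin/sh
# Sketch: helpers around the queries from steps 5 to 9.
# All function names are hypothetical; they are not VCF commands.

# Step 5: connection options exactly as given in this article.
run_psql() {
    psql -h localhost -U postgres -d platform "$@"
}

# Steps 6 and 7: list rows that are not ACTIVE.
# Usage: list_not_active domain name   |   list_not_active host hostname
# -A (unaligned) and -t (tuples only) give one "id|name|status" line per row.
list_not_active() {
    table="$1"
    name_col="$2"
    run_psql -At -c "select id,${name_col},status from ${table} where status!='ACTIVE';"
}

# Steps 8 and 9: print the UPDATE statement for one id so it can be reviewed.
build_update() {
    table="$1"
    id="$2"
    case "$table" in
        domain|host) ;;
        *) echo "unsupported table: $table" >&2; return 1 ;;
    esac
    # Accept only a full UUID, to catch truncated or mistyped ids.
    if ! printf '%s\n' "$id" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
        echo "not a UUID: $id" >&2
        return 1
    fi
    printf "update %s set status='ACTIVE' where id='%s';\n" "$table" "$id"
}
```

A possible flow is to run list_not_active for the domain and host tables, pass each returned id to build_update, review the printed statement, and then execute it in the psql session from step 5. Running list_not_active again after step 9 should return no rows for the objects that were corrected.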