On a vSAN cluster, vSAN health information appears blank in the vCenter Server UI.
The vmware-vsan-health service shows a stopped status on vCenter Server.
Starting the vmware-vsan-health service succeeds, but the service stops again after a few minutes.
VMware vSAN 8.x
The vsan-health service crashes because of an underlying issue at the CNS (Cloud Native Storage) layer.
2025-03-07T04:32:13.637Z In(05) host-2547 <vsan-health> Re-check service health since it is still initializing.
2025-03-07T04:32:17.639Z In(05) host-2547 <vsan-health> Running the API Health command as user vsan-health
2025-03-07T04:32:17.639Z In(05) host-2547 <vsan-health-healthcmd> Constructed command: /usr/bin/python /usr/lib/vmware-vpx/vsan-health/vsanhealth-vmon-apihealth.py
2025-03-07T04:32:18.022Z In(05) host-2547 <vsan-health> Service STARTED successfully.
2025-03-07T04:32:18.023Z Wa(03) host-2547 [ReadSvcSubStartupData] No startup information from vsan-health.
2025-03-07T04:33:17.521Z In(05) host-2547 Client info Uid=0,Gid=0,Pid=582836,Comm=(vmon-coredumper),PPid=2,Comm=(kthreadd),PPid=0
2025-03-07T04:33:17.521Z In(05) host-2547 <vsan-health> Service is dumping core. Coredump count 43. CurrReq: 0
2025-03-07T04:33:17.521Z In(05) host-2547 <event-pub> Constructed command: /usr/bin/python /usr/lib/vmware-vmon/vmonEventPublisher.py --eventdata vsan-health,UNHEALTHY,HEALTHY,1
2025-03-07T04:33:18.192Z Wa(03) host-2547 <vsan-health> Service exited. Exit code 1
2025-03-07T04:33:18.192Z Wa(03) host-2547 <vsan-health> Service exited unexpectedly. Crash count 44. Taking configured recovery action.
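To check whether a log shows the same crash loop, the "Service exited unexpectedly" messages quoted above can be extracted with a short script. The following is a minimal sketch, not a supported tool: the function name find_vsan_health_crashes is hypothetical, and the pattern is modeled only on the messages shown in this article, so the wording may differ on other builds.

```python
import re

# Pattern modeled on the log messages quoted in this article (an assumption;
# the exact wording may vary between vCenter builds).
CRASH_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)"   # timestamp
    r"\s+\S+\s+\S+\s+"                                   # level and host fields
    r"<vsan-health> Service exited unexpectedly\. Crash count (\d+)"
)


def find_vsan_health_crashes(log_text):
    """Return a list of (timestamp, crash_count) tuples found in log_text.

    The search runs over the whole text rather than line by line, so it
    also works when several log entries are fused onto one line.
    """
    return [(ts, int(count)) for ts, count in CRASH_RE.findall(log_text)]
```

Pass the function the contents of the log file as a single string. A crash count that keeps rising across restarts is consistent with the symptom described here.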
vsanvcmgmtd-worker core dumps are generated, for example:
core.vsanvcmgmtd-wor.524575
core.vsanvcmgmtd-wor.528472
core.vsanvcmgmtd-wor.579907
In /var/log/vmware/vmware-vsan-health-service.log on vCenter Server, you may see events similar to the following:
2025-03-07T04:32:17.390Z info vsanvcmgmtd[580209] [vSAN@6876 sub=PHM::PhmInventoryListener opId=vsan-wfu-2b03] ProcessUpdate called
2025-03-07T04:32:17.390Z info vsanvcmgmtd[580209] [vSAN@6876 sub=PHM::PhmInventoryListener opId=vsan-wfu-2b03] ProcessUpdate: Update kind: 'enter' or 'leave'. Ignoring the update
2025-03-07T04:32:17.390Z info vsanvcmgmtd[580113] [vSAN@6876 sub=CnsDb] Loaded 34195 volumes out of 34195 volumes from DB.
2025-03-07T04:32:17.391Z info vsanvcmgmtd[580113] [vSAN@6876 sub=pcs[0]] Registered listener '[CnsDatastoreListener:0x000055a5374f3d60]'
2025-03-07T04:32:17.391Z info vsanvcmgmtd[580211] [vSAN@6876 sub=CnsTask] Fail Cns InProgress Tasks
2025-03-07T04:32:17.391Z info vsanvcmgmtd[580211] [vSAN@6876 sub=PropertyCollectorService] CNS: Gathering CNS Tasks
2025-03-07T04:32:17.394Z info vsanvcmgmtd[580217] [vSAN@6876 sub=pcs[0]] Started listerner '[CnsDatastoreListener:0x000055a5374f3d60]'
2025-03-07T04:32:17.394Z info vsanvcmgmtd[580211] [vSAN@6876 sub=vmomi.soapStub[5]] SOAP request returned HTTP failure; <<io_obj p:0x00007fa3a40dbb40, h:27, <TCP '127.0.0.1 : 37400'>, <TCP '127.0.0.1 : 1080'>>, /sdk>, method: GetRecentTask; code: 500(Internal Server Error); fault: (vim.fault.NotAuthenticated) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> object = 'vim.TaskManager:8cbe3917-25c7-4cdc-a28f-53ece89a068e:TaskManager',
--> privilegeId = "",
--> missingPrivileges = <unset>
--> msg = "Received SOAP response fault from [<<io_obj p:0x00007fa3a40dbb40, h:27, <TCP '127.0.0.1 : 37400'>, <TCP '127.0.0.1 : 1080'>>, /sdk>]: GetRecentTask
--> The session is not authenticated."
--> }
2025-03-07T04:32:17.410Z info vsanvcmgmtd[580221] [vSAN@6876 sub=CnsCatalogSvc opId=vsan-wfu-2b03] CNS: CatalogService is initialized successfully
2025-03-07T04:32:17.410Z info vsanvcmgmtd[580211] [vSAN@6876 sub=VpxdCnx] Login to the destination, SessionKey: 5237f67a-ae2b-fdd1-51c8-463a17b87ff5
2025-03-07T04:32:17.410Z info vsanvcmgmtd[580211] [vSAN@6876 sub=VpxdCnx] Recovered session, sid: 1, recoverRequestOnly:false
2025-03-07T04:32:17.416Z info vsanvcmgmtd[580113] [vSAN@6876 sub=pcs[0]] Registered listener '[CnsHostListener:0x000055a537768580]'
2025-03-07T04:32:17.420Z info vsanvcmgmtd[580244] [vSAN@6876 sub=CnsCatalogSvc] Find file service cluster vim.ClusterComputeResource:domain-c138 for datastore ds:///vmfs/volumes/vsan:52532c9ec0986ec0-af########/
2025-03-07T04:32:17.421Z info vsanvcmgmtd[580211] [vSAN@6876 sub=PropertyCollectorService] CNS: Finish gathering CNS Tasks. Total=16, CNS=1
2025-03-07T04:32:17.421Z info vsanvcmgmtd[580211] [vSAN@6876 sub=CnsTask] Total 1 old CNS tasks are found
2025-03-07T04:32:17.425Z info vsanvcmgmtd[580244] [vSAN@6876 sub=PyCppVmomi] Initialized python thread state 0x00007fa3943c3290.
2025-03-07T04:32:17.428Z info vsanvcmgmtd[580222] [vSAN@6876 sub=pcs[0]] Started listerner '[CnsHostListener:0x000055a537768580]'
2025-03-07T04:32:17.430Z info vsanvcmgmtd[580211] [vSAN@6876 sub=CnsTask] Old task=(vim.TaskInfo) {
--> key = "task-47743075",
--> task = 'vim.Task:8cbe3917-25c7-4cdc-a28f-53ece89####:task-47743075',
--> descriptionId = "com.vmware.cns.tasks.updatevolume",
--> entity = 'vim.Folder:8cbe3917-25c7-4cdc-a28f-####:group-d1',
--> entityName = "Datacenters",
--> state = "running",
--> cancelled = false,
--> cancelable = false,
--> error = (vmodl.fault.SystemError) {
--> reason = "Failing pending CNS tasks during startup",
--> msg = "",
--> },
--> progress = 0,
--> reason = (vim.TaskReasonUser) {
--> userName = "com.vmware.cns"
--> },
--> queueTime = "2025-03-07T03:35:49.30836Z",
--> startTime = "2025-03-07T03:35:49.31687Z",
--> eventChainId = 74107978,
--> activationId = "3ae1c86d",
--> }
2025-03-07T04:32:17.430Z info vsanvcmgmtd[580211] [vSAN@6876 sub=CnsTask] Finish Failing Cns InProgress Tasks
2025-03-07T04:32:17.448Z info vsanvcmgmtd[580113] [vSAN@6876 sub=CnsSync] PeriodicSyncManager started
2025-03-07T04:32:17.448Z info vsanvcmgmtd[580113] [vSAN@6876 sub=CnsSync] Starting sync ...
2025-03-07T04:32:17.448Z info vsanvcmgmtd[580113] [vSAN@6876 sub=CnsSync] Sync all datastores ...
2025-03-07T04:32:17.448Z info vsanvcmgmtd[580113] [vSAN@6876 sub=CnsSync] Sync ds:///vmfs/volumes/65f710b0-84b01022-4c93-###########/: startVClock = 0, fullSync = true
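When comparing a log against the CNS startup messages quoted above, it can help to pull out a few of them programmatically. The sketch below is illustrative only: the function name scan_cns_markers and the choice of markers are assumptions, the patterns are copied from the excerpt in this article, and a match shows only that the log resembles the excerpt. It does not confirm the root cause.

```python
import re

# Marker patterns taken from the log excerpt in this article (assumed to be
# representative; other builds may word these messages differently).
MARKERS = {
    "cns_volumes_loaded": re.compile(
        r"sub=CnsDb\] Loaded (\d+) volumes out of (\d+) volumes from DB"
    ),
    "old_cns_tasks": re.compile(
        r"sub=CnsTask\] Total (\d+) old CNS tasks are found"
    ),
    "datastore_sync": re.compile(
        r"sub=CnsSync\] Sync (\S+): startVClock = (\d+), fullSync = (true|false)"
    ),
}


def scan_cns_markers(log_text):
    """Return a dict mapping each marker name to the list of regex matches.

    Patterns with one capture group yield strings; patterns with several
    capture groups yield tuples of strings.
    """
    return {name: pattern.findall(log_text) for name, pattern in MARKERS.items()}
```

The function takes the log contents as a single string, so it can be applied to the whole file or to a pasted excerpt.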
If the symptoms match, contact Broadcom Support for further assistance.