vCLS machine stuck in a Power-Off or Recreation Loop on specific cluster

Article ID: 426263

Products

VMware vSphere ESXi
VMware vCenter Server

Issue/Introduction

  • vSphere Cluster Services (vCLS) machines fail to power off when a host is placed in Maintenance Mode and must be powered off manually.
  • /var/log/infravisor.log on the ESXi host reports an entry similar to the following:

YYYY-MM-DDTHH:MM:SS No(5) infravisor[2100217]: time="YYYY-MM-DDTHH:MM:SS" level=info msg="Deleting Event: &Event{ObjectMeta:{vcls-<>.188a648880e6fec8  vcls    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:vcls,Name:vcls-<>,UID:<>,APIVersion:,ResourceVersion:,FieldPath:,},Reason:PodFailed,Message:Pod vcls-<> has failed,Source:EventSource{Component:kubelet,Host:<ESXi host FQDN>,},FirstTimestamp:YYYY-MM-DDTHH:MM:SS +0000 UTC,LastTimestamp:YYYY-MM-DDTHH:MM:SS +0000 UTC,Count:8,Type:Warning,EventTime:YYYY-MM-DDTHH:MM:SS +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}"
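
To check a copy of infravisor.log for this symptom, the following is a minimal Python sketch, not a supported tool. It assumes that real entries keep the same field order as the sample above (Namespace, Name, Reason, Count). The function name find_failed_vcls_pods and the regular expression are illustrative and are not part of any VMware tooling.

```python
import re

# Assumption: the field order matches the sample entry in this article.
# Real infravisor.log lines may differ, so adjust the pattern if needed.
POD_FAILED = re.compile(
    r"Namespace:vcls,Name:(?P<pod>[^,]+),.*?Reason:PodFailed,.*?Count:(?P<count>\d+),"
)


def find_failed_vcls_pods(lines):
    """Return a dict mapping each vCLS pod name to its highest PodFailed count."""
    failed = {}
    for line in lines:
        match = POD_FAILED.search(line)
        if match:
            pod = match.group("pod")
            count = int(match.group("count"))
            # Keep the largest count seen for each pod.
            failed[pod] = max(failed.get(pod, 0), count)
    return failed
```

The function accepts any iterable of lines, so an open file handle for a copy of the log can be passed in directly. A pod whose count keeps rising across entries is consistent with the recreation loop described in this article.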

Cause

This issue occurs because the thread handling the PodCrx request is hung in the vpxd process.

  • /var/log/vmware/vpxd/vpxd-profiler.log on vCenter Server shows an HdcsPodCrxManager thread that has been in a hung state since a past date:

    YY-MM-DD-THH:MM:SS info vpxd[PID] [Originator@6876 sub=App]
    --> <pullCounters>
    ...
    --> <threadStates>
    --> ThreadState/ThreadId/<ID>/State/Task::lro-<>::::HdcsPodCrxManager::
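
To see how long an HdcsPodCrxManager thread has been present across profiler snapshots, the following is a minimal Python sketch. It assumes that each snapshot begins with a header line containing "vpxd[" whose first token is the timestamp, and that thread state lines start with "-->", as in the excerpt above. The function name find_podcrx_threads is hypothetical.

```python
import re

# Assumption: thread state lines follow the layout shown in this article.
THREAD_STATE = re.compile(
    r"ThreadState/ThreadId/(?P<tid>[^/]+)/State/(?P<task>\S*HdcsPodCrxManager\S*)"
)


def find_podcrx_threads(lines):
    """Map (thread id, task) to [first_seen, last_seen, snapshot_count]."""
    seen = {}
    current_stamp = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("-->"):
            # Assumption: a header line contains "vpxd[" and begins with
            # the timestamp of the snapshot that follows it.
            if "vpxd[" in line:
                current_stamp = line.split()[0]
            continue
        match = THREAD_STATE.search(line)
        if match:
            key = (match.group("tid"), match.group("task"))
            entry = seen.setdefault(key, [current_stamp, current_stamp, 0])
            entry[1] = current_stamp  # update last_seen
            entry[2] += 1             # one more snapshot containing this thread
    return seen
```

A thread and task pair whose first_seen value is far in the past and that still appears in the most recent snapshot is consistent with the hung state described above. The sketch only reports what the log contains and does not confirm the cause by itself.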

Resolution

This is a known issue with vCenter Server. There is currently no resolution.

Workaround

Log in to the vCenter Server Management Interface (VAMI) with the root account and restart the VMware vCenter Server service. Refer to Stop, Start or Restart Services on vCenter Server 7.x/8.x.

Note: If additional analysis is required, collect a vpxd live core dump before restarting the service, then contact Broadcom Technical Support. Refer to Generating a vpxd live core dump.