Symptoms
In /var/run/log/clomd.log, clomd is seen running concurrent repair tasks to address missing witness components across multiple objects:
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876 opID=1804373051] CLOMReconfigure: exit: obj 575c4565-####-####-####-#### transiantCapGenerated - total: 0, site1: 0, site2: 0, workItem type REPAIR configDelay 0 newConfigGenerated 1 newCompWitnessOnly 1 status Success
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876 opID=1804373052] CLOMReconfigure: exit: obj 84deaf65-####-####-####-#### transiantCapGenerated - total: 0, site1: 0, site2: 0, workItem type REPAIR configDelay 0 newConfigGenerated 1 newCompWitnessOnly 1 status Success
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876 opID=1804373054] CLOMReconfigure: exit: obj 4d397a64-####-####-####-#### transiantCapGenerated - total: 0, site1: 0, site2: 0, workItem type REPAIR configDelay 0 newConfigGenerated 1 newCompWitnessOnly 1 status Success
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876 opID=1804373055] CLOMReconfigure: exit: obj 26437f64-####-####-####-#### transiantCapGenerated - total: 0, site1: 0, site2: 0, workItem type REPAIR configDelay 0 newConfigGenerated 1 newCompWitnessOnly 1 status Success
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876 opID=1804373056] CLOMReconfigure: exit: obj 352c7f64-####-####-####-#### transiantCapGenerated - total: 0, site1: 0, site2: 0, workItem type REPAIR configDelay 0 newConfigGenerated 1 newCompWitnessOnly 1 status Success
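To gauge how widespread the symptom is, the exit lines above can be tallied per object. The following is a minimal Python sketch, not a supported tool: the regular expression is derived only from the sample lines in this article and may need adjusting for other builds, and the function names and the `threshold` value are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Field layout taken from the "CLOMReconfigure: exit" samples in this article.
# Assumption: other builds log these fields in the same order.
EXIT_RE = re.compile(
    r"CLOMReconfigure: exit: obj (?P<obj>\S+) .*?"
    r"workItem type (?P<type>\w+) .*?"
    r"newCompWitnessOnly (?P<witness_only>\d+) status (?P<status>\w+)"
)


def count_witness_only_repairs(lines):
    """Count witness-only REPAIR exits per object UUID."""
    counts = Counter()
    for line in lines:
        m = EXIT_RE.search(line)
        if m and m["type"] == "REPAIR" and m["witness_only"] == "1":
            counts[m["obj"]] += 1
    return counts


def suspected_loops(lines, threshold=3):
    """Return objects repaired at least `threshold` times (hypothetical cutoff)."""
    counts = count_witness_only_repairs(lines)
    return {obj: n for obj, n in counts.items() if n >= threshold}
```

For example, passing an open file handle for a copy of clomd.log as `lines` returns a mapping of object UUIDs to repair counts. An object that appears many times within a short window is a candidate for the repair loop described in this article.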
Taking one of the affected objects in /var/run/log/clomd.log as an example (UUID 26437f64-####-####-####-####): because of RDT session timeouts on the witness node, the associated witness components are marked absent. CLOM detects the absence and repeatedly initiates repair operations, which results in a recurring repair loop.
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876] CLOM_PostWorkItem: Posted a work item opID:1804373061 for 26437f64-####-####-####-#### group: 00000000-0000-0000-0000-000000000000 Type: REPAIR delay 0 (Success)
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876 opID=1804373061] CLOMProcessWorkItem: Op REPAIR starts:1804373061
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876 opID=1804373061] CLOMReconfigure: Reconfiguring 26437f64-####-####-####-#### workItem type REPAIR
YYYY-MM-DDTHH:MM:SSZ Er(27) clomd[5535814]: [Originator@6876 opID=1804373061] CLOMReplacementPreWorkRepair: Repair needed. 1 absent/degraded data components for 26437f64-####-####-####-#### found
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876 opID=1804373061] CLOMReconfigure: exit: obj 26437f64-####-####-####-#### transiantCapGenerated - total: 0, site1: 0, site2: 0, workItem type REPAIR configDelay 0 newConfigGenerated 1 newCompWitnessOnly 1 status Success
YYYY-MM-DDTHH:MM:SSZ No(29) clomd[5535814]: [Originator@6876 opID=1804373061] CLOM_PublishResyncBytes: No more work for 26437f64-####-####-####-#### (Success), reset queued resync bytes to 0
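A single repair cycle like the one above can be followed by grouping log entries on their opID. The sketch below is illustrative only and assumes one log entry per line. It also assumes the opID appears either as `opID=<n>` or `opID:<n>` and that the logging function name follows the bracketed originator field, as in the samples here. `trace_work_items` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# opID appears as "opID=<n>" in the originator field and as "opID:<n>" in the
# CLOM_PostWorkItem message (per the samples in this article).
OPID_RE = re.compile(r"opID[=:](\d+)")
# Function name that follows the "[Originator@...]" field, e.g. "CLOMReconfigure".
FUNC_RE = re.compile(r"\]\s+(\w+):")


def trace_work_items(lines):
    """Map each opID to the ordered list of CLOM functions that logged for it."""
    stages = defaultdict(list)
    for line in lines:
        op = OPID_RE.search(line)
        func = FUNC_RE.search(line)
        if op and func:
            stages[op.group(1)].append(func.group(1))
    return dict(stages)
```

Applied to the six entries above, this yields one key, `1804373061`, whose stages run from `CLOM_PostWorkItem` through `CLOM_PublishResyncBytes`. Seeing the same object UUID go through this full sequence under many different opIDs is consistent with the repair loop.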
VMware vSAN 8.x
Witness nodes do not receive the required entries during reconfiguration. Without these entries, the witness cannot properly validate or clean up components, resulting in leaked components.
To remediate this issue, upgrade the ESXi hosts to a version that includes the required bug fix:
vSphere 8.0 Patch 05 (8.0 P05)
vSphere 9.0 GA or later releases
If your environment is not currently running a version that contains this fix, contact Broadcom Support for further assistance.