On the SSP side, pods remain in Pending status while waiting for PVC creation.
This is a vSphere/ESXi infrastructure issue, not an SSP software defect. However, it directly affects SSP and any Tanzu Kubernetes cluster that relies on CNS/VMFS storage: PVC creation fails, which causes pod scheduling failures across multiple SSP workloads.
SSP pods affected — stuck in Pending state:
nsxi-platform overflowcorrelator-8ec88e9c96637f93-exec-1 0/1 Pending 0 2d <none> <none> <none> <none>
nsxi-platform overflowcorrelator-8ec88e9c96637f93-exec-2 0/1 Pending 0 2d <none> <none> <none> <none>
nsxi-platform overflowcorrelator-8ec88e9c96637f93-exec-3 0/1 Pending 0 2d <none> <none> <none> <none>
nsxi-platform rawflowcorrelator-f2b8549c966379f5-exec-1 0/1 Pending 0 2d <none> <none> <none> <none>
nsxi-platform rawflowcorrelator-f2b8549c966379f5-exec-2 0/1 Pending 0 2d <none> <none> <none> <none>
nsxi-platform rawflowcorrelator-f2b8549c966379f5-exec-3 0/1 Pending 0 2d <none> <none> <none> <none>
Pod scheduling error:
Warning FailedScheduling: 0/8 nodes are available: pod has unbound immediate PersistentVolumeClaims
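The pod listing above can be filtered programmatically when many namespaces are involved. The following is a minimal sketch, assuming the text comes from a `kubectl get pods -A -o wide` style listing with the columns NAMESPACE, NAME, READY, STATUS in that order; the function name `pending_pods` is hypothetical and used only for illustration.

```python
def pending_pods(kubectl_output):
    """Return (namespace, pod) pairs whose STATUS column is Pending.

    Assumes whitespace-separated columns in the order
    NAMESPACE NAME READY STATUS ... (an assumption about the listing format).
    """
    pods = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and fragments with too few columns.
        if len(fields) >= 4 and fields[3] == "Pending":
            pods.append((fields[0], fields[1]))
    return pods


if __name__ == "__main__":
    sample = (
        "nsxi-platform overflowcorrelator-8ec88e9c96637f93-exec-1 0/1 Pending 0 2d <none> <none> <none> <none>\n"
        "nsxi-platform rawflowcorrelator-f2b8549c966379f5-exec-1 0/1 Pending 0 2d <none> <none> <none> <none>\n"
    )
    for namespace, pod in pending_pods(sample):
        print(namespace, pod)
```

Each pod returned here can then be inspected for the unbound PersistentVolumeClaim scheduling warning shown above.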
ESXi hostd log — VSLM disk creation failure:
Vslm Failure: VslmCreateDisk failed for fcd on datastore /vmfs/volumes/<datastore-id>/ with type vim.fault.DatabaseError Fault cause: vim.fault.DatabaseError
ESXi vmkernel log — ATS lock failure:
DLX: vol 'GLC-xxxxx-<id>', lock at <offset>: Lock type: 10C00001. [Req mode 1] try lock error: Atomic test and set of disk block returned false for equality
CSI controller log — CNS volume creation failure:
failed to create disk <pvc-name> with error: failed to create volume with fault: CnsFault error: VSLM task failed
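When reviewing large log bundles, the three signatures above can be matched mechanically. The sketch below is one possible approach, assuming the log lines are already available as plain text; the label strings and the names `SIGNATURES`, `classify`, and `summarize` are hypothetical, while the matched substrings are taken from the excerpts above.

```python
# Substrings taken from the log excerpts above; every substring in a tuple
# must be present for a line to match that signature.
SIGNATURES = {
    "hostd: VSLM disk creation failure": ("VslmCreateDisk failed", "vim.fault.DatabaseError"),
    "vmkernel: ATS lock failure": ("Atomic test and set of disk block returned false",),
    "CSI: CNS volume creation failure": ("CnsFault error", "VSLM task failed"),
}


def classify(line):
    """Return the labels of all signatures that match a single log line."""
    return [
        label
        for label, needles in SIGNATURES.items()
        if all(needle in line for needle in needles)
    ]


def summarize(lines):
    """Count how many lines match each signature."""
    counts = {label: 0 for label in SIGNATURES}
    for line in lines:
        for label in classify(line):
            counts[label] += 1
    return counts
```

Seeing all three signatures in the same time window is consistent with the failure chain described in this article, but a match alone does not confirm the cause.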
The root cause is corruption of the VMFS vclock file on the vDefend datastore, which stopped the vclock from advancing its tick counter. The vclock mechanism uses a file rename operation to atomically increment its counter. When that rename failed at the VMFS layer, the FCD catalog database could no longer register new disk operations.
As a result, every new createDisk call from the CSI driver failed with vim.fault.DatabaseError (Database temporarily unavailable or has network problems). Existing PVCs continued to work normally because they do not require new catalog entries; only new PVC provisioning was blocked.
The vclock file behaved as a ghost file: it appeared in directory listings, but all file operations (rm, mv) failed with No such file or directory, indicating VMFS-level corruption of the catalog entry.
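To illustrate why a failed rename stalls a counter of this kind, here is a generic sketch of a rename-based tick counter on an ordinary filesystem. It is an analogy only: it is not the VMFS or FCD implementation, and the names `read_tick` and `advance_tick` and the on-disk format (a decimal integer in a text file) are assumptions made for this example.

```python
import os
import tempfile


def read_tick(path):
    """Return the current tick, or 0 if the counter file does not exist."""
    try:
        with open(path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def advance_tick(path, replace=os.replace):
    """Increment the tick by writing a temp file and renaming it into place.

    The rename is the commit point: if it fails, the old value stays
    visible and the counter does not advance.
    """
    next_tick = read_tick(path) + 1
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tick-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(next_tick))
            f.flush()
            os.fsync(f.fileno())
        replace(tmp, path)  # atomic on POSIX when both paths share a filesystem
    except OSError:
        # Leave the previous counter value untouched and remove the temp file.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return next_tick
```

In this sketch, a caller that depends on the tick advancing fails on every attempt once the rename starts failing, while readers of the existing value are unaffected. That mirrors the pattern described above, where only new disk creation was blocked.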
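The ghost-file symptom can be checked without modifying anything on the datastore. The following is a read-only sketch, assuming a Python 3 interpreter is available where the directory is mounted; `ghost_entries` is a hypothetical helper, and the `stat` parameter exists only so the check can be exercised without a corrupted filesystem.

```python
import os


def ghost_entries(directory, stat=os.lstat):
    """Return names that appear in the directory listing but cannot be stat'ed.

    os.lstat is used so that dangling symlinks are not reported as ghosts.
    This function only reads metadata; it never renames or deletes anything.
    """
    ghosts = []
    for name in sorted(os.listdir(directory)):
        try:
            stat(os.path.join(directory, name))
        except FileNotFoundError:
            # Listed by the directory, yet "No such file or directory" on access.
            ghosts.append(name)
    return ghosts
```

Pointing this at the directory that holds the vclock file should list any entries showing the behavior described above. A non-empty result is evidence to include in the support request, not a trigger for manual cleanup.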
This issue requires collaboration across multiple teams. Do not attempt the vclock remediation steps without involving the appropriate teams:
Engage Broadcom Support by opening a support request and providing ESXi support bundles, hostd logs, vmkernel logs, and catalog logs covering the period when the issue started. Do not perform the vclock recreation steps in a production environment without Broadcom Support guidance.