After upgrading to vSphere Kubernetes Service (VKS) 3.5 or later, any operation that rolls out a node (for example, a scale-out or an upgrade) leaves the new machines unable to join the cluster. They remain stuck in the Provisioned state.
Logging in to the machine and reviewing /var/log/cloud-init-output.log shows errors similar to the following:
[2026-05-20 06:37:00] {"time":"2026-05-20T06:37:00.941482319Z","level":"ERROR","msg":"error applying: error applying task mount-<path>: error mounting disk to [/var/lib/kubelet]: error getting block devices for [/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:7:0]: error obtaining block device info for [/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:7:0]: exit status 32"}
[2026-05-20 06:37:00] {"time":"2026-05-20T06:37:00.94150328Z","level":"DEBUG","msg":"error applying: error applying task mount-var-lib-kubelet: error mounting disk to [/var/lib/kubelet]: error getting block devices for [/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:7:0]: error obtaining block device info for [/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:7:0]: exit status 32","StackTrace":"goroutine 1 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:26 +0x5e\ngithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/pkg/stack.Push(...)\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/pkg/stack/stack.go:14\ngithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/pkg/graph.(*TaskSet).Apply(0xc000010bb8)\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/pkg/graph/graph.go:241 +0x459\ngithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/cmd/apply.(*applyCmd).run(0xc0005422e8, 0xc000147af8?, {0x0?, 0x0?, 0x0?})\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/cmd/apply/apply.go:118 +0x495\ngithub.com/spf13/cobra.(*Command).execute(0xc000030c08, {0xc000544680, 0x4, 0x4})\n\tgithub.com/spf13/[email protected]/command.go:1015 +0xaaa\ngithub.com/spf13/cobra.(*Command).ExecuteC(0xc000030008)\n\tgithub.com/spf13/[email protected]/command.go:1148 +0x46f\ngithub.com/spf13/cobra.(*Command).Execute(...)\n\tgithub.com/spf13/[email protected]/command.go:1071\ngithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/cmd.Execute()\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/cmd/root.go:102 +0x25\nmain.main()\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/main.go:10 +0xf\n"}
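To confirm that a node is hitting this specific failure, it can help to pull the SCSI target out of the device path in the error. The following is a minimal sketch, not part of any VKS tooling. The function name failing_scsi_targets is hypothetical, and the sketch assumes that the four numbers after "scsi-" in the by-path name are host:channel:target:lun, so the third number is the target.

```python
import re

# Matches the "error obtaining block device info" message from the log above
# and captures the third number after "scsi-" (assumed to be the SCSI target).
BLOCK_DEVICE_ERROR = re.compile(
    r"error obtaining block device info for "
    r"\[/dev/disk/by-path/[^\]]*scsi-\d+:\d+:(\d+):\d+\]"
)


def failing_scsi_targets(log_text):
    """Return the sorted, de-duplicated SCSI targets named in the errors."""
    return sorted({int(target) for target in BLOCK_DEVICE_ERROR.findall(log_text)})


if __name__ == "__main__":
    with open("/var/log/cloud-init-output.log", encoding="utf-8") as handle:
        targets = failing_scsi_targets(handle.read())
    if 7 in targets:
        print("Mount failed for a disk expected at SCSI target 7")
    else:
        print("No target-7 mount error found; targets seen:", targets)
```

A result containing 7 matches the symptom described here. Other targets would point to a different problem.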
Edit the cluster definition and reduce the number of extraVolumes entries for the cluster, control plane, or node pools to fewer than 7. See https://developer.broadcom.com/xapis/vmware-vsphere-kubernetes-service/latest/variable-docs.html#persistentvolumes for reference.
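Before editing, it may be useful to find which parts of the cluster definition carry 7 or more entries. The sketch below walks a cluster definition that has already been parsed into Python dicts and lists (for example, from JSON output) and reports every extraVolumes list at or above the limit. The function name is hypothetical, and the exact location of extraVolumes in the manifest is an assumption: the sketch accepts both a plain "extraVolumes" key and a variable of the form {"name": "extraVolumes", "value": [...]}. Check the variable documentation linked above for the real schema.

```python
def oversized_extra_volumes(node, limit=7, path="$"):
    """Return (path, count) for each extraVolumes list with `limit` or more entries."""
    found = []
    if isinstance(node, dict):
        # Assumed shape 1: a variable object {"name": "extraVolumes", "value": [...]}.
        value = node.get("value")
        if node.get("name") == "extraVolumes" and isinstance(value, list):
            if len(value) >= limit:
                found.append((f"{path}.value", len(value)))
        for key, child in node.items():
            child_path = f"{path}.{key}"
            # Assumed shape 2: a plain key {"extraVolumes": [...]}.
            if key == "extraVolumes" and isinstance(child, list) and len(child) >= limit:
                found.append((child_path, len(child)))
            found.extend(oversized_extra_volumes(child, limit, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(oversized_extra_volumes(item, limit, f"{path}[{index}]"))
    return found
```

An empty result means no extraVolumes list in the parsed definition reaches 7 entries. Any path that is reported is a candidate for trimming.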
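The VKS Machine Agent uses a counter to match each desired volume to its SCSI path. However, ESXi does not attach disks at SCSI ID 7, because that ID is traditionally reserved for the SCSI initiator. The device the agent expects at that position, the scsi-0:0:7:0 path in the log above, therefore does not exist.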
In previous releases, this failure was silent, and the 7th entry in extraVolumes was simply never mounted. VKS 3.5 added more robust error checking, which now surfaces the problem.
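The mismatch between a simple counter and the slots ESXi actually uses can be illustrated with a small model. This is a sketch under stated assumptions, not the Machine Agent's code: all function names are hypothetical, the first extra volume is assumed to sit at target 1 (chosen so that the 7th entry lands on target 7, as described above), and ESXi is assumed to fill targets in order while skipping 7.

```python
RESERVED_SCSI_ID = 7  # per the text above, ESXi does not attach disks here


def expected_targets(volume_count, first_target=1):
    """Counter-based view (assumed): the Nth volume is expected at the Nth target."""
    return [first_target + offset for offset in range(volume_count)]


def attached_targets(volume_count, first_target=1):
    """Hypervisor view (assumed): targets are filled in order, skipping the reserved ID."""
    targets = []
    candidate = first_target
    while len(targets) < volume_count:
        if candidate != RESERVED_SCSI_ID:
            targets.append(candidate)
        candidate += 1
    return targets


def mismatched_entries(volume_count):
    """Return the 1-based extraVolumes entries whose expected and attached targets differ."""
    pairs = zip(expected_targets(volume_count), attached_targets(volume_count))
    return [entry for entry, (want, got) in enumerate(pairs, start=1) if want != got]


if __name__ == "__main__":
    for count in (6, 7):
        print(count, "volumes -> mismatched entries:", mismatched_entries(count))
```

In this model, six volumes line up exactly, while the 7th entry is expected at target 7 but attached elsewhere, which is consistent with keeping extraVolumes below 7 entries.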