Machines do not join cluster after upgrading to VKS 3.5 when there are 7 or more extra volumes

Article ID: 441193

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

After upgrading to vSphere Kubernetes Service (VKS) 3.5 or later, any operation that rolls out a node (for example, scaling out or upgrading) can leave new machines unable to join the cluster when 7 or more extra volumes are defined. The affected machines remain stuck in the Provisioned state.

Logging in to an affected machine and reviewing /var/log/cloud-init-output.log shows errors similar to the following:



[2026-05-20 06:37:00] {"time":"2026-05-20T06:37:00.941482319Z","level":"ERROR","msg":"error applying: error applying task mount-<path>: error mounting disk to [/var/lib/kubelet]: error getting block devices for [/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:7:0]: error obtaining block device info for [/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:7:0]: exit status 32"}
[2026-05-20 06:37:00] {"time":"2026-05-20T06:37:00.94150328Z","level":"DEBUG","msg":"error applying: error applying task mount-var-lib-kubelet: error mounting disk to [/var/lib/kubelet]: error getting block devices for [/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:7:0]: error obtaining block device info for [/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:7:0]: exit status 32","StackTrace":"goroutine 1 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:26 +0x5e\ngithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/pkg/stack.Push(...)\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/pkg/stack/stack.go:14\ngithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/pkg/graph.(*TaskSet).Apply(0xc000010bb8)\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/pkg/graph/graph.go:241 +0x459\ngithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/cmd/apply.(*applyCmd).run(0xc0005422e8, 0xc000147af8?, {0x0?, 0x0?, 0x0?})\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/cmd/apply/apply.go:118 +0x495\ngithub.com/spf13/cobra.(*Command).execute(0xc000030c08, {0xc000544680, 0x4, 0x4})\n\tgithub.com/spf13/[email protected]/command.go:1015 +0xaaa\ngithub.com/spf13/cobra.(*Command).ExecuteC(0xc000030008)\n\tgithub.com/spf13/[email protected]/command.go:1148 +0x46f\ngithub.com/spf13/cobra.(*Command).Execute(...)\n\tgithub.com/spf13/[email protected]/command.go:1071\ngithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/cmd.Execute()\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/cmd/root.go:102 +0x25\nmain.main()\n\tgithub-vcf.devops.broadcom.net/vcf/kubernetes-service/services/vks-cluster-configuration/machine-agent/main.go:10 +0xf\n"}
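
To confirm that a stuck machine is hitting this issue rather than an unrelated mount failure, it can help to pull the SCSI slot number out of the ERROR entries. The following is a minimal Python sketch, not a VKS tool: the function name is hypothetical, and it assumes each log line is a bracketed timestamp followed by one JSON object, as in the sample above.

```python
import json
import re

# Matches by-path device names such as
# /dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:7:0 and captures the
# third field of the SCSI address (the slot, 7 in the sample log).
SCSI_PATH_RE = re.compile(r"/dev/disk/by-path/\S*?-scsi-\d+:\d+:(\d+):\d+")


def failing_scsi_slots(log_text):
    """Return the sorted SCSI slots named in ERROR-level log entries.

    Hypothetical helper: assumes the line layout shown in this article,
    i.e. '[timestamp] {json object}'. Lines that do not parse are skipped.
    """
    slots = set()
    for line in log_text.splitlines():
        start = line.find("{")
        if start == -1:
            continue  # no JSON payload on this line
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # wrapped or truncated line
        if record.get("level") != "ERROR":
            continue
        for match in SCSI_PATH_RE.finditer(record.get("msg", "")):
            slots.add(int(match.group(1)))
    return slots and sorted(slots) or []


if __name__ == "__main__":
    with open("/var/log/cloud-init-output.log", encoding="utf-8") as fh:
        print(failing_scsi_slots(fh.read()))
```

If the only slot reported is 7, the failure matches the behavior described in this article.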


Workaround

Edit the cluster definition and reduce the number of extraVolumes entries for the cluster, control plane, or node pools to fewer than 7 (that is, 6 or fewer). Refer to https://developer.broadcom.com/xapis/vmware-vsphere-kubernetes-service/latest/variable-docs.html#persistentvolumes for the variable reference.

Environment

  • VKS 3.5+

Cause

The VKS Machine Agent uses a counter to match each desired volume to its SCSI path. However, ESXi does not attach disks to slot 7, because that slot is traditionally reserved for the SCSI initiator. The counter therefore produces a device path for slot 7 that does not exist on the machine, which is the path reported in the log above (scsi-0:0:7:0).

In previous releases, this failed silently, with the result that the 7th entry in extraVolumes was never mounted. VKS 3.5 added more robust error checking, which surfaces the bug as a failed rollout.
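
The mismatch can be illustrated with a short Python sketch. This is not the Machine Agent's code. It assumes that extra volumes are numbered from slot 1 (with slot 0 taken by the boot disk), which is consistent with the 7th entry landing on slot 7 but is not stated in this article. The function names are hypothetical, and the PCI address is copied from the sample log.

```python
RESERVED_SLOT = 7  # slot reserved for the SCSI initiator; no disk is attached here


def counter_slots(n_extra, first_slot=1):
    """Slots a plain counter would expect (assumed, simplified behavior)."""
    return [first_slot + i for i in range(n_extra)]


def attached_slots(n_extra, first_slot=1):
    """Slots disks actually occupy when the reserved slot is skipped."""
    slots = []
    slot = first_slot
    while len(slots) < n_extra:
        if slot != RESERVED_SLOT:
            slots.append(slot)
        slot += 1
    return slots


def by_path(slot, pci="0000:02:00.0"):
    """Build the by-path device name for a slot on the given controller."""
    return f"/dev/disk/by-path/pci-{pci}-scsi-0:0:{slot}:0"


if __name__ == "__main__":
    expected = counter_slots(7)
    actual = attached_slots(7)
    for index, (want, have) in enumerate(zip(expected, actual), start=1):
        status = "ok" if want == have else "MISSING DEVICE"
        print(f"extraVolumes entry {index}: looks for {by_path(want)} ({status})")
```

Under these assumptions, the first six entries line up, and the 7th entry looks for a slot 7 device that was never attached. A counter that skips the reserved slot, as in attached_slots, is one way such a mapping could avoid the mismatch; the article does not say how the fix will be implemented.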

Resolution

  • This issue will be fixed in a future release of VKS.