During a cluster upgrade, the update process hangs in a "deleting" state. Consequently, pods go down and become unavailable, accompanied by the following event log:
kubectl describe pod <pod-name>
...
...
Events:
  Type     Reason                  Age                   From     Message
  ----     ------                  ----                  ----     -------
  Warning  FailedCreatePodSandBox  85s (x585 over 128m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "71xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx9536": plugin type="antrea" failed (add): rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/run/antrea/cni.sock: connect: no such file or directory"
containerd logs on the worker node:
MM DD HH:MM:SS tkc-xxxxx-workers-xxxxx-xxxxx-xxxx5 containerd[621780]: time="YYYY-MM-DDTHH:MM:SS.xxxxxxxxxZ" level=error msg="Failed to destroy network for sandbox \"c1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx9b\"" error="plugin type=\"antrea\" failed (delete): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /var/run/antrea/cni.sock: connect: no such file or directory\""
MM DD HH:MM:SS tkc-xxxxx-workers-xxxxx-xxxxx-xxxx5 containerd[621780]: time="YYYY-MM-DDTHH:MM:SS.xxxxxxxxxZ" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:argocd-application-controller-0,Uid:d5bxxxxx-xxxx-xxxx-xxxx-xxxxxxxx5cfc,Namespace:argocd,Attempt:0,} failed, error" error="rpc error: code = Unknown desc = failed to setup network for sandbox \"c1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx9b\": plugin type=\"antrea\" failed (add): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /var/run/antrea/cni.sock: connect: no such file or directory\""
MM DD HH:MM:SS tkc-xxxxx-workers-xxxxx-xxxxx-xxxx5 containerd[621780]: time="YYYY-MM-DDTHH:MM:SS.xxxxxxxxxZ" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:cert-manager-846c766d98-dzb4v,Uid:acexxxxx-xxxx-xxxx-xxxx-xxxxxxxx53f4,Namespace:cert-manager,Attempt:0,}"
MM DD HH:MM:SS tkc-xxxxx-workers-xxxxx-xxxxx-xxxx5 containerd[621780]: time="YYYY-MM-DDTHH:MM:SS.xxxxxxxxxZ" level=error msg="Failed to destroy network for sandbox \"52xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx64\"" error="plugin type=\"antrea\" failed (delete): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /var/run/antrea/cni.sock: connect: no such file or directory\""
MM DD HH:MM:SS tkc-xxxxx-workers-xxxxx-xxxxx-xxxx5 containerd[621780]: time="YYYY-MM-DDTHH:MM:SS.xxxxxxxxxZ" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:cert-manager-846xxxxxxx-xxxxx,Uid:acexxxxx-xxxx-xxxx-xxxx-xxxxxxxx53f4,Namespace:cert-manager,Attempt:0,} failed, error" error="rpc error: code = Unknown desc = failed to setup network for sandbox \"52xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx64\": plugin type=\"
vCenter 8.0
The current CIDR (Classless Inter-Domain Routing) range defined on the guest cluster has no available IP addresses left, so the upgrade cannot continue.
As an example, a network with a /28 mask has 14 usable IP addresses. Keep in mind that the cluster needs IP addresses for more than just its nodes.
For a 9-node cluster, even a single-node surge could temporarily require additional IPs that exceed the 14 available in a /28 network.
To confirm, check the usage in the NSX-T Manager UI under Networking > IP Address Pools, or on the specific Segment, to see the actual number of allocated IPs.
For a 9-node production cluster, a /27 (30 usable IPs) is generally recommended to provide enough room for upgrades and infrastructure services.
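The arithmetic above can be checked with a short script. This is only a sketch using Python's standard ipaddress module. The 192.0.2.0 prefix is a documentation range used as a placeholder, and the function names and the other_ips value are hypothetical, chosen for illustration. Take the real count of non-node IPs from the NSX-T allocation view for your environment.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Return the number of assignable host addresses in a network."""
    network = ipaddress.ip_network(cidr, strict=False)
    # hosts() excludes the network and broadcast addresses for an
    # IPv4 prefix shorter than /31 (a /28 gives 14, a /27 gives 30).
    return len(list(network.hosts()))


def fits_during_upgrade(cidr: str, nodes: int, surge_nodes: int, other_ips: int) -> bool:
    """Check whether nodes, surge nodes, and other consumers fit in the range.

    other_ips is an assumed count of addresses used by anything that is
    not a node; it is a placeholder, not a value from the product.
    """
    required = nodes + surge_nodes + other_ips
    return required <= usable_hosts(cidr)


if __name__ == "__main__":
    # Placeholder networks; substitute the range defined for your cluster.
    for cidr in ("192.0.2.0/28", "192.0.2.0/27"):
        fits = fits_during_upgrade(cidr, nodes=9, surge_nodes=1, other_ips=5)
        print(f"{cidr}: {usable_hosts(cidr)} usable IPs, fits 9 nodes + 1 surge + 5 other: {fits}")
```

With these assumed inputs the /28 range falls one address short while the /27 range leaves headroom, which matches the sizing guidance above.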
Redesign the cluster network to accommodate a larger pool of available IP addresses. This is achieved by modifying and expanding the Service and Pod CIDR ranges within the vSphere Kubernetes Guest Cluster.
Refer to: Changing Service and Pod CIDR Ranges in vSphere Kubernetes Guest Cluster