Pods stuck in "ContainerCreating" status if Pod name contains a dot

Article ID: 316964

Products

VMware NSX

Issue/Introduction

Symptoms:

NSX Container Plug-in (NCP) 3.0.1 fails to process Pod names that contain a dot.
A Pod deployed with a period (".") in its name does not get past the ContainerCreating stage.
You see events similar to the following when you run kubectl describe pod <podname>:

kubectl describe po busybox-sleep.1
### lines omitted for brevity ###
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/busybox-sleep.1 to ########-####-####-####-########57b5
Warning FailedCreatePodSandBox 7m6s (x5 over 23m) kubelet, ########-####-####-####-########57b5 Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal SandboxChanged 3m17s (x6 over 23m) kubelet, ########-####-####-####-########57b5 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 3m17s kubelet, ########-####-####-####-########57b5 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "4544fe7a3239227c4b5463a44afc31528840f671c466a4a17237f7917f040366" network for pod "busybox-sleep.1": networkPlugin cni failed to set up pod "busybox-sleep.1_default" network: netplugin failed with no error message

You see messages similar to the following in nsx-node-agent/nsx-node-agent.stdout.log from a worker log bundle. Hyperbus creates the VIF, but you do not see the CNI create the OVS port; instead, you see that agent.cache did not get the container's networks:

1 2021-01-27T21:25:05.635Z ########-####-####-####-########57b5 NSX 4968 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.hyperbus_service Updated app nsx.default.busybox-sleep.1 with IP 172.16.3.2/24, MAC ##:##:##:##:##:04, gateway 172.16.3.1/24, vlan 12, CIF ########-####-####-####-########148c

…

1 2021-01-27T21:25:06.710Z ########-####-####-####-########57b5 NSX 4968 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="WARNING"] nsx_ujo.agent.cache Did not get nsx.default.busybox-sleep.1 networks from cache
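To check a worker log bundle for this symptom, a small shell filter along the following lines may help. It is a minimal sketch: the function name `cache_miss_apps` is hypothetical (not part of NCP or any VMware tooling), and it assumes the warning text matches the format of the sample message above.

```shell
# Hypothetical helper (not part of NCP): reads nsx-node-agent log lines on
# stdin and prints the unique app names reported in
# "Did not get <app> networks from cache" warnings.
cache_miss_apps() {
  sed -n 's/.*nsx_ujo\.agent\.cache Did not get \(.*\) networks from cache.*/\1/p' | sort -u
}

# Example usage against an extracted log bundle (path as named in this article):
#   cache_miss_apps < nsx-node-agent/nsx-node-agent.stdout.log
```

An app name such as nsx.default.busybox-sleep.1 in the output points to a Pod whose name contains a dot.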

Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x
VMware PKS 1.x

Cause

This issue occurs because NCP 3.0.1 fails to process a Pod name that contains a dot.

Resolution

This issue is resolved in VMware Tanzu Kubernetes Grid Integrated Edition Management Console 1.9 with NSX-T Data Center 3.0.2, available at Broadcom Downloads.

Workaround:
To work around this issue without upgrading, do not use a dot in the Pod name.
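To find existing Pods affected by this naming restriction, a filter such as the following could be applied to a Pod listing. This is a sketch under stated assumptions: `pods_with_dots` is a hypothetical helper name, and the input is assumed to be one Pod per line in the form "NAMESPACE NAME".

```shell
# Hypothetical helper (not part of kubectl or NCP): reads "NAMESPACE NAME"
# lines on stdin and prints only those whose Pod name contains a dot.
pods_with_dots() {
  awk 'NF >= 2 && index($2, ".") > 0 { print $1, $2 }'
}

# Example usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name | pods_with_dots
```

Any Pod listed needs to be recreated under a name without a dot (for example, busybox-sleep-1 instead of busybox-sleep.1) for the workaround to apply.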
