Pods stuck in "ContainerCreating" status if Pod name contains a dot

Article ID: 316964

Products

VMware NSX

Issue/Introduction

Symptoms:

  • NSX Container Plug-in (NCP) 3.0.1 fails to process Pod names that contain a dot.
  • When you deploy a Pod with a period (".") in its name, the Pod does not get past the ContainerCreating stage.
  • You see events similar to the following when you run kubectl describe pod <podname>:
kubectl describe po busybox-sleep.1
### lines omitted for brevity ###
Events:
  Type     Reason                  Age                  From                                           Message
  ----     ------                  ----                 ----                                           -------
  Normal   Scheduled               <unknown>            default-scheduler                              Successfully assigned default/busybox-sleep.1 to ########-####-####-####-########57b5
  Warning  FailedCreatePodSandBox  7m6s (x5 over 23m)   kubelet, ########-####-####-####-########57b5  Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   SandboxChanged          3m17s (x6 over 23m)  kubelet, ########-####-####-####-########57b5  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  3m17s                kubelet, ########-####-####-####-########57b5  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "4544fe7a3239227c4b5463a44afc31528840f671c466a4a17237f7917f040366" network for pod "busybox-sleep.1": networkPlugin cni failed to set up pod "busybox-sleep.1_default" network: netplugin failed with no error message

  • You see messages similar to the following in nsx-node-agent/nsx-node-agent.stdout.log from a worker log bundle. Hyperbus creates the VIF, but no entry shows the CNI creating the OVS port. Instead, agent.cache reports that it did not get the container's network:
1 2021-01-27T21:25:05.635Z ########-####-####-####-########57b5 NSX 4968 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.hyperbus_service Updated app nsx.default.busybox-sleep.1 with IP 172.16.3.2/24, MAC ##:##:##:##:##:04, gateway 172.16.3.1/24, vlan 12, CIF ########-####-####-####-########148c
1 2021-01-27T21:25:06.710Z ########-####-####-####-########57b5 NSX 4968 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="WARNING"] nsx_ujo.agent.cache Did not get nsx.default.busybox-sleep.1 networks from cache
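To check a worker log bundle for this symptom, the cache warning shown above can be searched for and the affected Pods listed. The following is a minimal Python sketch, assuming the nsx-node-agent.stdout.log line format shown above. The nsx.<namespace>.<pod> app-name layout is inferred from the sample line only, and the script name scan_ncp_log.py is hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# Matches the WARNING line from nsx_ujo.agent.cache shown above.
# Assumption: the app name has the form "nsx.<namespace>.<pod>",
# inferred from the sample "nsx.default.busybox-sleep.1".
CACHE_MISS = re.compile(r"nsx_ujo\.agent\.cache Did not get (\S+) networks from cache")


def find_cache_misses(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (namespace, pod) for every cache-miss warning in the log lines."""
    for line in lines:
        match = CACHE_MISS.search(line)
        if not match:
            continue
        # Kubernetes namespace names cannot contain dots, so splitting at
        # most twice leaves "nsx", the namespace, and the full Pod name
        # (which may itself contain dots).
        parts = match.group(1).split(".", 2)
        if len(parts) == 3 and parts[0] == "nsx":
            yield parts[1], parts[2]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_ncp_log.py nsx-node-agent.stdout.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for namespace, pod in find_cache_misses(log):
            note = "  <-- Pod name contains a dot" if "." in pod else ""
            print(f"{namespace}/{pod}{note}")
```

Splitting on at most two dots keeps a dotted Pod name such as busybox-sleep.1 intact instead of cutting it at its own dot.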



Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x
VMware PKS 1.x

Cause

This issue occurs because NCP 3.0.1 fails to process Pod names that contain a dot (".").

Resolution

This issue is resolved in VMware Tanzu Kubernetes Grid Integrated Edition Management Console 1.9 with NSX-T Data Center 3.0.2, available from Broadcom Downloads.

Workaround:
If you do not want to upgrade, do not use a dot in Pod names.
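Pod names can be checked for dots before deployment. Below is a minimal Python sketch; the helper names are hypothetical, and replacing dots with hyphens is only one possible renaming scheme. The validation pattern follows the RFC 1123 DNS subdomain rule that Kubernetes applies to Pod names (lowercase alphanumerics, "-" and ".", starting and ending with an alphanumeric, at most 253 characters).

```python
import re

# RFC 1123 DNS subdomain, the format Kubernetes requires for Pod names.
DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
MAX_NAME_LENGTH = 253


def affected_by_dot_issue(name: str) -> bool:
    """Return True if the Pod name contains a dot and may hit this issue."""
    return "." in name


def dot_free_name(name: str) -> str:
    """Hypothetical helper: suggest a replacement name with dots turned into hyphens.

    Raises ValueError if the input is not a valid Pod name to begin with.
    A valid subdomain never starts or ends with a dot, so the result is
    still a valid name of the same length.
    """
    if len(name) > MAX_NAME_LENGTH or not DNS1123_SUBDOMAIN.fullmatch(name):
        raise ValueError(f"not a valid Pod name: {name!r}")
    return name.replace(".", "-")


if __name__ == "__main__":
    for pod in ["busybox-sleep.1", "busybox-sleep-1"]:
        if affected_by_dot_issue(pod):
            print(f"{pod}: contains a dot, consider {dot_free_name(pod)}")
        else:
            print(f"{pod}: OK")
```

For example, dot_free_name("busybox-sleep.1") returns "busybox-sleep-1". Different names can map to the same result (busybox-sleep.1 and busybox-sleep-1 both become busybox-sleep-1), so check a suggested name against existing Pods before using it.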