Creating a container fails in Tanzu Application Service or Tanzu Kubernetes Grid Integrated Edition environment with NSX-T when "bosh/id" tags are missing
Article ID: 335086

Products

VMware Tanzu Application Service, Tanzu Kubernetes Grid

Issue/Introduction

Symptoms:
In a Tanzu Application Service (TAS) or Tanzu Kubernetes Grid Integrated Edition (TKGI) environment that uses NSX-T for container networking, the logical switch port of each BOSH-deployed VM should have a tag with the scope "bosh/id".

If the tag is not present on a TAS Diego cell or on a worker node of a TKGI kubernetes cluster, new containers scheduled to that node fail to be created.


Symptom 1: TAS + NSX-T

You see the following error when deploying a new application:

$ cf push
...
Cell b5465126-199e-4e3d-a297-503ffbf88464 creating container for instance c4338144-833b-4f9f-8c2a-8a133c1ff3c1
Cell b5465126-199e-4e3d-a297-503ffbf88464 failed to create container for instance c4338144-833b-4f9f-8c2a-8a133c1ff3c1: external networker up: exit status 1
...

In ncp.stdout.log, you see the following error:
1 2021-05-28T02:35:57.685Z 41290dd2-db92-4fed-8121-4931d59785fc NSX 31669 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="ERROR" errorCode="NCP00010"] nsx_ujo.ncp.nsx.manager.node_service Failed to get node vif or TN ID for node b5465126-199e-4e3d-a297-503ffbf88464 in cluster pas-02


Symptom 2: TKGI + NSX-T

Creating a new pod gets stuck in the ContainerCreating status:
$ kubectl get pods
NAME                       READY   STATUS              RESTARTS   AGE
busybox-6f8f48d8d5-kstn4   0/1     ContainerCreating   0          3m55s

$ kubectl describe pod busybox-6f8f48d8d5-kstn4
...
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               31m                default-scheduler  Successfully assigned default/busybox-6f8f48d8d5-kstn4 to 242e8238-6ab4-45d0-8f12-07678398d388
  Warning  FailedCreatePodSandBox  27m                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5f2438d9e80e5932c92e6a5d73998c29dffac3105888b52e84dbeee0831066a0" network for pod "busybox-6f8f48d8d5-kstn4": networkPlugin cni failed to set up pod "busybox-6f8f48d8d5-kstn4_default" network: netplugin failed: "1 2021-07-06T06:20:43.740Z 242e8238-6ab4-45d0-8f12-07678398d388 NSX 28366 - [nsx@6876 comp=\"nsx-container-node\" subcomp=\"nsx_cni\" level=\"INFO\"] __main__ nsx_cni plugin invoked with arguments: ADD\n1 2021-07-06T06:20:43.741Z 242e8238-6ab4-45d0-8f12-07678398d388 NSX 28366 - [nsx@6876 comp=\"nsx-container-node\" subcomp=\"nsx_cni\" level=\"INFO\"] __main__ Reading configuration on standard input\n1 2021-07-06T06:20:43.741Z 242e8238-6ab4-45d0-8f12-07678398d388 NSX 28366 - [nsx@6876 comp=\"nsx-container-node\" subcomp=\"nsx_cni\" level=\"INFO\"] __main__ Configuring networking for container 5f2438d9e80e5932c92e6a5d73998c29dffac3105888b52e84dbeee0831066a0\n1 2021-07-06T06:20:43.741Z 242e8238-6ab4-45d0-8f12-07678398d388 NSX 28366 - [nsx@6876 comp=\"nsx-container-node\" subcomp=\"nsx_cni\" level=\"DEBUG\"] __main__ Network config from input: {u'cniVersion': u'0.3.1', u'runtimeConfig': {u'portMappings': []}, u'name': u'nsx-cni', u'args': {u'cniSocket': u'/var/vcap/sys/run/nsx-node-agent/cni.sock'}, u'capabilities': {u'portMappings': True}, u'type': u'nsx'}\n"
...

In ncp.stdout.log, you see the following error:
2021-07-06T07:10:14.116Z 75aa2e96-6515-4103-a56b-8933d24f447a NSX 11076 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="ERROR" errorCode="NCP00010"] nsx_ujo.ncp.nsx.manager.node_service Failed to get node vif or TN ID for node 242e8238-6ab4-45d0-8f12-07678398d388 in cluster pks-0dea86f7-0f1a-43e3-87ed-fd178c5c3636

The common error "Failed to get node vif or TN ID for node ..." means that the NCP job fails to find the logical switch port of the compute node, that is, a Diego cell in TAS or a worker node in a TKGI kubernetes cluster. As a result, NCP fails to set up the network (container IP) for containers scheduled to the affected node.

To verify the error in the NCP job log, first identify the active NCP job among the multiple Diego database instances in the TAS deployment, or among the multiple master instances in a TKGI kubernetes cluster:
# identify active ncp job from TAS diego databases
$ bosh -d cf-5271d4c2c7b6f10846fd ssh diego_database "sudo /var/vcap/jobs/ncp/bin/nsxcli -c get ncp-master status" | grep "This instance is the NCP master"
diego_database/eefef677-96c6-4f8a-aef7-32aaabb8b9ad: stdout | This instance is the NCP master

# identify active ncp job from masters in a TKGI k8s cluster
$ bosh -d service-instance_722ab6c4-7ea6-4ab4-93c1-c64362d07d73 ssh master "sudo /var/vcap/jobs/ncp/bin/nsxcli -c get ncp-master status" | grep "This instance is the NCP master"
master/50e2ce22-acb5-4cd7-bbf2-e814d4ab81f8: stdout | This instance is the NCP master
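
With the active NCP master identified, you can inspect its log directly over bosh ssh. A minimal sketch, assuming the ncp job writes its log to the default path /var/vcap/sys/log/ncp/ncp.stdout.log (adjust if your environment differs):

# grep the NCP log on the active master for the node lookup error
$ bosh -d service-instance_722ab6c4-7ea6-4ab4-93c1-c64362d07d73 \
    ssh master/50e2ce22-acb5-4cd7-bbf2-e814d4ab81f8 \
    "sudo grep 'Failed to get node vif or TN ID' /var/vcap/sys/log/ncp/ncp.stdout.log | tail -5"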


Cause

In an NSX-T environment, the logical switch port of each BOSH-deployed VM should have a tag with the scope "bosh/id". Its value is the SHA-1 hash of the BOSH VM GUID, as the example below demonstrates.

$ bosh -d service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636 vms --column "Instance" --column "VM CID"

Deployment 'service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636'

Instance                                     VM CID
master/fecde9b5-d949-4965-be51-1589e93f4bce  vm-2b84fec5-218c-4a1d-b40f-e0050a497368
worker/5fe5876f-d17c-4f00-9020-4482b4c3c3fe  vm-ee81a0fc-de0a-44b5-93bc-0a1316780805

$ curl -s -k -u "$NSX_USER:$NSX_PASSWORD" -H 'content-type: application/json' "https://${NSX_MANAGER}/api/v1/search?query=resource_type:LogicalPort%20AND%20display_name:*vm-ee81a0fc-de0a-44b5-93bc-0a1316780805*" | jq '.results[] | .tags'
[
  {
    "scope": "bosh/id",
    "tag": "93f5c888c9b2643d2412b892d98c6dfc64dfc90b"
  }
]

$ echo -n "5fe5876f-d17c-4f00-9020-4482b4c3c3fe" | shasum
93f5c888c9b2643d2412b892d98c6dfc64dfc90b  -
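
You can verify this derivation yourself by hashing the BOSH VM GUID and comparing it with the tag returned by the NSX-T search API. A minimal sketch combining the commands above:

# expected tag: SHA-1 of the BOSH VM GUID (the worker instance GUID above)
$ EXPECTED=$(echo -n "5fe5876f-d17c-4f00-9020-4482b4c3c3fe" | shasum | awk '{print $1}')

# actual tag: the bosh/id tag on the logical switch port of the VM CID
$ ACTUAL=$(curl -s -k -u "$NSX_USER:$NSX_PASSWORD" \
    "https://${NSX_MANAGER}/api/v1/search?query=resource_type:LogicalPort%20AND%20display_name:*vm-ee81a0fc-de0a-44b5-93bc-0a1316780805*" \
    | jq -r '.results[].tags[]? | select(.scope=="bosh/id") | .tag')

$ [ "$EXPECTED" = "$ACTUAL" ] && echo "tag OK" || echo "tag missing or mismatched"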


Additionally, the logical switch port of each master in a TKGI k8s cluster should have a tag with the scope "pks/k8smastervm". Its value is identical to the k8s cluster GUID, which also appears in the BOSH deployment name service-instance_<GUID>.

$ bosh -d service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636 vms --column "Instance" --column "VM CID"

Deployment 'service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636'

Instance                                     VM CID
master/fecde9b5-d949-4965-be51-1589e93f4bce  vm-2b84fec5-218c-4a1d-b40f-e0050a497368
worker/5fe5876f-d17c-4f00-9020-4482b4c3c3fe  vm-ee81a0fc-de0a-44b5-93bc-0a1316780805

$ curl -s -k -u "$NSX_USER:$NSX_PASSWORD" -H 'content-type: application/json' "https://${NSX_MANAGER}/api/v1/search?query=resource_type:LogicalPort%20AND%20display_name:*vm-2b84fec5-218c-4a1d-b40f-e0050a497368*" | jq '.results[] | .tags'
[
  {
    "scope": "pks/k8smastervm",
    "tag": "0dea86f7-0f1a-43e3-87ed-fd178c5c3636"
  },
  {
    "scope": "bosh/id",
    "tag": "4e6b18a578cb1335ef1e6843b28b022213225114"
  }
]


When the tag "bosh/id" is missing on a logical switch port, we would hit the error as described in Symptom 1 because the NCP job relies on this tag to find the logical switch port of a BOSH deployed VM.

When the tag "pks/k8smastervm" is missing on a TKGI master's logical switch port, the NSX-T load balancer sitting in front of the master loses this master as a backend pool member. As a result, the TKGI kubernetes cluster is not accessible through NSX-T load balancer if all masters lose the tag "pks/k8smastervm".

# in the example below, 10.####.##.### is the IP of the NSX-T load balancer for the TKGI k8s masters

$ kubectl get pods
Unable to connect to the server: dial tcp 10.####.##.###:8443: i/o timeout
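
You can cross-check the backend pool membership via the NSX-T API. A minimal sketch, assuming the /api/v1/loadbalancer/pools endpoint of the NSX Manager API; pool names are environment specific:

# list each load balancer pool with its member IPs; each healthy master's
# IP should appear in the pool serving the kubernetes API on port 8443
$ curl -s -k -u "$NSX_USER:$NSX_PASSWORD" \
    "https://${NSX_MANAGER}/api/v1/loadbalancer/pools" \
  | jq '.results[] | {name: .display_name, members: [.members[]?.ip_address]}'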


To verify whether the tags are present on a VM's logical switch port, perform the following steps.

  1. Identify the BOSH VM GUID of the TAS Diego cell or the TKGI k8s worker to which the failed container is scheduled.
    • The cf push trace contains the BOSH VM GUID of the Diego cell. For example, b5465126-199e-4e3d-a297-503ffbf88464 in Symptom 1.
    • The output of kubectl describe pod POD_NAME contains the kubernetes node name. For example, 242e8238-6ab4-45d0-8f12-07678398d388 in Symptom 2. Use the following commands to get the mapping between the kubernetes node name and BOSH VM GUID.
      # list k8s node name, ExternalIP
      
      $ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}'
      242e8238-6ab4-45d0-8f12-07678398d388    172.##.###.5
      29360604-c8ff-40eb-9371-0f79688f2b11    172.##.###.3
      3eda658f-d666-4c60-a7f7-fadf524efc14    172.##.###.4
      
      # get BOSH VM GUID based on k8s node ExternalIP
      
      $ bosh vms --column instance --column ips | grep "172.##.###.5"
      worker/5fe5876f-d17c-4f00-9020-4482b4c3c3fe     172.##.###.5
      
  2. Get the VM CID (the object reference at the IaaS layer) based on the BOSH VM GUID.
    $ bosh -d service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636 vms --column instance --column "VM CID"
    
    Deployment 'service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636'
    
    Instance                                     VM CID
    master/fecde9b5-d949-4965-be51-1589e93f4bce  vm-2b84fec5-218c-4a1d-b40f-e0050a497368
    worker/5fe5876f-d17c-4f00-9020-4482b4c3c3fe  vm-ee81a0fc-de0a-44b5-93bc-0a1316780805
  3. Search for the logical switch port by using the VM CID in the NSX Manager UI, or query the NSX-T API with curl. Below is an example of a port with a missing tag; see also the sweep script after this example.

$ curl -s -k -u "$NSX_USER:$NSX_PASSWORD" -H 'content-type: application/json' "https://${NSX_MANAGER}/api/v1/search?query=resource_type:LogicalPort%20AND%20display_name:*vm-ee81a0fc-de0a-44b5-93bc-0a1316780805*" | jq '.results[] | .tags'
[]
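
To sweep an entire deployment for the problem, you can loop over the VM CIDs reported by bosh vms and flag every logical switch port that lacks a "bosh/id" tag. A minimal sketch built from the commands above, using the deployment name from the earlier examples:

# flag BOSH VMs whose logical switch port has no bosh/id tag
$ bosh -d service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636 vms \
    --column "VM CID" | grep '^vm-' | while read -r CID; do
    TAG=$(curl -s -k -u "$NSX_USER:$NSX_PASSWORD" \
      "https://${NSX_MANAGER}/api/v1/search?query=resource_type:LogicalPort%20AND%20display_name:*${CID}*" \
      | jq -r '.results[].tags[]? | select(.scope=="bosh/id") | .tag')
    [ -z "$TAG" ] && echo "missing bosh/id tag: ${CID}"
  done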
 

Resolution

To restore the missing tags, recreate the affected VMs with BOSH commands. See the examples below for how to do this in your environment:
  • Recreate a single Diego cell: 
    bosh -d cf-5271d4c2c7b6f10846fd recreate diego_cell/9b53bcde-a720-4d72-99c6-701859355514
  • Recreate all Diego cells in TAS:
    bosh -d cf-5271d4c2c7b6f10846fd recreate diego_cell
  • Recreate a single worker in a TKGI kubernetes cluster:
    bosh -d service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636 recreate worker/5fe5876f-d17c-4f00-9020-4482b4c3c3fe
  • Recreate all VMs in a TKGI kubernetes cluster:
    bosh -d service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636 recreate
For more information on the bosh recreate command, refer to Commands - Cloud Foundry.
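
After a recreate completes, the VM receives a new VM CID at the IaaS layer, so re-run bosh vms and repeat the NSX-T API search from the Cause section to confirm the tag is back. In the sketch below, NEW_VM_CID is a placeholder for the freshly reported CID:

# look up the new VM CID, then confirm the bosh/id tag is present again
$ bosh -d service-instance_0dea86f7-0f1a-43e3-87ed-fd178c5c3636 vms --column instance --column "VM CID"
$ curl -s -k -u "$NSX_USER:$NSX_PASSWORD" \
    "https://${NSX_MANAGER}/api/v1/search?query=resource_type:LogicalPort%20AND%20display_name:*NEW_VM_CID*" \
  | jq '.results[] | .tags'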

Note: If the TAS or TKGI environment is large, for example, consisting of hundreds of Diego cells, contact Tanzu Support for assistance with a less disruptive workaround rather than recreating all cells or entire TKGI kubernetes clusters.


Attachments

Internal_84439_nsx_check_tags_of_bosh_vms