Pod creation in a TKGI cluster fails with the following error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "a4175113f32a5c4733ac87676f0bedc878a0333ac8c83b329af39c9ae8b91a50" network for pod "example-deployment-rijen-9c8fbb69c-tmrr5": networkPlugin cni failed to set up pod "example-deployment-rijen-9c8fbb69c-tmrr5_default" network: netplugin failed with no error message
In the /var/vcap/sys/log/ncp/ncp.stdout.log file, you see entries similar to:
No logical port found for node 4e17923a-ccb3-4fb3-b893-fbc841e472ca
Failed to get node vif or TN ID for node 4e17923a-ccb3-4fb3-b893-fbc841e472ca in cluster pks-2a19cbb1-fdcd-4d7c-a006-859c05ec5b90
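To collect every node ID that NCP reports as having no logical port, a small helper can scan the log. This is a minimal sketch: the function name ncp_nodes_without_port is hypothetical, and it assumes only that the log lines contain the text "No logical port found for node <UUID>" as shown above.

```shell
# Hypothetical helper: list the unique node IDs that NCP logged as
# "No logical port found for node <UUID>".
# Argument 1 (optional): path to the NCP log; defaults to the path above.
ncp_nodes_without_port() {
  log="${1:-/var/vcap/sys/log/ncp/ncp.stdout.log}"
  # -o prints only the matched text; the UUID is its last field
  grep -oE 'No logical port found for node [0-9a-f-]+' "$log" \
    | awk '{print $NF}' \
    | sort -u
}
```

Run it on the VM that holds ncp.stdout.log (with sudo if the log is not readable by your user). Each printed ID is a node whose logical port is worth checking in NSX-T Manager.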
When you run the following command, the Hyperbus status of multiple worker nodes shows as Unhealthy:
bosh -d service-instance_<cluster_UUID> ssh worker -c "sudo /var/vcap/jobs/nsx-node-agent/bin/nsxcli -c get node-agent-hyperbus status" | grep -i unhealthy
When you check the logical ports of the affected nodes in NSX-T Manager, the BOSH tag (scope: bosh/id) is missing.
To resolve this issue, manually add the BOSH ID tag to the logical port of each affected worker node.
In the NSX Manager UI, navigate to the Logical Switch Ports tab and search for the VM CID of the affected node (shown in the VM CID column of the bosh vms output).
Add a tag with the scope “bosh/id” and, as the tag value, the hash obtained in the following steps.
Run bosh vms to get the original BOSH ID of the node. In the Instance column, the UUID after worker/ is the BOSH ID.
Compute the SHA-1 hash of the BOSH ID with echo -n "${bosh_id}" | shasum -a 1. The resulting hexadecimal digest (without the trailing "-" that shasum prints) is the tag value.
Alternatively, recreate the affected node. This creates a new VM connected to a new logical port that carries all the required tags.
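One way to recreate a single worker is the BOSH recreate command for that instance. The wrapper below is a hypothetical sketch: it prints the command for review and only runs it when DRY_RUN=0 is set (an assumed convention, not a BOSH feature). bosh recreate asks for confirmation before it proceeds.

```shell
# Hypothetical wrapper: recreate one worker instance by its BOSH ID.
# Argument 1: deployment name (service-instance_<cluster_UUID>)
# Argument 2: BOSH ID of the worker (the UUID after "worker/")
recreate_worker() {
  deployment="$1"
  bosh_id="$2"
  # Always show the exact command first so it can be reviewed
  echo "bosh -d $deployment recreate worker/$bosh_id"
  # Execute only when explicitly requested; default is a dry run
  if [ "${DRY_RUN:-1}" = "0" ]; then
    bosh -d "$deployment" recreate "worker/$bosh_id"
  fi
}
```

For example, DRY_RUN=0 recreate_worker service-instance_<cluster_UUID> <bosh_id> recreates that worker, while omitting DRY_RUN=0 only prints the command.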