Network for a pod could not be completed, with message network interface for it will not be configured

Article ID: 397513

Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Pod sandbox creation fails on a foundation that has already been upgraded, with a message stating that the network interface for the pod will not be configured.

The problem appears to affect specific worker nodes. If a recreate is attempted on an affected worker node, the bosh recreate operation fails during VM creation with an error like the following:

Error: Timed out pinging VM 'vm-<ID>' with agent '<AGENT-ID>' after 600 seconds

In summary:

Only specific worker nodes on specific clusters are affected, and when a recreate is attempted, the BOSH Director sometimes times out with the error above.

Environment

TKGi 1.19.x, 1.20.x

Cause

Further investigation showed that all affected nodes were in one AZ and in one specific cluster.

After further analysis, we exported all unresponsive VMs from bosh and confirmed that these VMs were on the same cluster.

After narrowing the problem down to a single host and running some additional checks, we confirmed that the issue is related to the NSX fabric and may be caused by a faulty process.
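
The triage described above (export the VM list from bosh, then look at where the unresponsive VMs concentrate) can be sketched in Python. This is a minimal illustration, not the procedure used in the original investigation. It assumes the input is the output of `bosh vms --json` and that each row carries `process_state`, `az`, and `vm_cid` fields; verify these field names against your CLI version. The sample data and the function name `unresponsive_by_az` are hypothetical.

```python
import json
from collections import defaultdict


def unresponsive_by_az(bosh_vms_json):
    """Group the CIDs of unresponsive VMs by availability zone.

    Assumes the JSON layout {"Tables": [{"Rows": [{...}, ...]}, ...]}
    with per-row keys "process_state", "az" and "vm_cid" (assumption).
    """
    data = json.loads(bosh_vms_json)
    grouped = defaultdict(list)
    for table in data.get("Tables", []):
        for row in table.get("Rows") or []:
            state = row.get("process_state", "")
            # bosh reports agents it cannot reach as "unresponsive agent"
            if "unresponsive" in state.lower():
                az = row.get("az") or "unknown"
                grouped[az].append(row.get("vm_cid", "unknown"))
    return dict(grouped)


# Hypothetical sample input, for illustration only.
SAMPLE = json.dumps({
    "Tables": [{
        "Rows": [
            {"instance": "worker/a", "process_state": "running",
             "az": "az1", "vm_cid": "vm-1"},
            {"instance": "worker/b", "process_state": "unresponsive agent",
             "az": "az2", "vm_cid": "vm-2"},
            {"instance": "worker/c", "process_state": "unresponsive agent",
             "az": "az2", "vm_cid": "vm-3"},
        ]
    }]
})

if __name__ == "__main__":
    for az, cids in unresponsive_by_az(SAMPLE).items():
        print(az, len(cids), ", ".join(cids))
```

If every unresponsive VM falls into one AZ, the VM CIDs can then be looked up in vCenter to see whether they also share a single ESXi host.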

Resolution

Placing the affected ESXi host in maintenance mode and migrating all VMs off it resolved both problems: the pod network creation failure and the bosh ping timeout.