Symptoms:
- During an upgrade, and possibly during other tasks that require the BOSH Director to delete a Tanzu Kubernetes Grid Integrated Edition (TKGI) virtual machine, the task may hang.
- BOSH sends a delete_vm request to the CPI, and then nothing else happens in the task.
- You see messages similar to the following in the bosh task --debug output:
D, [2021-08-04T13:40:58.100149 #24551] [instance_update(master/12345678-abcd-ef12-4321-12345678abcd (2))] DEBUG -- DirectorJobRunner: [external-cpi] [cpi-123456] request: {"method":"delete_vm","arguments":["vm-f343da11-337c-78ab-f123-4567ad2ab12346"],"context":{"director_uuid":"87654321-1234-dcba-4321-1234567890ab","request_id":"cpi-123456","vm":{"stemcell":{"api_version":3}},"datacenters":"<redacted>","default_disk_type":"<redacted>","host":"<redacted>","nsxt":"<redacted>","password":"<redacted>","user":"<redacted>"},"api_version":1} with command: /var/vcap/jobs/vsphere_cpi/bin/cpi
- The log shows the BOSH Director running the /var/vcap/jobs/vsphere_cpi/bin/cpi command, which is the vSphere CPI installed on the BOSH Director VM. This binary should in turn carry out the requested method, in this case deleting the VM on the cloud provider (vCenter in this example).
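When the debug output is long, it can help to pull out only the delete_vm requests that the Director handed to the CPI. The Python sketch below is a hypothetical helper, not part of BOSH or TKGI. It assumes each request is logged on a single line in the "request: {...} with command: <path>" form shown in the sample log line, and it reads only the fields visible there (method, arguments, context.request_id).

```python
import json
import re

# Captures the JSON payload and the CPI binary path from a BOSH Director
# debug line of the form "... request: {...} with command: /path/to/cpi".
# Assumption: the whole request is on one line, as in the sample above.
REQUEST_RE = re.compile(r"request: (\{.*\}) with command: (\S+)")


def parse_cpi_request(line):
    """Return (payload_dict, command_path) for a CPI request line, else None."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    return json.loads(match.group(1)), match.group(2)


def find_delete_vm_requests(lines):
    """Yield (request_id, vm_cid, command_path) for each delete_vm request."""
    for line in lines:
        parsed = parse_cpi_request(line)
        if parsed is None:
            continue  # not a CPI request line
        payload, command = parsed
        if payload.get("method") == "delete_vm":
            request_id = payload.get("context", {}).get("request_id")
            # The first argument of delete_vm is the VM CID.
            yield request_id, payload["arguments"][0], command


if __name__ == "__main__":
    # Abbreviated copy of the debug line shown above (context trimmed).
    sample = (
        'D, [2021-08-04T13:40:58.100149 #24551] DEBUG -- DirectorJobRunner: '
        '[external-cpi] [cpi-123456] request: {"method":"delete_vm",'
        '"arguments":["vm-f343da11-337c-78ab-f123-4567ad2ab12346"],'
        '"context":{"request_id":"cpi-123456"},"api_version":1} '
        'with command: /var/vcap/jobs/vsphere_cpi/bin/cpi'
    )
    for request_id, vm_cid, command in find_delete_vm_requests([sample]):
        print(request_id, vm_cid, command)
```

Fed the lines of saved bosh task --debug output, a delete_vm request for the affected VM followed by no further progress in the task matches the symptom described here.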