Symptoms:
When running bosh vms, you see some TKGI Control Plane or Kubernetes cluster nodes with a Process State of unresponsive agent.
NOTE: These VMs could be TKGI Control Plane VMs or Kubernetes cluster VMs.
Example: The following uses the bosh CLI to check VM states for a Kubernetes cluster with cluster UUID VVVVVVVV-WWWW-XXXX-YYYY-ZZZZZZZZZZZZ:
$ bosh -d service-instance_VVVVVVVV-WWWW-XXXX-YYYY-ZZZZZZZZZZZZ vms
Instance Process State AZ IPs VM CID VM Type Active Stemcell
master/16637df3-aa5c-49a3-9824-c10c83173908 running AZ1 <IP-REDACTED> vm-0e98b6ae-d00e-4ab0-a3df-487ea8115606 xlarge true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
master/60f9453b-5e59-412e-aa62-143c8d2cac57 running AZ2 <IP-REDACTED> vm-faee689c-2f75-48b7-ba37-d37e014f6745 xlarge true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
master/657f6783-a003-4c90-aa19-78b161299658 running AZ3 <IP-REDACTED> vm-764247f9-0a27-468d-918a-d9c2d31f8888 xlarge true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/094ff4c0-9872-42f1-9e58-dce164323ca3 unresponsive agent AZ2 <IP-REDACTED> vm-3f189327-1f19-4dcf-9361-511b78fa4ce4 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/0ec4ee91-9831-44d4-a09d-52d364bc57ca unresponsive agent AZ3 <IP-REDACTED> vm-bfcc1cc9-b1ef-43f8-9fe9-708dd4c618a1 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/1c2310c3-8f5f-46c5-b7bd-25505fb7d12f unresponsive agent AZ2 <IP-REDACTED> vm-c1ceee48-59db-4b1e-9e5d-62f7e05d4a08 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/2632be2c-a047-4ae0-b11e-652be1665dad unresponsive agent AZ3 <IP-REDACTED> vm-9293a0bd-8ad5-4083-8714-db0be982ef15 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/2a79da92-bf5a-46e6-8cfd-633417613581 unresponsive agent AZ2 <IP-REDACTED> vm-a63350c8-e7a0-4aac-a7b3-4b206fd9b563 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/2b7fde58-fd3b-4dc8-b057-2e1f94798911 unresponsive agent AZ3 <IP-REDACTED> vm-850b8574-98dc-473d-92e7-4f4c0e3ff31f - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/2c76746a-be6b-4a06-8488-43df869e76b0 unresponsive agent AZ3 <IP-REDACTED> vm-65d93f74-aac0-4540-af5b-b5cddef937d4 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/325adb08-bfcc-44f0-b4bd-3810c56d5b4a unresponsive agent AZ2 <IP-REDACTED> vm-fe9c47ee-e2bf-435a-9e92-682650249515 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/3ccad40f-1ba1-466e-af7f-a19d71326628 unresponsive agent AZ2 <IP-REDACTED> vm-4371b645-7d7a-40c0-931f-65f47a19f47d - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/3cfe5cf6-23ee-4f48-80fd-6a0f0cfd3df5 unresponsive agent AZ3 <IP-REDACTED> vm-2e7f124c-485a-4d88-8223-da005dc62fd5 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/50443100-93c6-406f-ab1d-54a1bc739bf2 unresponsive agent AZ3 <IP-REDACTED> vm-a5fd7b92-09c4-4968-9815-96682a791238 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/51267837-bc58-4df4-9e20-eaaa7c864eef unresponsive agent AZ1 <IP-REDACTED> vm-a3948027-a162-497a-932f-9cfd04663649 - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/5b3651d3-3ae3-4d0b-ab84-85e5a9327785 unresponsive agent AZ1 <IP-REDACTED> vm-6a4b343c-c3e9-45c5-8b4a-3fae5d5ed10b - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
worker-wf-4xlarge/619b576f-c372-4cf2-954e-84a6a0280219 unresponsive agent AZ1 <IP-REDACTED> vm-4ea78bf2-fe9c-4600-a7a5-49edeac1fdad - true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.360
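With many workers in a deployment, it can help to reduce the table above to just the affected instances. The following is a minimal sketch, not part of the original procedure: list_unresponsive is a hypothetical helper name, and it assumes the plain-text table layout shown above, where the instance name is the first column and the Process State reads "unresponsive agent".

```shell
# Hypothetical helper: read `bosh vms` table output on stdin and print
# the instance name (first column) of every row whose Process State
# is "unresponsive agent".
list_unresponsive() {
  awk '/unresponsive agent/ { print $1 }'
}

# Example usage (replace the deployment name with your own):
#   bosh -d service-instance_VVVVVVVV-WWWW-XXXX-YYYY-ZZZZZZZZZZZZ vms | list_unresponsive
```

Matching on the literal state string keeps the filter independent of column positions, which can shift when a column such as VM Type is empty.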
To check whether the BOSH Resurrector is enabled, run:
bosh curl /resurrection
NOTE: This checks the "global" status of the setting. Specific resurrector settings can also exist at the BOSH deployment level.
Also, the resurrector status returned by the command line is not the same as the status shown in the Ops Manager UI.
TKGI
This issue occurs when either the BOSH Resurrector Plugin (VM Resurrector) is not enabled in the BOSH Director tile or vSphere DRS is not set to automatic.
When the vCenter DRS setting has been changed to disabled, turned off, or manual, the BOSH Resurrector cannot work correctly. It is unable to terminate and delete the VMs and then have them moved or rebuilt.
To resolve this issue, you can take several actions, including:
Check whether the resurrector is enabled using the bosh CLI. If not, turn it on:
bosh curl /resurrection
bosh update-resurrection on
NOTE: This should allow BOSH and the resurrector to repair the nodes, after which the BOSH agent should respond again.
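After turning the resurrector on, recovery is not immediate, so it can be useful to poll the deployment until no instance reports unresponsive agent. The following is a sketch under assumptions: wait_for_agents is a hypothetical helper, the default of 30 checks at 60-second intervals is an arbitrary choice rather than a documented recovery time, and it relies only on the bosh vms output format shown in the Symptoms example.

```shell
# Hypothetical helper: poll `bosh vms` for a deployment until no row
# reports "unresponsive agent", or give up after a number of checks.
#   $1 = deployment name (required)
#   $2 = maximum number of checks (assumed default: 30)
#   $3 = seconds to wait between checks (assumed default: 60)
wait_for_agents() {
  deployment="$1"
  attempts="${2:-30}"
  delay="${3:-60}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # grep -q exits 0 when at least one unresponsive agent is still listed
    if ! bosh -d "$deployment" vms | grep -q 'unresponsive agent'; then
      echo "all agents responding"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "agents still unresponsive after $attempts checks" >&2
  return 1
}

# Example usage (replace the deployment name with your own):
#   wait_for_agents service-instance_VVVVVVVV-WWWW-XXXX-YYYY-ZZZZZZZZZZZZ
```

If the function returns non-zero, the resurrector has not recovered the nodes within the polling window, and the other causes described above, such as the vSphere DRS setting, are worth checking.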
OR