BOSH Deployment Fails with "Unresponsive Agent" Error due to Unresponsive VMs

Article ID: 293515

Products

Operations Manager

Issue/Introduction

Symptoms:

The BOSH deployment fails, and the output of the bosh command reports an "unresponsive agent" error for one or more instances. The affected VM also does not respond to a ping test.


Error Message:

diego_cell/947e9dcc-e309-4e2a-b575-e39f427e0225  unresponsive agent  PCF-PEZ-Heritage-RP03  192.168.8.232  vm-28d4740e-7f5a-4c2e-b009-1445a56351ee  xlarge.disk


Ping test:

PING 192.168.8.232 (192.168.8.232) 56(84) bytes of data.

--- 192.168.8.232 ping statistics ---

6 packets transmitted, 0 received, 100% packet loss, time 5040ms
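
When several VMs need to be checked, the loss figure can be extracted from the ping summary instead of being read by eye. The following is a minimal shell sketch. The function name packet_loss is hypothetical, and it assumes the Linux (iputils) summary format shown above.

```shell
# packet_loss: hypothetical helper, not part of BOSH or ping.
# Reads ping output on stdin and prints the packet-loss percentage
# taken from the summary line (assumes the iputils format shown above).
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example (requires network access to the VM):
#   ping -c 6 192.168.8.232 | packet_loss
```

A result of 100 means that no replies were received, which matches the symptom described in this article.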

Environment


Cause

One or more VMs in the deployment become unresponsive, which causes the BOSH deployment to fail.

Possible causes include:

  • The Diego subsystem could be hung 
  • The Windows BOSH services may have been turned off or never started 
  • The VM itself may have been terminated, halted, or deleted

Resolution

  1. Linux: Use vMotion to migrate the unresponsive VM to a different ESXi host.
  2. Windows: Follow the Windows Workaround procedure below.
  3. General: The VM may have been halted, terminated, or deleted. In that case, BOSH or PCF should recreate or resurrect it automatically. If that does not happen, the VM can be recreated with "bosh cck" (cloud check).
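
As a rough sketch of the cloud check step, the wrappers below separate a report-only scan from the interactive repair. The function names cck_report and cck_repair and the BOSH_CLI variable are assumptions made for illustration. The cloud-check command (alias cck) and its --report flag belong to the BOSH CLI v2, but verify them against the CLI version installed on your Ops Manager VM.

```shell
# BOSH_CLI is an assumed override so the sketch works with "bosh" or "bosh2".
# cck_report and cck_repair are hypothetical wrapper names.

# Scan the deployment and only report problems; nothing is changed.
cck_report() {
  "${BOSH_CLI:-bosh}" -d "$1" cloud-check --report
}

# Run cloud check interactively; BOSH prompts for a resolution per problem.
cck_repair() {
  "${BOSH_CLI:-bosh}" -d "$1" cloud-check
}

# Example ("my-deployment" is a placeholder deployment name):
#   BOSH_CLI=bosh2 cck_report my-deployment
```

Running the report first shows which VMs BOSH considers missing or unresponsive before any of them is recreated.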

Windows Workaround

Step 1. Get the VM IDs from BOSH

ubuntu@heritage-ops-manager:~$ bosh2 vms -d p-windows-runtime-e5218d09b3122b1e2dae
Using environment '192.168.8.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Task 178070. Done

Deployment 'p-windows-runtime-e5218d09b3122b1e2dae'

Instance                                                 Process State       AZ        IPs            VM CID                                   VM Type
windows_diego_cell/2d5e80ea-803f-488a-8efd-3405ee739a9a  failing             PCF-RP04  192.168.8.121  vm-0f66960b-0173-4d90-99c7-1c1a2ba1b3bb  xlarge.disk
windows_diego_cell/33c31d65-1456-452d-8423-fd38051a0651  failing             PCF-RP02  192.168.8.83   vm-9ce6c222-82d6-4bfc-9b0c-a9e016182ee6  xlarge.disk
windows_diego_cell/607eb823-52f5-412b-8e10-d0f99753ea51  running             PCF-RP04  192.168.8.146  vm-ef60d65b-0f87-4d75-aa84-eea613759a5c  xlarge.disk
windows_diego_cell/7b083c21-5c70-43a1-aaa5-f51b0aee1728  failing             PCF-RP04  192.168.8.151  vm-50adbef9-ea1c-477d-a934-5537b8a68793  xlarge.disk
windows_diego_cell/d4c8eaf7-a0a0-4c58-9b0d-a7f98bb2e2f9  unresponsive agent  PCF-RP01  192.168.8.80   vm-f99d0193-9adb-4706-a9f9-3388b7e611e4  xlarge.disk
windows_diego_cell/ecbf57fc-1eef-4e98-876c-bbe2d8fd19ad  unresponsive agent  PCF-RP03  192.168.8.106  vm-8df9bc88-d2e1-4ecc-82f3-269c4f01c359  xlarge.disk
windows_diego_cell/f8b4cc6a-00a1-4072-86b6-b1cbaaf74616  unresponsive agent  PCF-RP03  192.168.8.108  vm-1bad1e18-6b85-41f9-8a8d-7c86e32aceca  xlarge.disk
windows_diego_cell/fdc80ed0-79fc-4436-8846-da84d22f9bf0  unresponsive agent  PCF-RP03  192.168.8.119  vm-b8bb84bb-1295-40ab-b7d7-748b1bf1cae4  xlarge.disk

8 vms
 
Succeeded
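
To pull only the affected instances out of a listing like the one above, the output can be filtered. This is a minimal sketch that assumes the six-column table layout shown here. list_unresponsive is a hypothetical helper name, not a BOSH command.

```shell
# list_unresponsive: hypothetical helper, not a BOSH command.
# Reads `bosh vms` table output on stdin and prints the instance, IP,
# and VM CID of every row whose process state is "unresponsive agent".
list_unresponsive() {
  # Join the two-word state so default whitespace splitting keeps
  # the columns aligned: $1=Instance $2=State $3=AZ $4=IPs $5=VM CID.
  sed 's/unresponsive agent/unresponsive_agent/' |
    awk '$2 == "unresponsive_agent" { print $1, $4, $5 }'
}

# Example:
#   bosh2 vms -d p-windows-runtime-e5218d09b3122b1e2dae | list_unresponsive
```

The VM CID in the last column is the name to look for in the IaaS console in the next step.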

Step 2. Open a console to the VM through your IaaS

Log in to the Windows VM with the credentials that were used to create the stemcell.

Step 3. Open the "Start" menu, and then open Task Manager.

Step 4. On the "Services" tab, sort by Name and locate BOSH Agent. Note that its status is Stopped.

Step 5. Right-click BOSH Agent and select "Open Services".

Step 6. In the Services window, note that the BOSH Agent startup type is set to Disabled by default.

Step 7. Right-click BOSH Agent and select "Properties".

Step 8. Change the startup type to "Manual", and then click "Apply". Windows updates the BOSH Agent service.

Step 9. Click "Start". The BOSH Agent starts.
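
After the agent is started, the instance should stop reporting "unresponsive agent" in BOSH. One way to confirm this from the Ops Manager VM is sketched below. agent_recovered is a hypothetical helper that reads the `bosh vms` table on stdin and assumes that the instance name appears in the first column.

```shell
# agent_recovered: hypothetical helper, not a BOSH command.
# $1    = instance name, for example windows_diego_cell/<instance-id>
# stdin = `bosh vms` table output
# Succeeds (exit 0) only if the instance is listed and its row no
# longer contains "unresponsive agent".
agent_recovered() {
  awk -v inst="$1" '
    $1 == inst { found = 1; if ($0 ~ /unresponsive agent/) bad = 1 }
    END { exit !(found && !bad) }
  '
}

# Example:
#   bosh2 vms -d p-windows-runtime-e5218d09b3122b1e2dae |
#     agent_recovered windows_diego_cell/d4c8eaf7-a0a0-4c58-9b0d-a7f98bb2e2f9 &&
#     echo "agent is responding"
```

The check only confirms that the agent is reachable again. The instance can still show another state, such as "failing", until its jobs have started.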