How to manually collect Cloud Provider Interface logs when a BOSH CPI request is stuck in Operations Manager
search cancel

How to manually collect Cloud Provider Interface logs when a BOSH CPI request is stuck in Operations Manager

book

Article ID: 293744

calendar_today

Updated On:

Products

Operations Manager

Issue/Introduction

This article covers how to manually collect Cloud Provider Interface logs when a BOSH CPI request is stuck in Operations Manager.

BOSH interacts with IaaS (AWS, Azure, GCP, vSphere, etc) through Cloud Provider Interface (CPI). An invocation of an external CPI request to IaaS triggers a separate process in the BOSH director VM. BOSH writes CPI logs into a BOSH task log only after the CPI process exits. When the CPI request is stuck for any reason and the CPI process never exits, the BOSH task log provides no visibility on the progress of the external CPI request.

The following is an example of a successful CPI request. The CPI request is identified by a six-digit number, 832905. The log line of the CPI response is followed by the detailed CPI logs.

D, [2021-07-13T11:28:39.385610 #14747] [create_missing_vm(worker/47990939-b72c-400b-bd40-092691f23987 (0)/2)] DEBUG -- DirectorJobRunner: [external-cpi] [cpi-832905] request: {"method":"info","arguments":[],"context":{"director_uuid":"016c5dd6-2c7d-4115-b909-4c566b34c668","request_id":"cpi-832905","vm":{"stemcell":{"api_version":3}},"datacenters":"<redacted>","default_disk_type":"<redacted>","host":"<redacted>","nsxt":"<redacted>","password":"<redacted>","user":"<redacted>"}} with command: /var/vcap/jobs/vsphere_cpi/bin/cpi

D, [2021-07-13T11:28:47.973755 #14747] [create_missing_vm(worker/47990939-b72c-400b-bd40-092691f23987 (0)/2)] DEBUG -- DirectorJobRunner: [external-cpi] [cpi-832905] response: {"result":{"stemcell_formats":["vsphere-ovf","vsphere-ova"]},"error":null,"log":""}, err: 

I, [2021-07-13T11:28:40.280376 #15324]  INFO -- [req_id cpi-832905]: Starting info...
I, [2021-07-13T11:28:40.280498 #15324]  INFO -- : Fetching vSphere version to locate SDK
/var/vcap/data/packages/vsphere_cpi/2165fa9d3bfa5cbb671373aa8bc784864aba7c02/lib/ruby_vim_sdk/vmomi_support.rb:56: warning: constant ::Fixnum is deprecated
I, [2021-07-13T11:28:42.652554 #15324]  INFO -- : Loaded vSphere 6.5 SDK
/var/vcap/data/packages/vsphere_cpi/2165fa9d3bfa5cbb671373aa8bc784864aba7c02/lib/nsxt/nsxt_client/api/upgrade_node_upgrade_api.rb:145: warning: character class has '-' without escape: /^[a-zA-Z0-9-.]+$/
D, [2021-07-13T11:28:47.673061 #15324] DEBUG -- [req_id cpi-832905]: Running method 'RetrieveServiceContent'...
D, [2021-07-13T11:28:47.711672 #15324] DEBUG -- [req_id cpi-832905]: Running method 'Login'...
I, [2021-07-13T11:28:47.934969 #15324]  INFO -- [req_id cpi-832905]: Finished info in 7.65 seconds
D, [2021-07-13T11:28:47.935329 #15324] DEBUG -- [req_id cpi-832905]: Running method 'Logout'...
, exit_status: pid 15324 exit 0


To verify if your BOSH task is stuck on a CPI request, perform the following steps.
 
1. ApplyChange has been hung for hours without progress. Identify the BOSH task ID from ApplyChange trace and retrieve the BOSH task debug log.

# verify the task is still "processing"
bosh tasks

# obtain task debug log. The command will not end because the task is still running. So let the command stream debug logs for a minute and then CTRL-C.

bosh task TASK_ID --debug > task-debug.log


2. Filter out stuck CPI requests that have no response yet.

export TASK_DEBUG_LOG=task-debug.log

# identify stuck CPI request IDs

diff <(cat $TASK_DEBUG_LOG | egrep  -o '\[cpi-[0-9]+\] request:' | egrep -o 'cpi-[0-9]+' | sort | uniq) <(cat $TASK_DEBUG_LOG | egrep  -o '\[cpi-[0-9]+\] response:' | egrep -o 'cpi-[0-9]+' | sort | uniq) | egrep -o 'cpi-[0-9]+' > stuck-cpi-id.out

# print the stuck CPI requests

for stuck_cpi in $(cat stuck-cpi-id.out); do echo "=== stuck CPI request: ${stuck_cpi} ==="; cat $TASK_DEBUG_LOG | egrep "${stuck_cpi}\] request"; echo ""; done


Environment

Product Version: 2.10

Resolution

To obtain CPI logs when a CPI request is stuck, you need to end the CPI process manually.

BOSH Director v271.10.0 or higher contains the following CPI logging improvement:
 
"Introduces changes that make it much easier to diagnose CPI hangs. The Director now spawns CPI children using popen3, which permits the Director to get CPI output as it is created, rather than getting it all when the CPI terminates. (PR #2319)"
 
Ending the stuck CPI process manually is only required for BOSH Director v271.9.0 or lower.

1. SSH into BOSH director VM and end the CPI processes.
# Review running CPI processes. The example result below shows four vsphere CPI processes

ps -efw | grep cpi

root        86     2  0 Aug26 ?        00:00:00 [acpi_thermal_pm]
vcap      2133  1352 18 07:25 ?        00:00:07 /var/vcap/packages/vsphere_cpi/bin/vsphere_cpi /var/vcap/jobs/vsphere_cpi/config/cpi.json
vcap      2134  1352 18 07:25 ?        00:00:06 /var/vcap/packages/vsphere_cpi/bin/vsphere_cpi /var/vcap/jobs/vsphere_cpi/config/cpi.json
vcap      2137  1352 19 07:25 ?        00:00:07 /var/vcap/packages/vsphere_cpi/bin/vsphere_cpi /var/vcap/jobs/vsphere_cpi/config/cpi.json
vcap      2138  1352 19 07:25 ?        00:00:07 /var/vcap/packages/vsphere_cpi/bin/vsphere_cpi /var/vcap/jobs/vsphere_cpi/config/cpi.json
root      2155 31647  0 07:26 pts/0    00:00:00 grep --color=auto cpi

# kill them all

pgrep -f vsphere_cpi
pkill --signal 9 -f vsphere_cpi

2. Collect the BOSH task debug log to review CPI logs.
bosh task TASK_ID --debug > task-debug.log