This article covers how to manually collect Cloud Provider Interface logs when a BOSH CPI request is stuck in Operations Manager.
BOSH interacts with IaaS (AWS, Azure, GCP, vSphere, etc) through Cloud Provider Interface (CPI). An invocation of an external CPI request to IaaS triggers a separate process in the BOSH director VM. BOSH writes CPI logs into a BOSH task log only after the CPI process exits. When the CPI request is stuck for any reason and the CPI process never exits, the BOSH task log provides no visibility on the progress of the external CPI request.
The following is an example of a successful CPI request. The CPI request is identified by a six-digit number, 832905. The log line of the CPI response is followed by the detailed CPI logs.
D, [2021-07-13T11:28:39.385610 #14747] [create_missing_vm(worker/47990939-b72c-400b-bd40-092691f23987 (0)/2)] DEBUG -- DirectorJobRunner: [external-cpi] [cpi-832905] request: {"method":"info","arguments":[],"context":{"director_uuid":"016c5dd6-2c7d-4115-b909-4c566b34c668","request_id":"cpi-832905","vm":{"stemcell":{"api_version":3}},"datacenters":"<redacted>","default_disk_type":"<redacted>","host":"<redacted>","nsxt":"<redacted>","password":"<redacted>","user":"<redacted>"}} with command: /var/vcap/jobs/vsphere_cpi/bin/cpi D, [2021-07-13T11:28:47.973755 #14747] [create_missing_vm(worker/47990939-b72c-400b-bd40-092691f23987 (0)/2)] DEBUG -- DirectorJobRunner: [external-cpi] [cpi-832905] response: {"result":{"stemcell_formats":["vsphere-ovf","vsphere-ova"]},"error":null,"log":""}, err: I, [2021-07-13T11:28:40.280376 #15324] INFO -- [req_id cpi-832905]: Starting info... I, [2021-07-13T11:28:40.280498 #15324] INFO -- : Fetching vSphere version to locate SDK /var/vcap/data/packages/vsphere_cpi/2165fa9d3bfa5cbb671373aa8bc784864aba7c02/lib/ruby_vim_sdk/vmomi_support.rb:56: warning: constant ::Fixnum is deprecated I, [2021-07-13T11:28:42.652554 #15324] INFO -- : Loaded vSphere 6.5 SDK /var/vcap/data/packages/vsphere_cpi/2165fa9d3bfa5cbb671373aa8bc784864aba7c02/lib/nsxt/nsxt_client/api/upgrade_node_upgrade_api.rb:145: warning: character class has '-' without escape: /^[a-zA-Z0-9-.]+$/ D, [2021-07-13T11:28:47.673061 #15324] DEBUG -- [req_id cpi-832905]: Running method 'RetrieveServiceContent'... D, [2021-07-13T11:28:47.711672 #15324] DEBUG -- [req_id cpi-832905]: Running method 'Login'... I, [2021-07-13T11:28:47.934969 #15324] INFO -- [req_id cpi-832905]: Finished info in 7.65 seconds D, [2021-07-13T11:28:47.935329 #15324] DEBUG -- [req_id cpi-832905]: Running method 'Logout'... , exit_status: pid 15324 exit 0
To verify if your BOSH task is stuck on a CPI request, perform the following steps.
1. ApplyChange has been hung for hours without progress. Identify the BOSH task ID from ApplyChange trace and retrieve the BOSH task debug log.
# verify the task is still "processing" bosh tasks # obtain task debug log. The command will not end because the task is still running. So let the command stream debug logs for a minute and then CTRL-C. bosh task TASK_ID --debug > task-debug.log
2. Filter out stuck CPI requests that have no response yet.
export TASK_DEBUG_LOG=task-debug.log # identify stuck CPI request IDs diff <(cat $TASK_DEBUG_LOG | egrep -o '\[cpi-[0-9]+\] request:' | egrep -o 'cpi-[0-9]+' | sort | uniq) <(cat $TASK_DEBUG_LOG | egrep -o '\[cpi-[0-9]+\] response:' | egrep -o 'cpi-[0-9]+' | sort | uniq) | egrep -o 'cpi-[0-9]+' > stuck-cpi-id.out # print the stuck CPI requests for stuck_cpi in $(cat stuck-cpi-id.out); do echo "=== stuck CPI request: ${stuck_cpi} ==="; cat $TASK_DEBUG_LOG | egrep "${stuck_cpi}\] request"; echo ""; done
popen3
, which permits the Director to get CPI output as it is created, rather than getting it all when the CPI terminates. (PR #2319)"# Review running CPI processes. The example result below shows four vsphere CPI processes ps -efw | grep cpi root 86 2 0 Aug26 ? 00:00:00 [acpi_thermal_pm] vcap 2133 1352 18 07:25 ? 00:00:07 /var/vcap/packages/vsphere_cpi/bin/vsphere_cpi /var/vcap/jobs/vsphere_cpi/config/cpi.json vcap 2134 1352 18 07:25 ? 00:00:06 /var/vcap/packages/vsphere_cpi/bin/vsphere_cpi /var/vcap/jobs/vsphere_cpi/config/cpi.json vcap 2137 1352 19 07:25 ? 00:00:07 /var/vcap/packages/vsphere_cpi/bin/vsphere_cpi /var/vcap/jobs/vsphere_cpi/config/cpi.json vcap 2138 1352 19 07:25 ? 00:00:07 /var/vcap/packages/vsphere_cpi/bin/vsphere_cpi /var/vcap/jobs/vsphere_cpi/config/cpi.json root 2155 31647 0 07:26 pts/0 00:00:00 grep --color=auto cpi # kill them all pgrep -f vsphere_cpi pkill --signal 9 -f vsphere_cpi
bosh task TASK_ID --debug > task-debug.log