When BOSH recreates a worker instance in a Concourse deployment, the operation can sometimes take over an hour to complete. The BOSH task debug log shows entries like the following:
I, [2022-05-19T23:27:35.958921 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Updating instance worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1), changes: "recreate"
I, [2022-05-19T23:27:35.984125 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Running pre-stop for worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)
I, [2022-05-19T23:27:37.025683 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Running drain for worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)
I, [2022-05-20T00:27:39.618791 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Stopping instance worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)
I, [2022-05-20T00:27:44.639530 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Running post-stop for worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)
I, [2022-05-20T00:27:44.643924 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Snapshots are disabled; skipping
I, [2022-05-20T00:27:44.651135 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Deleting VM
I, [2022-05-20T00:28:24.854123 #22159] [create_missing_vm(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)/1)] INFO -- DirectorJobRunner: Creating missing VM
I, [2022-05-20T00:28:24.915468 #22159] [create_missing_vm(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)/1)] INFO -- DirectorJobRunner: Creating VM
I, [2022-05-20T00:29:38.495637 #22159] [create_missing_vm(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)/1)] INFO -- DirectorJobRunner: deleting arp entries for the following ip addresses: ["x.x.x.x"]
I, [2022-05-20T00:29:49.874301 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Updating persistent disk
As the debug log shows, the drain step ran for one hour (from 23:27:37 to 00:27:39) and accounts for almost all of the recreate time. The drain script on a worker can take a long time to move its workload to other workers, so the drain command is given a timeout parameter to keep it from running forever. By default the timeout is set to 3600 seconds, which matches the one-hour drain seen in the log above.
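To confirm which step dominates a slow recreate, the gaps between consecutive log entries can be computed directly from the timestamps. The following is a minimal sketch, assuming the debug log lines follow the format shown above; the function names `step_durations` and `slowest_step` and the `SAMPLE` list are illustrative only and are not part of BOSH or Concourse.

```python
import re
from datetime import datetime

# Captures the timestamp and the message that follows "DirectorJobRunner: ".
# Assumes the line layout matches the debug log excerpt in this article.
LINE_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) #\d+\].*?DirectorJobRunner: (.*)$"
)


def step_durations(lines):
    """Return (message, seconds until the next logged step) pairs."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts, match.group(2).strip()))
    # Pair each event with the one that follows it and measure the gap.
    return [
        (msg, (next_ts - ts).total_seconds())
        for (ts, msg), (next_ts, _) in zip(events, events[1:])
    ]


def slowest_step(lines):
    """Return the (message, seconds) pair with the largest gap."""
    return max(step_durations(lines), key=lambda pair: pair[1])


# Three lines copied from the log excerpt above (hypothetical variable name).
SAMPLE = [
    'I, [2022-05-19T23:27:35.984125 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Running pre-stop for worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)',
    'I, [2022-05-19T23:27:37.025683 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Running drain for worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)',
    'I, [2022-05-20T00:27:39.618791 #22159] [canary_update(worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1))] INFO -- DirectorJobRunner: Stopping instance worker/3c66f787-19bd-4fb2-8cf2-37ca03c9ce80 (1)',
]

if __name__ == "__main__":
    message, seconds = slowest_step(SAMPLE)
    print(f"{message}: {seconds:.1f}s")
```

Run against the sample lines, the sketch reports the "Running drain" entry as the slowest step, at a little over 3600 seconds, which is consistent with the drain command reaching its default timeout.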