This issue occurs when an application is pushed with an old, unsupported stack, that is, a stack that is not present in the Cloud Controller's cc.diego.lifecycle_bundles config property.
The clock_global job's diego_sync operation, which is scheduled to run every 30 seconds, fails. As a result, Cloud Controller and Diego fall out of sync, including for unrelated apps.
Unrelated applications continue to run and respond even after they have been stopped and their routes deleted. They keep running until the application that is attempting to use the invalid stack is stopped.
The cloud_controller_clock.log file contains the following error message:

    {"timestamp":1541371393.2987807,"message":"no compiler defined for requested stack (VCAP::CloudController::Diego::LifecycleBundleUriGenerator::InvalidStack)","log_level":"error","source":"cc.clock","data":{},"thread_id":70188762546500,"fiber_id":70188762534580,"process_id":11619,"file":"/var/vcap/data/packages/cloud_controller_ng/6959cae050300f69b297770b552feaba6d2006fe/cloud_controller_ng/lib/cloud_controller/clock/scheduler.rb","lineno":37,"method":"block in start"}
The cloud_controller_clock_ctl.log file contains the following error message:

    [2018-11-04 22:45:13+0000] I, [2018-11-04T22:43:09.001369 #11619] INFO -- : Triggering 'diego_sync.job'
    [2018-11-04 22:45:13+0000] E, [2018-11-04T22:43:13.298586 #11619] ERROR -- : no compiler defined for requested stack (VCAP::CloudController::Diego::LifecycleBundleUriGenerator::InvalidStack)
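To confirm how often diego_sync is failing, the JSON entries in cloud_controller_clock.log can be scanned for the InvalidStack error. The following is a minimal sketch, assuming each log entry is one JSON object per line with the `timestamp`, `message`, and `log_level` fields shown above; the function name `invalid_stack_errors` is hypothetical.

```python
import json
from datetime import datetime, timezone

# Error class name taken from the cloud_controller_clock.log entry shown above.
INVALID_STACK_MARKER = "LifecycleBundleUriGenerator::InvalidStack"


def invalid_stack_errors(lines):
    """Yield (UTC datetime, message) for each JSON log entry reporting InvalidStack."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank or non-JSON lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed entries
        message = entry.get("message", "")
        if entry.get("log_level") == "error" and INVALID_STACK_MARKER in message:
            # CC logs the timestamp as fractional seconds since the Unix epoch.
            ts = datetime.fromtimestamp(entry["timestamp"], tz=timezone.utc)
            yield ts, message
```

Passing an open file object (for example, `open("cloud_controller_clock.log")`) iterates it line by line. A run of hits roughly 30 seconds apart matches the diego_sync schedule described above.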
An application is pushed with a stack that no longer exists in the lifecycle_bundles config property for the buildpack or Docker application lifecycle. The diego_sync job continues to be queued, run, and fail every 30 seconds, logging the errors shown above. This leaves any application instance that requires a sync update between Cloud Controller and Diego in an inconsistent state.
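Why one bad application affects unrelated ones can be illustrated with a toy model in which a single sync pass validates every started app before applying anything to Diego. This is a simplified, hypothetical sketch, not Cloud Controller's actual diego_sync code: `InvalidStack` here is a stand-in class, and `LIFECYCLE_BUNDLES`, `lifecycle_bundle_for`, and `sync` are illustrative names with placeholder values.

```python
class InvalidStack(Exception):
    """Stand-in for the InvalidStack error raised by Cloud Controller."""


# Hypothetical stand-in for cc.diego.lifecycle_bundles; values are placeholders.
LIFECYCLE_BUNDLES = {
    "buildpack/cflinuxfs2": "placeholder-cflinuxfs2-bundle",
    "buildpack/windows2016": "placeholder-windows2016-bundle",
}


def lifecycle_bundle_for(stack):
    """Look up the bundle for a buildpack stack, failing as the log shows."""
    key = "buildpack/" + stack
    if key not in LIFECYCLE_BUNDLES:
        raise InvalidStack("no compiler defined for requested stack")
    return LIFECYCLE_BUNDLES[key]


def sync(apps, diego_lrps):
    """Recompute Diego's desired LRPs from CC apps; any error aborts the whole pass."""
    desired = {}
    for app in apps:
        if app["state"] != "STARTED":
            continue  # stopped apps should no longer have an LRP in Diego
        lifecycle_bundle_for(app["stack"])  # raises InvalidStack for unknown stacks
        desired[app["name"]] = app["stack"]
    # Reached only if every started app validated: replace Diego's state in one step.
    diego_lrps.clear()
    diego_lrps.update(desired)
```

In this model, stopping an unrelated app has no effect while a started `lucid64` app exists, because every pass raises before Diego's state is replaced. Once the `lucid64` app is also stopped, the next pass succeeds and the stopped app is removed, mirroring the resolution below.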
To find stacks that are missing from cc.diego.lifecycle_bundles, first obtain stack information through the CF CLI, the ERT/PAS deployment manifest, or the Cloud Controller database (CCDB).

With the CF CLI:

    $ cf stacks
    Getting stacks in org ORG / space SPACE as admin...
    OK

    name            description
    cflinuxfs2      Cloud Foundry Linux-based filesystem
    windows2012R2   Microsoft Windows / .Net 64 bit
    windows2016     Microsoft Windows 2016
    lucid64         Ubuntu 10.04 on x86-64
From the ERT/PAS deployment manifest:

    ubuntu@lab13:~$ sudo cat /var/tempest/workspaces/default/deployments/cf-<deployment>.yml | grep -A 10 stacks | egrep 'name|description'
    - name: cflinuxfs2
      description: Cloud Foundry Linux-based filesystem
    - name: windows2012R2
      description: Microsoft Windows / .Net 64 bit
    - name: windows2016
      description: Microsoft Windows 2016
    - name: lucid64
      description: Ubuntu 10.04 on x86-64
From the CCDB, BOSH SSH to one of the MySQL VMs and query the stacks table:

    $ bosh ssh mysql/0
    mysql/0079a88c-e317-440a-bf4e-01c28b01b152:~$ mysql --defaults-file=/var/vcap/jobs/mysql/config/mylogin.cnf -h 127.0.0.1 --execute="select guid, name, description from ccdb.stacks;"
    +--------------------------------------+---------------+--------------------------------------+
    | guid                                 | name          | description                          |
    +--------------------------------------+---------------+--------------------------------------+
    | 0765c714-474a-4dec-b9d7-835e76cb94ab | cflinuxfs2    | Cloud Foundry Linux-based filesystem |
    | 238e8bd6-1066-472c-9616-961c387c5b10 | windows2012R2 | Microsoft Windows / .Net 64 bit      |
    | 3a254834-2486-42dc-8313-41a58fe9b48f | windows2016   | Microsoft Windows 2016               |
    | 9d954a46-d934-4088-8344-1cdcf99cc066 | lucid64       | Ubuntu 10.04 on x86-64               |
    +--------------------------------------+---------------+--------------------------------------+
Next, obtain the cc.diego.lifecycle_bundles information. To extract it from the ERT/PAS deployment file:

    ubuntu@lab:~$ sudo cat /var/tempest/workspaces/default/deployments/cf-<deployment>.yml | sed -n '/name\: cloud_controller_ng/,/name\: cloud_controller_clock/p' | sed -n '/lifecycle_bundles/,/docker/{/lifecycle_bundles/b;/docker/b;p}' | tr -d " " | cut -d : -f 1
    buildpack/cflinuxfs2
    buildpack/windows2012R2
    buildpack/windows2016
Alternatively, BOSH SSH to any cloud_controller instance and read cc.diego.lifecycle_bundles from cloud_controller.yml in the /var/vcap/jobs/cloud_controller_ng/config/ directory. Stack information is also available in stacks.yml in the same directory.
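The comparison between the two lists can be automated. The following is a minimal sketch, assuming `cf stacks` prints a `name description` header followed by one stack per row (as in the output above) and that the bundle keys have the `buildpack/<stack>` form produced by the sed pipeline; `parse_cf_stacks` and `unsupported_stacks` are hypothetical helper names.

```python
def parse_cf_stacks(output):
    """Return stack names from `cf stacks` output (first column after the header)."""
    names = []
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[:2] == ["name", "description"]:
            in_table = True  # rows after this header are stacks
            continue
        if in_table:
            names.append(fields[0])
    return names


def unsupported_stacks(stack_names, bundle_keys):
    """Stacks that have no buildpack/<stack> key in cc.diego.lifecycle_bundles."""
    supported = {
        key.split("/", 1)[1] for key in bundle_keys if key.startswith("buildpack/")
    }
    return sorted(set(stack_names) - supported)
```

With the example output from this article, `unsupported_stacks(parse_cf_stacks(output), ["buildpack/cflinuxfs2", "buildpack/windows2012R2", "buildpack/windows2016"])` returns `["lucid64"]`.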
In this example, the lucid64 stack exists but is not in the lifecycle_bundles information. Next, find the applications that use lucid64 through either CCDB or the CF CLI.
To find applications using lucid64 with CCDB, connect to one of the MySQL VMs through the BOSH CLI as previously described and run the following query against ccdb.buildpack_lifecycle_data:

    mysql/0079a88c-e317-440a-bf4e-01c28b01b152:~$ mysql --defaults-file=/var/vcap/jobs/mysql/config/mylogin.cnf -h 127.0.0.1 ccdb --execute="select o.name as org, s.name as space, a.name as app, a.guid as app_guid, b.stack, a.desired_state from apps a inner join spaces s on s.guid = a.space_guid inner join ccdb.buildpack_lifecycle_data b on a.guid = b.app_guid inner join organizations o on o.id = s.organization_id where b.stack = 'lucid64';"
    +------------+----------+-------------+--------------------------------------+---------+---------------+
    | org        | space    | app         | app_guid                             | stack   | desired_state |
    +------------+----------+-------------+--------------------------------------+---------+---------------+
    | cf-org     | cf-space | lucid-test  | 70b572c7-3b5c-4029-ac9a-0330c99bddd1 | lucid64 | STARTED       |
    +------------+----------+-------------+--------------------------------------+---------+---------------+
To find applications using lucid64 with the CF CLI, look up the stack GUID and then list the apps that reference it:

    $ cf curl /v2/stacks | jq '.resources[] | select(.entity.name | contains("lucid64"))'
    {
      "metadata": {
        "guid": "9d954a46-d934-4088-8344-1cdcf99cc066",
        "url": "/v2/stacks/9d954a46-d934-4088-8344-1cdcf99cc066",
        "created_at": "2018-12-07T21:20:01Z",
        "updated_at": "2018-12-07T21:20:01Z"
      },
      "entity": {
        "name": "lucid64",
        "description": "Ubuntu 10.04 on x86-64"
      }
    }
    $ cf curl /v2/apps | jq '.resources[].entity | select(.stack_guid == "9d954a46-d934-4088-8344-1cdcf99cc066") | {"Appname": .name, "Space": .space_guid}'
    {
      "Appname": "lucid-test",
      "Space": "25d2d62f-ca16-4931-aa10-38f68b9211ea"
    }

3. Use the CF CLI to target the org and space that the suspect application was pushed to, and stop the application.
    $ cf stop lucid-test

4. Confirm that the application has been stopped.
    $ cf apps
    Getting apps in org cf-org / space cf-space as admin...
    OK

    name         requested state   instances   memory   disk   urls
    lucid-test   stopped           0/1         1G       1G     lucid-test.cfapps-13.haas-59.pez.pivotal.io
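On foundations with many applications, the `cf curl /v2/apps` lookup used earlier may miss the suspect app, because the v2 API returns results in pages (50 per page by default) and a single request shows only the first page. The following is a minimal sketch that follows each page's `next_url`; `cf_curl` and `apps_on_stack` are hypothetical helper names, and `cf_curl` assumes the cf CLI is installed and already logged in.

```python
import json
import subprocess


def cf_curl(path):
    """Run `cf curl <path>` and decode the JSON response (needs a logged-in cf CLI)."""
    result = subprocess.run(
        ["cf", "curl", path], check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)


def apps_on_stack(stack_guid, fetch=cf_curl):
    """Walk every /v2/apps page via next_url and collect apps on the given stack."""
    matches = []
    path = "/v2/apps"
    while path:
        page = fetch(path)
        for resource in page.get("resources", []):
            entity = resource["entity"]
            if entity.get("stack_guid") == stack_guid:
                matches.append({"Appname": entity["name"], "Space": entity["space_guid"]})
        path = page.get("next_url")  # None (JSON null) on the last page
    return matches
```

The `fetch` parameter lets the pagination logic be exercised without a live foundation; in normal use, `apps_on_stack("9d954a46-d934-4088-8344-1cdcf99cc066")` calls the cf CLI for each page.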
Once the application that uses the invalid stack has been stopped, diego_sync resumes automatically and Diego returns to sync with Cloud Controller.
If key performance indicator (KPI) metrics are set up for the platform, note that the bbs.Domain.cf-apps and bbs.Domain.cf-tasks metrics are not emitted while Diego is out of sync. After the application is stopped and Diego has restored sync operations, these KPI metrics are emitted again. For example:
    $ cf nozzle -n | grep Domain
    origin:"bbs" eventType:ValueMetric timestamp:1544641632223657260 deployment:"cf" job:"diego_database" index:"f16a141f-7907-4373-a3cd-d72e7dd874d3" ip:"10.193.78.24" tags:<key:"instance_id" value:"f16a141f-7907-4373-a3cd-d72e7dd874d3" > tags:<key:"source_id" value:"bbs" > valueMetric:<name:"Domain.cf-apps" value:1 unit:"Metric"
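To check whether the Domain metrics have returned, the nozzle output can be filtered for bbs value metrics. The following is a minimal sketch, assuming the one-event-per-line text format shown above; `DOMAIN_METRIC` and `domain_metrics` are hypothetical names, and the regular expression is written against this sample line rather than any formal output specification.

```python
import re

# Matches valueMetric:<name:"Domain.cf-apps" value:1 ... in `cf nozzle` text output.
DOMAIN_METRIC = re.compile(r'valueMetric:<name:"(Domain\.[\w-]+)" value:([0-9.eE+-]+)')


def domain_metrics(lines):
    """Return {metric name: last value seen} for bbs Domain.* value metrics."""
    seen = {}
    for line in lines:
        if 'origin:"bbs"' not in line:
            continue  # only BBS emits the Domain freshness metrics
        match = DOMAIN_METRIC.search(line)
        if match:
            seen[match.group(1)] = float(match.group(2))
    return seen
```

Feeding a few minutes of `cf nozzle -n` output through `domain_metrics` and finding neither `Domain.cf-apps` nor `Domain.cf-tasks` suggests that Diego has not yet restored sync.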