On the clock_global virtual machine (VM), the uaa_ca.crt file under /var/vcap/jobs/cloud_controller_clock/config/certs/ is empty.
Note the 2-byte size of uaa_ca.crt in the following ls output:
/var/vcap/jobs/cloud_controller_clock/config/certs# ls -la
total 32
drwxr-x--- 2 root vcap 4096 Jan 14 18:11 .
drwxr-x--- 3 root vcap 4096 Jan 14 18:11 ..
-rw-r----- 1 root vcap 1209 Jan 14 18:07 credhub_ca.crt
-rw-r----- 1 root vcap    1 Jan 14 18:07 db_ca.crt
-rw-r----- 1 root vcap 1209 Jan 14 18:07 mutual_tls_ca.crt
-rw-r----- 1 root vcap 1327 Jan 14 18:07 mutual_tls.crt
-rw-r----- 1 root vcap 1680 Jan 14 18:07 mutual_tls.key
-rw-r----- 1 root vcap    2 Jan 14 18:07 uaa_ca.crt
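The same check can be scripted rather than read off the listing. The following is a minimal sketch, not part of the product: the function name find_empty_certs is hypothetical, and it assumes that a usable CA file contains at least one PEM "BEGIN CERTIFICATE" block.

```python
from pathlib import Path

# Marker that opens every PEM-encoded certificate block.
PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def find_empty_certs(cert_dir):
    """Return the sorted names of *.crt files that contain no PEM certificate.

    Hypothetical helper: a file holding only a byte or two of whitespace,
    like the 2-byte uaa_ca.crt in the listing, is reported here.
    """
    suspects = []
    for path in sorted(Path(cert_dir).glob("*.crt")):
        if PEM_MARKER not in path.read_bytes():
            suspects.append(path.name)
    return suspects
```

Calling find_empty_certs("/var/vcap/jobs/cloud_controller_clock/config/certs") on a VM that matches the listing would report db_ca.crt and uaa_ca.crt, since neither is large enough to hold a certificate. Only the empty uaa_ca.crt is the subject of this article.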
As a result, cloud_controller_clock fails to retrieve a User Account and Authentication (UAA) token. Eventually, the process sync (logged under the source cc.diego.sync.processes) fails as well, with an error like the following:
{"timestamp":1548072501.0820189,"message":"error-updating-lrp-state","log_level":"error","source":"cc.diego.sync.processes","data":{"error":"OpenSSL::X509::StoreError","error_message":""},"thread_id":47217024603260, "fiber_id":47217024554400,"process_id":9615,"file":"/var/vcap/data/packages/cloud_controller_ng/1e4b5398d290f36d4f16bf9d7eaea36362084be2/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb","lineno":89,"method":"block in process_workpool_exceptions"}
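To find this symptom in a larger log, each JSON log line can be tested for the same source and error class. This is a sketch only: the function name is_store_error is hypothetical, and it assumes one JSON object per line, as in the entry above.

```python
import json


def is_store_error(line):
    """Return True if a log line is a cc.diego.sync.processes error
    whose data.error is OpenSSL::X509::StoreError.

    Hypothetical helper; lines that are not JSON objects return False.
    """
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(entry, dict):
        return False
    data = entry.get("data")
    if not isinstance(data, dict):
        return False
    return (
        entry.get("source") == "cc.diego.sync.processes"
        and entry.get("log_level") == "error"
        and data.get("error") == "OpenSSL::X509::StoreError"
    )
```

Filtering a log file with this predicate gives a quick count of how often the sync has failed for this reason.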
As a result, after cf delete or cf push is run on an application, the old application container may still exist on the Diego Cell, and its route information is still submitted to the Gorouter through the route-emitter.
This can cause Gorouters to route some requests to the old containers, which leads to unexpected behavior.
When Transmission Control Protocol (TCP) routes are not used, the Cloud Controller API (CAPI) sync job (which runs in the cloud_controller_clock job) talks to the Diego Database directly. As a result, the UAA certificate is not required.
However, when TCP routes are used, the sync job needs to talk to the Routing API to determine whether a request needs TCP routing.
This is because the cloud_controller_clock job uses the same network library as the cloud_controller. In addition, the code path for the Routing API is the same.
As a result, when TCP routes are used, all requests, whether internal, external, HTTP, or TCP, go through the Routing API for the necessary checks.
The cloud_controller_clock job talks to the Routing API with a token granted by UAA, so it needs the proper certificate authority (CA) certificate for UAA.
The problem is that, in the default release, the cloud_controller_ng job has the necessary UAA CA, while the cloud_controller_clock job does not.
A temporary workaround is to copy the BOSH root CA certificate into the clock_global VM.
The source location is:
Operations Manager VM - /var/tempest/workspaces/default/root_ca_certificate
The target location is:
Clock Global VM - /var/vcap/jobs/cloud_controller_clock/config/certs/
The permanent fix will be released in PAS 2.2.12.