While following the procedure to rotate the Ops Manager Root CA, running the curl command that calls the "/activate" endpoint in "Step 2: Activate the CAs" returns the following safety violation:
{"certificates":{"updated":[],"excluded":[],"update_failed":[]},"safety_violations":[{"violation":"active child certificate version is not the latest non transitional version or more than one active version exists","certificate_names":["/bosh_dns_health_client_tls","/bosh_dns_health_server_tls","/dns_api_client_tls","/dns_api_server_tls","/opsmgr/bosh_dns/san_migrated"]}],"errors":["failed to activate certificate authorities"]}
This violation is usually resolved by upgrading all deployments so that every leaf cert is signed by the latest version of the Ops Manager CA cert. However, the error persists even after all deployments have been upgraded.
VMware Tanzu Platform
At least one deployment is still recorded in CredHub as using leaf certs that are signed by an old version of a CA cert. This can be verified by running the `maestro topology` command and reviewing the output.
As an example, the leaf cert "/bosh_dns_health_client_tls", which is one of the certs named in the error message, has this section in the maestro topology output:
- name: /bosh_dns_health_client_tls
  certificate_id: CERTID123
  signed_by: /opsmgr/bosh_dns/tls_ca
  versions:
  - version_id: yyy
    active: true
    signed_by_version: 1111
    deployment_names:
    - bosh-health
    - cf-5555
    - service-instance_6666
    - service-instance_7777
    - service-instance_8888
    generated: true
    valid_until: 2029-05-28T16:38:08Z
  - version_id: xxx
    active: true
    signed_by_version: 1111
    deployment_names:
    - service-instance_aaa
    - service-instance_bbb
    - service-instance_ccc
    generated: true
    valid_until: 2027-04-05T09:15:24Z
The output above shows two active versions of the "/bosh_dns_health_client_tls" cert. Version 'xxx' is used by three deployments (service-instance_aaa, service-instance_bbb and service-instance_ccc) and expires earlier than version 'yyy', which suggests that 'xxx' is the older version. If an Apply Changes with the upgrade all service instances errand enabled was run recently, no deployment should still be using it. The deployments that still use the old version could be failed deployments that do not have any VMs, for example service instance deployments that failed during their creation stage.
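Spotting such certs by eye is tedious when the topology output is long. The following is a minimal sketch, not part of the documented procedure, that reads the output on standard input and prints every cert with more than one active version. The function name `multi_active_certs` is hypothetical, and the script assumes the output follows the layout of the excerpt above (a `- name:` line per cert and an `active: true` line per active version).

```shell
# Hypothetical helper: reads `maestro topology` output on stdin and prints
# each cert name that has more than one active version, followed by the count.
# Assumption: the output follows the layout of the excerpt shown earlier.
multi_active_certs() {
  awk '
    # Print the cert collected so far if it has several active versions.
    function report() {
      if (name != "" && active > 1) print name, active
    }
    { sub(/^[ \t]+/, "") }                                # ignore indentation
    /^- name:/ { report(); name = $3; active = 0; next }  # a new cert starts
    /^active: true/ { active++ }                          # count active versions
    END { report() }
  '
}

# Example usage (run where maestro is available):
#   maestro topology | multi_active_certs
```

For the excerpt above, this would print `/bosh_dns_health_client_tls 2`. The deployments listed under each active version still need to be reviewed manually.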
If a service instance deployment has no running VMs and `bosh manifest -d $deployment` returns no output, it is safe to assume that it is a failed service instance deployment. Failed deployments can be deleted by running the following command on each one:
bosh -d $service_instance_name delete-deployment
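When several deployments need checking, the manifest test can be scripted before anything is deleted. The sketch below is an illustration only: the function names `has_empty_manifest` and `review_candidates` are hypothetical, and it assumes the bosh CLI is already targeted and logged in. It only reports candidates and deletes nothing.

```shell
# Hypothetical helper: succeeds when BOSH returns no manifest text for the
# deployment. Errors are suppressed, so a mistyped deployment name also looks
# empty; always confirm candidates manually.
has_empty_manifest() {
  local deployment="$1"
  local manifest
  manifest="$(bosh -d "$deployment" manifest 2>/dev/null)" || true
  # Treat whitespace-only output as empty.
  [ -z "$(printf '%s' "$manifest" | tr -d '[:space:]')" ]
}

# Prints "candidate: NAME" for deployments with an empty manifest and
# "skip: NAME" for all others.
review_candidates() {
  local deployment
  for deployment in "$@"; do
    if has_empty_manifest "$deployment"; then
      echo "candidate: $deployment"
    else
      echo "skip: $deployment"
    fi
  done
}

# Example usage:
#   review_candidates service-instance_aaa service-instance_bbb service-instance_ccc
```

Each reported candidate should still be checked with `bosh -d $deployment vms` to confirm that it has no VMs before it is deleted.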
Once those deployments are deleted, run `maestro topology` again. The output should no longer show those deployments, and only one active version of each cert should have deployments. If so, rerun the step that activates the Ops Manager Root CA to resume the rotation procedure.