Activating the Ops Manager Root CA fails with a violation error 'active child certificate version is not the latest non transitional version or more than one active version'

Article ID: 401774

Updated On:

Products

VMware Tanzu Platform

Issue/Introduction

While following the procedure on rotating the Ops Manager Root CA, specifically upon running the curl command to call the "/activate" endpoint in the "Step 2: Activate the CAs" step, the following safety violation is encountered:

{"certificates":{"updated":[],"excluded":[],"update_failed":[]},"safety_violations":[{"violation":"active child certificate version is not the latest non transitional version or more than one active version exists","certificate_names":["/bosh_dns_health_client_tls","/bosh_dns_health_server_tls","/dns_api_client_tls","/dns_api_server_tls","/opsmgr/bosh_dns/san_migrated"]}],"errors":["failed to activate certificate authorities"]}

 

The usual fix is to upgrade all deployments so that every leaf certificate is signed by the latest version of the Ops Manager CA certificate. In this case, however, the error persists even after all deployments have been upgraded.

Environment

VMware Tanzu Platform

Cause

At least one deployment is still recorded in CredHub as using leaf certificates signed by an old version of a CA certificate. To verify this, run the `maestro topology` command and review the output.

As an example, the leaf cert "/bosh_dns_health_client_tls", which is one of the certs named in the error message, has this section in the maestro topology output:

        - name: /bosh_dns_health_client_tls
          certificate_id: CERTID123
          signed_by: /opsmgr/bosh_dns/tls_ca
          versions:
            - version_id: yyy
              active: true
              signed_by_version: 1111
              deployment_names:
                - bosh-health
                - cf-5555
                - service-instance_6666
                - service-instance_7777
                - service-instance_8888
              generated: true
              valid_until: 2029-05-28T16:38:08Z
            - version_id: xxx
              active: true
              signed_by_version: 1111
              deployment_names:
                - service-instance_aaa
                - service-instance_bbb
                - service-instance_ccc
              generated: true
              valid_until: 2027-04-05T09:15:24Z

In this output, the "/bosh_dns_health_client_tls" certificate has two active versions. Version 'xxx' is used by three deployments (service-instance_aaa, service-instance_bbb and service-instance_ccc) and expires earlier than version 'yyy', which suggests that 'xxx' is the older version. If an Apply Changes with the upgrade all service instances errand enabled was run recently, no deployment should still be using it. The deployments that still reference the old version may be failed deployments that have no VMs, for example service instance deployments that failed during creation.
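When the topology output is long, scanning it by eye for certificates with several active versions is tedious. The sketch below automates that scan with `awk`. It assumes the output was saved to a file and that it follows the layout of the excerpt above (`- name:`, `- version_id:`, `active:`, `deployment_names:`, `valid_until:`). The function name `multi_active_versions` and the file name `topology.yml` are hypothetical, and any other named entry with more than one active version would be listed as well.

```shell
# Hypothetical helper: list every active version of each certificate that has
# more than one active version. Output columns are tab-separated:
#   certificate name, version_id, valid_until, deployments (comma-separated)
multi_active_versions() {
  awk '
    # Store the version being read, but only if it was marked active.
    function flush_version() {
      if (vid != "" && is_active) rows[++n] = name "\t" vid "\t" expiry "\t" deps
      vid = ""; is_active = 0; expiry = ""; deps = ""
    }
    # Print the stored versions only when the certificate has several.
    function flush_cert(   i) {
      flush_version()
      if (n > 1) for (i = 1; i <= n; i++) print rows[i]
      n = 0
    }
    /^[ \t]*- name:/           { flush_cert(); name = $3; in_deps = 0; next }
    /^[ \t]*- version_id:/     { flush_version(); vid = $3; in_deps = 0; next }
    /^[ \t]*active: true/      { is_active = 1; in_deps = 0; next }
    /^[ \t]*valid_until:/      { expiry = $2; in_deps = 0; next }
    /^[ \t]*deployment_names:/ { in_deps = 1; next }
    in_deps && /^[ \t]*- /     { deps = (deps == "" ? "" : deps ",") $2; next }
                               { in_deps = 0 }
    END                        { flush_cert() }
  ' "$@"
}

# Example (topology.yml is an assumed file name):
#   multi_active_versions topology.yml
```

For each certificate listed, the version with the earlier `valid_until` date is the likely stale one, and its deployments are the ones to investigate.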

Resolution

If a service instance deployment has no running VMs and `bosh manifest -d $deployment` returns no output, it is safe to assume that it is a failed service instance deployment. Each such failed deployment can be deleted by running the following command:

bosh -d $service_instance_name delete-deployment
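The two checks described above (no VMs, empty manifest) can be scripted before anything is deleted. The following is a minimal sketch, not a vetted tool. It assumes the `bosh` CLI is already targeted and logged in, that `jq` is installed, and that `bosh vms --json` reports one row per VM under `.Tables[0].Rows`. The function names are hypothetical. It only prints the delete commands and does not run them.

```shell
# Hypothetical helper: succeed (exit 0) only when the deployment has no VMs
# and an empty manifest. Exit 2 means the VM check itself could not be done.
is_empty_deployment() {
  local dep="$1" vms_json vm_count manifest
  vms_json=$(bosh -d "$dep" vms --json) || return 2
  # Assumption: "bosh vms --json" lists one row per VM under .Tables[0].Rows.
  vm_count=$(printf '%s' "$vms_json" | jq '.Tables[0].Rows | length') || return 2
  [ -n "$vm_count" ] || return 2
  # The exit status of "bosh manifest" is ignored; only its output is checked.
  manifest=$(bosh -d "$dep" manifest 2>/dev/null)
  [ "$vm_count" -eq 0 ] && [ -z "$manifest" ]
}

# Print (but do not run) a delete command for each deployment that looks empty.
suggest_deletions() {
  local dep
  for dep in "$@"; do
    if is_empty_deployment "$dep"; then
      echo "bosh -d $dep delete-deployment"
    else
      echo "# skipping $dep: has VMs or a manifest, or the check failed" >&2
    fi
  done
}

# Example, using deployment names from the topology excerpt:
#   suggest_deletions service-instance_aaa service-instance_bbb service-instance_ccc
```

Printing the commands instead of executing them leaves room to review the list before removing anything.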

Once those deployments are deleted, run `maestro topology` again. The output should no longer list those deployments, and only one active version of each affected certificate should have deployments. If so, rerun the step that activates the Ops Manager Root CA and resume the rotation procedure.