When trying to rotate certificates for a particular cluster using any variation of the following command:
tkgi rotate-certificates <clustername>
The rotation fails with the following error message:
Error: status 400 reading CredhubClient#regenerateCertificateById(String,CertificateRegenerateRequest,String)
Note that this error can be caused by the manual use of maestro commands to rotate certificate authorities; the TKGI documentation contains warnings about this scenario.
In the document Rotate Kubernetes Cluster Certificates, you will find the following warning:
Never use the CredHub Maestro maestro regenerate ca/leaf --all command to rotate TKGI certificates.
If this is the scenario you are facing, review the article "Wrongly kicked "maestro regenerate ca/leaf --all" in TKGi" and open a case with the support team for troubleshooting.
This issue can occur when one or more certificate authorities used by the cluster have a preexisting certificate version that is marked as transitional. This has been observed when certificates are manually updated in CredHub but the rotation process is not completed on the TKGI clusters.
Get the control plane deployment logs (the pivotal-container-service deployment) by running the following:
bosh -d pivotal-container-service-<ID> logs
In the log bundle obtained from the previous step, open the pks-api folder and look for the file pks-api.log. It will contain entries like the following:
2025-XX-XX XX:XX:XX.XXX INFO 1661158 --- [https-jsse-nio-9021-exec-2] i.p.pks.bosh.credhub.CredhubService : Regenerate certificate credential: certificate: 1ef3ca6d-XXXX-XXXX-XXXX-XXXXXXXXad15, transitional: true, allowTransitionalParentToSign: false
202X-XX-XX XX:XX:XX.XXX ERROR XXXXXXX --- [https-jsse-nio-9021-exec-8] i.p.pks.bosh.credhub.CredhubService : Failed to regenerate certificate 400 {"error":"The maximum number of transitional versions for a given CA is 1."} status 400 reading CredhubClient#regenerateCertificateById(String,CertificateRegenerateRequest,String)
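The failing certificate ID can be pulled out of pks-api.log with standard text tools. A minimal sketch, run against a sample line that mirrors the log format above (the sample ID is hypothetical; in practice, point LOG at the pks-api.log extracted from your bundle):

```shell
# Sample log line mirroring the format shown above; replace this with
# the real pks-api.log from the log bundle.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2025-01-01 00:00:00.000 INFO 1661158 --- [https-jsse-nio-9021-exec-2] i.p.pks.bosh.credhub.CredhubService : Regenerate certificate credential: certificate: 1ef3ca6d-aaaa-bbbb-cccc-ddddddddad15, transitional: true, allowTransitionalParentToSign: false
EOF

# Grab the UUID that follows "certificate: " and de-duplicate.
CERT_ID=$(grep -oE 'certificate: [0-9a-f-]{36}' "$LOG" | awk '{print $2}' | sort -u)
echo "$CERT_ID"
```

The extracted ID is what you will look for in the maestro topology in the next step.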
Double-check this by reviewing the maestro topology from the support bundle: locate the "certificates" folder and, inside it, the file maestro_topology.yml.
If the maestro topology shows that some, but not all, of the certificate authorities have a new version marked as transitional, you will need to identify the affected clusters.
There are two searches you can do here:
Using the certificate ID from the error found in the pks-api logs, search for the specific certificate causing the error.
If you want to check whether any other certificates present the same behavior, search for the value "transitional: true". Below is an example.
- name: "/p-bosh/service-instance_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/etcd_ca_2018"
  certificate_id: 12348578-0123-1a2b-3c4d-1a2b3c4d5e6f
  signed_by: "/p-bosh/service-instance_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/etcd_ca_2018"
  versions:
  - version_id: z1y2x3w4-1234-9876-2468-a1b2c3d4e5f6
    active: false
    signed_by_version: ''
    deployment_names: []
    signing: false
    transitional: true
    certificate_authority: true
    generated: true
    valid_until: '2028-06-23T20:59:21Z'
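Both searches can be done with grep against maestro_topology.yml. A minimal sketch over sample data that mirrors the topology excerpt above (the CA names and IDs are placeholders; point TOPOLOGY at the real file from the support bundle's "certificates" folder):

```shell
# Sample topology mirroring the excerpt above; replace with the real
# maestro_topology.yml from the support bundle.
TOPOLOGY=$(mktemp)
cat > "$TOPOLOGY" <<'EOF'
- name: "/p-bosh/service-instance_xxxxxxxx/etcd_ca_2018"
  certificate_id: 12348578-0123-1a2b-3c4d-1a2b3c4d5e6f
  versions:
  - version_id: z1y2x3w4-1234-9876-2468-a1b2c3d4e5f6
    transitional: true
- name: "/p-bosh/service-instance_yyyyyyyy/kubo_ca_2018"
  certificate_id: 56788578-0123-1a2b-3c4d-1a2b3c4d5e6f
  versions:
  - version_id: a1b2c3d4-1234-9876-2468-f6e5d4c3b2a1
    transitional: false
EOF

# Show each "transitional: true" hit together with the CA name a few
# lines above it, so the affected service instances stand out.
grep -B 4 'transitional: true' "$TOPOLOGY" | grep -E 'name:|transitional:'
```

In a real topology file the number of lines between the CA name and the transitional flag varies, so you may need a larger -B context value than the 4 used here.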
Once you find the certificates, record the service instance IDs. Then, using the certificate version ID, confirm that the transitional version is NOT signing any existing leaf certificates.
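The signing check can also be made against maestro_topology.yml: if the transitional version's version_id never appears in any leaf's signed_by_version field (and its own entry shows signing: false), it is not signing anything. A minimal sketch over hypothetical sample data (point TOPOLOGY at the real file and set TRANSITIONAL_ID to the version_id you recorded):

```shell
# Sample topology fragment; the version IDs are placeholders.
TOPOLOGY=$(mktemp)
cat > "$TOPOLOGY" <<'EOF'
- version_id: z1y2x3w4-1234-9876-2468-a1b2c3d4e5f6
  signing: false
  transitional: true
- version_id: deadbeef-0000-1111-2222-333344445555
  signed_by_version: 'ffffffff-1111-2222-3333-444455556666'
EOF

TRANSITIONAL_ID="z1y2x3w4-1234-9876-2468-a1b2c3d4e5f6"
# Any output here would be a leaf version recording the transitional
# version as its signer; no match means it is not signing any leaf.
RESULT=$(grep "signed_by_version: '${TRANSITIONAL_ID}'" "$TOPOLOGY" \
  || echo "not signing any leaf")
echo "$RESULT"
```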
You can also check whether the transitional version has already been pushed to the deployment VMs.
If the above criteria are met, manual CredHub operations will be required to resolve the errors related to the preexisting transitional certificate versions. Please open a case with support.