WARNING: For Tanzu Application Service (TAS) for VMs 2.7 and later, use the following article instead: https://community.pivotal.io/s/article/Hard-Root-CA-Rotation-for-TAS-2-7
This article provides a set of instructions on how to perform a Hard Root CA Rotation on a platform where the Root CA has either expired or been deleted.
This method is called "Hard" because it causes some platform downtime while the VMs are recreated. The normal rotation procedure is called the "Soft" rotation because it avoids downtime.
These steps recreate all of the VMs in the CF deployment so that they use the current active root CA. Run the commands on the Ops Manager VM, using the newly created deployment manifest /var/tempest/workspaces/default/deployments/cf-#####.yml. This differs from "bosh recreate --fix" because it deploys the new manifest, which contains the newly created root CA.
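A minimal shell sketch of that difference, assuming the manifest file is named after its deployment (cf-<guid>.yml; the GUID used in the example below is hypothetical). The functions only print the two commands, so nothing is sent to the Director:

```shell
#!/bin/sh
# Sketch: contrast "bosh recreate --fix" with redeploying the regenerated manifest.
# Assumption: the manifest is named <deployment>.yml, e.g. cf-<guid>.yml.

# Print the BOSH deployment name implied by a manifest path (basename without .yml).
cf_deployment_name() {
  basename "$1" .yml
}

# Recreates VMs from the manifest BOSH already holds, so the old CA stays in place.
recreate_cmd() {
  printf 'bosh -d %s recreate --fix\n' "$(cf_deployment_name "$1")"
}

# Deploys the regenerated manifest, which carries the newly created root CA.
deploy_cmd() {
  printf 'bosh -d %s deploy %s --fix\n' "$(cf_deployment_name "$1")" "$1"
}
```

For example, `deploy_cmd cf-0123abcd.yml` prints `bosh -d cf-0123abcd deploy cf-0123abcd.yml --fix`. In both commands `--fix` lets BOSH recreate instances whose agents are unresponsive instead of erroring; only `deploy` changes the manifest.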
For any unsupported PCF version, first upgrade Ops Manager to at least 2.3.10 (2.3-build.250) or 2.4.4 (2.4-build.152) so that the NATS CA can be rotated. If there are multiple consul_server instances, scale consul_server down to 1 for the duration of the process.
The first task is to create the new root certificates. The steps to create a new root CA are described in the documentation Ops Manager 2.6 and below - Rotating Certificates. Select your specific version within the docs and double-check the steps listed in Pivotal Platform 2.6 and below - Rotating Certificates:
1. Check for expired certificates.
2. Generate a new root CA.
3. Mark the new root CA as Active.
4. Regenerate the non-configurable certificates using the active CA.
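The four steps above can also be driven through the Ops Manager API. The sketch below assumes that OPSMAN_FQDN and UAA_ACCESS_TOKEN are exported in your shell, and the endpoint paths are assumptions to verify against the Rotating Certificates documentation for your Ops Manager version before sending anything. With DRY_RUN=1 the functions only print the method and path:

```shell
#!/bin/sh
# Sketch only: the four CA rotation steps via the Ops Manager API.
# Assumptions: OPSMAN_FQDN and UAA_ACCESS_TOKEN are exported by you, and the
# endpoint paths match your Ops Manager version's Rotating Certificates docs.
# Set DRY_RUN=1 to print each request instead of sending it.

opsman_api() {
  method="$1"; api_path="$2"; body="${3:-}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s %s\n' "$method" "$api_path"
    return 0
  fi
  if [ -n "$body" ]; then
    curl -sS -X "$method" "https://${OPSMAN_FQDN}${api_path}" \
      -H "Authorization: Bearer ${UAA_ACCESS_TOKEN}" \
      -H "Content-Type: application/json" -d "$body"
  else
    curl -sS -X "$method" "https://${OPSMAN_FQDN}${api_path}" \
      -H "Authorization: Bearer ${UAA_ACCESS_TOKEN}"
  fi
}

# 1. List deployed certificates that expire within six months.
check_expiring()   { opsman_api GET "/api/v0/deployed/certificates?expires_within=6m"; }
# 2. Generate a new root CA.
generate_ca()      { opsman_api POST "/api/v0/certificate_authorities/generate" '{}'; }
# 3. Activate the new CA; pass its GUID (listed by GET /api/v0/certificate_authorities).
activate_ca()      { opsman_api POST "/api/v0/certificate_authorities/$1/activate" '{}'; }
# 4. Regenerate the non-configurable certificates using the active CA.
regenerate_certs() { opsman_api POST "/api/v0/certificate_authorities/active/regenerate" '{}'; }
```

Running with DRY_RUN=1 first is a cheap way to review the exact sequence before touching a platform whose root CA is already broken.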
The newly created root CA cert must be incorporated into a new deployment manifest, which BOSH uses to redeploy.
1. Open the BOSH Director tile and, under "Director Config", select the "Recreate All VMs" checkbox.
Note: The "Recreate All VMs" checkbox resets after every successful Apply Changes.
2. In the TAS tile, click Resource Config and scale diego_database (Diego BBS) down to 1 instance. The diego_database cluster coordinates high availability through a "locket" layer in which the instances "gossip" status updates to each other, similar to the Galera layer of the MySQL cluster. Scaling down at the start removes extra steps.
3. Click "Review Pending Changes" and then "Apply Changes".
4. This Apply Changes is expected to fail when it recreates the first TAS for VMs VM, because of the invalid CA cert.
5. Make the newly created cf-####.yml readable. It is on the Ops Manager VM, and you can change its permissions with the following command: "cd /var/tempest/workspaces/default/deployments/ && sudo chmod a+r *.yml"
6. If you have a High Availability MySQL cluster (3 or more nodes), skip this step and follow the MySQL cluster section below instead. Otherwise, in the deployments directory on the Ops Manager VM where the manifest resides (/var/tempest/workspaces/default/deployments/), run "bosh -d cf-#### deploy cf-####.yml --fix". This recreates each VM and deploys the new CA cert.
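Steps 5 and 6 depend on picking the right manifest. As a hedged sketch, assuming the newest cf-*.yml in the deployments directory is the one regenerated by the failed Apply Changes, the hypothetical helpers latest_cf_manifest and make_manifests_readable below locate it and apply the same chmod as step 5:

```shell
#!/bin/sh
# Hypothetical helpers for steps 5 and 6 on the Ops Manager VM.
# Assumption: the newest cf-*.yml is the manifest regenerated by the failed
# Apply Changes; confirm its GUID with `bosh deployments` before deploying.

DEPLOYMENTS_DIR=/var/tempest/workspaces/default/deployments

# Print the most recently modified cf-*.yml in a directory (nothing if none).
latest_cf_manifest() {
  dir="${1:-$DEPLOYMENTS_DIR}"
  # ls -t sorts newest first; errors for an unmatched glob are discarded.
  ls -t "$dir"/cf-*.yml 2>/dev/null | head -n 1
}

# Same effect as step 5: cd DIR && sudo chmod a+r *.yml
make_manifests_readable() {
  # Subshell so the caller's working directory is unchanged.
  (cd "${1:-$DEPLOYMENTS_DIR}" && sudo chmod a+r ./*.yml)
}
```

Called with no argument, `latest_cf_manifest` prints the newest cf-*.yml under /var/tempest/workspaces/default/deployments, which is the file to pass to the "bosh -d cf-#### deploy cf-####.yml --fix" command in step 6.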
If you have a High Availability MySQL cluster in your TAS for VMs deployment, the first "deploy" run will fail on the TAS for VMs MySQL cluster. The method will fail over and then bootstrap the cluster.
1. SSH into the "mysql-monitor" VM and run "mysql-diag" to verify that the cluster is healthy before this step, to avoid any other complications. See the following documentation: https://docs.pivotal.io/pivotalcf/2-5/mysql/mysql-diag.html
2. Run "bosh -d cf-#### deploy cf-####.yml --fix". This will fail immediately on mysql/0.
3. Run "bosh -d cf-#### deploy cf-####.yml --fix" again. This will fail immediately on mysql/1.
4. Run "bosh -d cf-#### deploy cf-####.yml --fix" again. This will fail immediately on mysql/2.
5. All MySQL nodes are now in a "failed state", but they have the new root CA.
6. Bootstrap the cluster so that the MySQL nodes return to a "running state". This can be verified using the same mysql-diag command from earlier.
7. Run "bosh -d cf-#### deploy cf-####.yml --fix" to complete the deployment.
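The first part of the MySQL sequence above (health check plus three deploys that are each expected to fail) can be sketched as follows. This assumes the BOSH CLI on the Ops Manager VM is already logged in to the Director, that the monitor instance group is named mysql_monitor, and that mysql-diag is on root's login PATH there; check both with `bosh -d cf-#### instances` and the mysql-diag documentation. The bootstrap and the final deploy are deliberately left manual:

```shell
#!/bin/sh
# Sketch of the HA MySQL sequence. Assumptions: the BOSH CLI is logged in;
# the monitor instance group is named mysql_monitor and mysql-diag is on
# root's login PATH on that VM. DRY_RUN=1 prints commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Steps 1-4: health check, then three deploys that are each expected to fail
# on the next mysql node. Bootstrapping (step 6) stays a manual step.
mysql_expected_failures() {
  manifest="$1"
  name="$(basename "$manifest" .yml)"
  run bosh -d "$name" ssh mysql_monitor/0 -c "sudo -i mysql-diag"
  for node in 0 1 2; do
    if ! run bosh -d "$name" deploy "$manifest" --fix; then
      echo "deploy failed on mysql/$node (expected)" >&2
    fi
  done
}
```

A failed deploy is reported rather than aborting the loop, because each failure is part of the expected sequence; inspect each failure before the next run rather than trusting the loop blindly.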
After the BOSH deployment completes, run Apply Changes on all tiles with "Recreate All VMs" selected and the "Recreate and Upgrade All Service Instances" errands enabled. This rotates the NATS certs on all the VMs.
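If you prefer to trigger that Apply Changes from the Ops Manager VM, a minimal sketch follows. It assumes OPSMAN_FQDN and UAA_ACCESS_TOKEN are exported, that POST /api/v0/installations is the Apply Changes endpoint in your Ops Manager version, and that omitted errand settings fall back to what is already selected in each tile; verify all three in the Ops Manager API reference before relying on it:

```shell
#!/bin/sh
# Sketch: trigger Apply Changes for every tile through the Ops Manager API.
# Assumptions: OPSMAN_FQDN and UAA_ACCESS_TOKEN are exported; "Recreate All VMs"
# and each tile's recreate/upgrade errands were already enabled in the UI, so
# the body only asks Ops Manager to deploy all products.
# DRY_RUN=1 prints the request instead of sending it.

apply_changes_all() {
  body='{"deploy_products": "all"}'
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'POST /api/v0/installations %s\n' "$body"
    return 0
  fi
  curl -sS -X POST "https://${OPSMAN_FQDN}/api/v0/installations" \
    -H "Authorization: Bearer ${UAA_ACCESS_TOKEN}" \
    -H "Content-Type: application/json" -d "$body"
}
```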
Once the TAS for VMs deployment is healthy, move on to the service instances, for tiles such as MySQL, Redis, RabbitMQ, and Spring. Other tiles may also be affected: any tile whose service instances communicate with the TAS for VMs deployment must be redone.
This concludes the "Hard Certificate Rotation", and all VMs should show a "running" state.
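To spot VMs that have not reached "running", a heuristic filter over `bosh -d cf-#### instances` output can help. The state strings matched below are assumptions based on typical BOSH CLI table output; adjust them to what your CLI version prints:

```shell
#!/bin/sh
# Heuristic check that every instance reached "running".
# Reads `bosh -d <deployment> instances` output on stdin. The matched state
# strings are assumptions; adjust them to what your BOSH CLI prints.

# Print the lines whose process state looks unhealthy.
not_running() {
  grep -E 'failing|stopped|starting|unresponsive agent' || true
}

# Print how many such lines there are (0 when everything is running).
count_not_running() {
  not_running | grep -c . || true
}
```

Usage: `bosh -d cf-#### instances | count_not_running` should print 0 once the rotation has finished; pipe into `not_running` instead to see which instances still need attention.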
Note: In a large TAS environment (100+ VMs), VMware Support can help speed up the scan that the BOSH Director performs of all bosh-agents in an "unresponsive" state. Contact VMware Support for assistance.