Hard Root CA Rotation

Article ID: 293658

Updated On:

Products

Operations Manager

Issue/Introduction

When a Root CA expires or is deleted, all the VMs go into an "unresponsive agent" state. Apply Changes will fail on the first VM it tries to recreate.

Environment

Product Version: 2.6

Resolution

WARNING: For Tanzu Application Service (TAS) for VMs 2.7+, please use the following article: https://community.pivotal.io/s/article/Hard-Root-CA-Rotation-for-TAS-2-7


The Hard Root CA Rotation

This article provides a set of instructions on how to perform a Hard Root CA Rotation on a platform where the Root CA has either expired or been deleted. 

This method is called "Hard" because it causes some downtime on the platform while the VMs are recreated. Rotation through the normal method is called "Soft" because it avoids downtime.

These steps recreate all the VMs in the CF deployment so that they use the current Active Root CA. Run the commands on the Ops Manager VM, using a newly created /var/tempest/workspaces/default/deployments/cf-#####.yml deployment manifest. This differs from "bosh recreate --fix" because it deploys the new manifest, which contains the newly created Root CA.

For any unsupported PCF version, the first step is to upgrade Ops Manager to at least 2.3.10+ (2.3-build.250) or 2.4.4+ (2.4-build.152) so that the NATS CA can be rotated. If there are multiple consul_server instances, scale consul_server to 1 for the duration of the process.
 

Prepare New Root Certificate

The first task is to create the new root certificate. The steps to create a new root CA are in this documentation: Ops Manager 2.6 and below - Rotating Certificates. Select your specific version within the docs and double-check the steps listed here: Pivotal Platform 2.6 and below - Rotating Certificates.


Steps

1. Check for expired certificates.

2. Generate a new root CA.

3. Mark the new root CA as Active.

4. Regenerate the non-configurable certificates using Active CA.
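
The four steps above can also be driven through the Ops Manager API. The following is a minimal dry-run sketch, not a definitive procedure: OPSMAN_URL, UAA_TOKEN, NEW_CA_GUID, and the helper names are hypothetical placeholders, and the endpoint paths are assumed from the Ops Manager certificate rotation documentation, so verify them against the docs for your version. By default it only prints the calls.

```shell
# Dry-run sketch: prints the Ops Manager API calls for steps 1-4.
# OPSMAN_URL, UAA_TOKEN and NEW_CA_GUID are hypothetical placeholders.
OPSMAN_URL="${OPSMAN_URL:-https://opsman.example.com}"
UAA_TOKEN="${UAA_TOKEN:-UAA-ACCESS-TOKEN}"
NEW_CA_GUID="${NEW_CA_GUID:-NEW-CA-GUID}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the call only, anything else = send it

# opsman_api METHOD PATH: print (or send) one API request.
opsman_api() {
  method="$1"
  path="$2"
  if [ "$DRY_RUN" = "1" ]; then
    echo "curl -X $method $OPSMAN_URL$path"
  elif [ "$method" = "GET" ]; then
    curl -s -X GET "$OPSMAN_URL$path" \
      -H "Authorization: Bearer $UAA_TOKEN"
  else
    curl -s -X "$method" "$OPSMAN_URL$path" \
      -H "Authorization: Bearer $UAA_TOKEN" \
      -H "Content-Type: application/json" -d '{}'
  fi
}

rotate_root_ca_steps() {
  # 1. Check for certificates that are expired or expiring within 3 months.
  opsman_api GET "/api/v0/deployed/certificates?expires_within=3m"
  # 2. Generate a new root CA; its GUID is returned in the response.
  opsman_api POST "/api/v0/certificate_authorities/generate"
  # 3. Mark the new root CA as Active (NEW_CA_GUID comes from step 2).
  opsman_api POST "/api/v0/certificate_authorities/$NEW_CA_GUID/activate"
  # 4. Regenerate the non-configurable certificates using the Active CA.
  opsman_api POST "/api/v0/certificate_authorities/active/regenerate"
}

# Only the dry run is executed automatically. For a real rotation, call
# the steps one at a time, because step 3 needs the GUID from step 2.
if [ "$DRY_RUN" = "1" ]; then
  rotate_root_ca_steps
fi
```

Running the steps one at a time is deliberate: the response of each call should be inspected before moving on to the next.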


Push The New Root CA to Deployment

The newly created root CA certificate needs to be incorporated into a new deployment manifest, which BOSH will use to redeploy.
 

Steps

1. Select the "BOSH Director Tile" and in the "Director Config" select the checkbox "Recreate All VMs".

Note: The "Recreate All VMs" checkbox resets after every successful "Apply Changes".

2. In the TAS tile, click on Resource Config and scale the diego_database (Diego BBS) down to 1. The diego_database cluster orchestrates high availability through a "locket" layer that "gossips" status updates between instances, similar to the Galera layer on the MySQL cluster. Scaling down at the start avoids extra steps.

3. Click on "Review Pending Changes" and then "Apply Changes".

4. This Apply Changes will fail on the first VM recreate in the TAS for VMs deployment. This is expected and is caused by the invalid CA cert.

5. Make the newly created cf-####.yml manifest readable. The file is on the Ops Manager VM, and its permissions can be changed with the following command: "cd /var/tempest/workspaces/default/deployments/ && sudo chmod a+r *.yml"

6. Skip this step if you have a High Availability MySQL cluster (3 or more nodes), and follow the section on MySQL Clusters below instead. Otherwise, in the deployments directory on the Ops Manager VM where the manifest resides (/var/tempest/workspaces/default/deployments/), run "bosh -d cf-#### deploy cf-####.yml --fix". This recreates each VM and deploys the new CA cert.
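
Steps 5 and 6 can be sketched as a short script. This is a hedged sketch only: the DEPLOYMENT variable and the run and hard_redeploy helpers are hypothetical names, cf-#### remains a placeholder for the real deployment name, and the script narrows the chmod to the single manifest rather than every .yml file. It prints the commands by default instead of executing them.

```shell
# Dry-run sketch of steps 5 and 6. DEPLOYMENT is a placeholder; set it
# to the real cf-#### deployment name before a real run.
DEPLOYMENT="${DEPLOYMENT:-cf-####}"
MANIFEST_DIR="/var/tempest/workspaces/default/deployments"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

hard_redeploy() {
  manifest="$MANIFEST_DIR/$DEPLOYMENT.yml"
  # Step 5: make the generated manifest readable.
  run sudo chmod a+r "$manifest" &&
    # Step 6: deploy the new manifest, recreating each VM with the new CA.
    run bosh -d "$DEPLOYMENT" deploy "$manifest" --fix
}

hard_redeploy
```

Set DRY_RUN=0 only after reviewing the printed commands, and skip the deploy entirely if the deployment has a High Availability MySQL cluster.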


TAS for VMs MySQL Clusters

If you have a High Availability MySQL Cluster in your TAS for VMs deployment, the first "deploy" run will fail on the TAS for VMs MySQL cluster. The procedure below deploys to each MySQL node in turn, letting each one fail, and then bootstraps the cluster.

Steps

  • Log into the "mysql-monitor" VM and run "mysql-diag" to verify that the cluster is healthy before starting, to avoid any other complications. Please use the following documentation: https://docs.pivotal.io/pivotalcf/2-5/mysql/mysql-diag.html
  • Tell BOSH to ignore 2 of the 3 VMs for safety with the command "bosh -d cf-#### ignore mysql/1 && bosh -d cf-#### ignore mysql/2"
  • Run "bosh -d cf-#### deploy cf-####.yml --fix". This will fail immediately on mysql/0.
  • After the failure, ignore VM 1 with the command "bosh -d cf-#### ignore mysql/0" and unignore VM 2 with the command "bosh -d cf-#### unignore mysql/1"
  • Run "bosh -d cf-#### deploy cf-####.yml --fix". This will fail immediately on mysql/1.
  • After the failure, ignore VM 2 with the command "bosh -d cf-#### ignore mysql/1" and unignore VM 3 with the command "bosh -d cf-#### unignore mysql/2"
  • Run "bosh -d cf-#### deploy cf-####.yml --fix". This will fail immediately on mysql/2.
  • All three should now be in a "failed" state, but will have the new root CA.
  • Bootstrap the cluster using the manual bootstrap method featured in our documentation (https://docs.pivotal.io/pivotalcf/mysql/bootstrap-mysql.html#manual-bootstrap).
  • Run "bosh -d cf-#### unignore" for each of the 3 MySQL VMs (mysql/0, mysql/1, and mysql/2). This leaves them in a state where the next deploy can process them.
  • All three should now be in a "running" state. This can be verified using the same mysql-diag command from earlier.
  • Run "bosh -d cf-#### deploy cf-####.yml --fix"

Warning: Do not modify the manifest (at any point) to have one MySQL VM instead of the High Availability three. This carries a much higher risk because it deletes two VMs before performing the recreate. We have seen issues with the IaaS and with attaching the disk, so we do not recommend that approach. It is mentioned here only so that you know why you should not reduce to one VM.
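
The ignore, deploy, and unignore sequence above can be sketched as follows. This is a dry-run sketch under stated assumptions: DEPLOYMENT, MANIFEST, and the helper names are hypothetical, cf-#### is still a placeholder, and the manual bootstrap between the two phases is not automated. By default it only prints the commands in order.

```shell
# Dry-run sketch of the MySQL cluster sequence. DEPLOYMENT is a
# placeholder for the real cf-#### deployment name.
DEPLOYMENT="${DEPLOYMENT:-cf-####}"
MANIFEST="/var/tempest/workspaces/default/deployments/$DEPLOYMENT.yml"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only

# b ARGS...: print or run a bosh command against the deployment.
b() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ bosh -d $DEPLOYMENT $*"
  else
    bosh -d "$DEPLOYMENT" "$@"
  fi
}

# Phase 1: deploy to one node at a time; each deploy is expected to fail.
rotate_mysql_nodes() {
  b ignore mysql/1
  b ignore mysql/2
  b deploy "$MANIFEST" --fix || true   # expected to fail on mysql/0
  b ignore mysql/0
  b unignore mysql/1
  b deploy "$MANIFEST" --fix || true   # expected to fail on mysql/1
  b ignore mysql/1
  b unignore mysql/2
  b deploy "$MANIFEST" --fix || true   # expected to fail on mysql/2
}

# Phase 2: run only after the manual bootstrap and a healthy mysql-diag.
after_bootstrap() {
  for i in 0 1 2; do
    b unignore "mysql/$i"
  done
  b deploy "$MANIFEST" --fix
}

# In dry-run mode print the whole plan. With DRY_RUN=0 nothing runs
# automatically: source this file and call each phase by hand.
if [ "$DRY_RUN" = "1" ]; then
  rotate_mysql_nodes
  echo "# manual step: bootstrap the cluster, then verify with mysql-diag"
  after_bootstrap
fi
```

Keeping the two phases as separate functions reflects the fact that the manual bootstrap has to happen, and be verified, between them.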


TAS BOSH NATS Certs

After the BOSH deployment completes, you must run an Apply Changes on all tiles with "Recreate All VMs" and the "Recreate and Upgrade all service instance" errands selected. This will rotate the NATS certs on all the VMs.


Service instance Tiles

Once the TAS for VMs deployment is healthy, we move on to the service instances. This applies to tiles such as MySQL, Redis, RabbitMQ, and Spring. There may be other tiles not listed here. As a rule, any tile that requires communication with the TAS for VMs deployment will need to be redone.
 

Steps

  • Run an Apply Changes on the BOSH Director and the service tile with "Recreate All On-Demand Service Instances" and "Upgrade All On-Demand Service Instances" selected. Some service tiles have an errand for this, such as MySQL and RabbitMQ. Make sure it is set to Recreate (or "Update") everywhere applicable.
  • If the Apply Changes fails, contact VMware Support.

This should conclude the "Hard Certificate Rotation", and all the VMs should be showing in a "running" state.
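
One way to confirm the final state is to list the VMs and filter for anything that is not running. The sketch below is illustrative only: DEPLOYMENT and the non_running helper are hypothetical, and the filter assumes that each row of "bosh vms" output starts with the instance name followed by the process state.

```shell
# Sketch: list instances whose process state is not "running".
# DEPLOYMENT is a placeholder for the real cf-#### deployment name.
DEPLOYMENT="${DEPLOYMENT:-cf-####}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only

# non_running: read "bosh vms" rows on stdin and print every row whose
# first field looks like an instance (contains "/") and whose second
# field is anything other than "running".
non_running() {
  awk '$1 ~ /\// && $2 != "running"'
}

if [ "$DRY_RUN" = "1" ]; then
  echo "+ bosh -d $DEPLOYMENT vms | non_running"
else
  bosh -d "$DEPLOYMENT" vms | non_running
fi
```

Empty output from the filter suggests that every listed instance reports "running". Any row that is printed, for example one in the "unresponsive agent" state, needs further attention.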

Note: In a large TAS environment (100+ VMs), VMware Support can help speed up the scan that the BOSH Director performs against all BOSH agents in the "unresponsive" state. Please contact VMware Support for assistance.