Deploying:
Creating instance 'bosh/0'
Post "https://vcap:<redacted>@x.x.x.x:6868/agent": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2025-XX-XXTXX:XX:XXZ is after 2025-XX-XXTXX:XX:XXZ
Exit code 1
This method is called a "Hard" rotation because it causes some platform downtime while the VMs are recreated. Doing the rotation through the normal method is the "Soft" rotation because it prevents downtime. These steps can be used to recreate all unresponsive BOSH-deployed VMs so that they use the current active root/NATS CA.
Run these commands on the Ops Manager VM using a newly created /var/tempest/workspaces/default/deployments/cf-#####.yml deployment manifest. This differs from "bosh recreate --fix" because it deploys the new manifest containing the newly created root CA.
These commands reference OPS-MANAGER-FQDN, which stands for "Ops Manager Fully Qualified Domain Name". For example: https://ops-manager-url.com.
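As a convenience, the FQDN can be held in a shell variable and substituted into the URLs used in the steps below. This is a minimal sketch; the value shown is the placeholder domain from the example above, not a real address:

```shell
# Placeholder FQDN; replace with your actual Ops Manager domain.
OPS_MANAGER_FQDN="ops-manager-url.com"

# Build the UAA target URL used in the uaac steps that follow.
UAA_TARGET="https://${OPS_MANAGER_FQDN}/uaa"
echo "$UAA_TARGET"
```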
The first task is to create the new root certificates. The steps to create a new CA can be found in this documentation.
1. Disable the BOSH resurrector by running "bosh update-resurrection off".
- As VMs are brought back online, we do not want the system to automatically recreate any VM.
2. Before you can use "curl" against the Ops Manager API, you need to obtain a UAA token with uaac.
Target the Ops Manager UAA:
uaac target https://OPS-MANAGER-FQDN/uaa
Authenticate with the Operations Manager admin account:
$ uaac token owner get
# Example output
Client ID: opsman
Client secret:
User name: admin <--- Your Ops Manager login with administrator scopes
Password: {Password}
Grab your access token and store it in a variable named $token:
export token=$(uaac context | grep access_token | awk '{print $2}')
5. Mark the new root CA as Active.
6. Regenerate the non-configurable certificates using the Active CA.
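The token extraction in step 2 can be sketched against a sample of `uaac context` output. This is illustrative only: the line below is a hypothetical fragment of the real context listing, and the token value is a placeholder:

```shell
# Hypothetical sample of the relevant line from `uaac context`;
# a real context listing contains many more fields.
context_sample='  access_token: eyJhbGciOiJSUzI1NiJ9.placeholder'

# Same pipeline as in step 2: keep the access_token line
# and take the second whitespace-separated field (the value).
token=$(printf '%s\n' "$context_sample" | grep access_token | awk '{print $2}')
echo "$token"
```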
The newly created Root CA Cert needs to be incorporated into a new deployment manifest. This manifest will be used by BOSH to re-deploy.
1. Select the "BOSH Director" tile and in "Director Config" select the "Recreate All VMs" checkbox.
Note: The "Recreate All VMs" checkbox resets after every successful "Apply Changes".
2. Select the "Elastic Application Runtime" tile and in "Resource Config" scale the Diego BBS (diego_database) down to one instance. The high availability orchestration of the diego_database cluster, which uses a "locket" layer to "gossip" status updates between nodes, is similar to the "galera" layer on the MySQL cluster. This cluster differs, however, in that it pulls its information from the MySQL cluster when scaled back up, so no resources need to be preserved. Scaling down at the start removes extra steps later on. If you forget this step and the manual BOSH deploy below fails, you can repair it by editing the manifest (cf-###.yml) from "instances: 3" to "instances: 1" for diego_database and then running "deploy" again.
3. Click on "Review Pending Changes" and then "Apply Changes".
4. This Apply Changes is expected to fail on the first EAR deployment VM recreate due to the invalid CA certificate.
5. Fetch the EAR deployment manifest with "bosh -d cf-#### manifest > cf-####.yml".
Note: Skip step 6 if you have a high availability MySQL cluster (three or more nodes) and use the following section on EAR MySQL clusters of three or more.
6. In the deployments directory on the Ops Manager VM where the manifest resides (/var/tempest/workspaces/default/deployments/), run "bosh -d cf-#### deploy cf-####.yml --fix". This will recreate each VM and deploy the new CA certificate.
7. After the manual BOSH deployment completes, you must run an Apply Changes. This confirms all components are up to date.
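The manifest repair mentioned in step 2 (dropping diego_database from three instances to one) can be sketched with a small awk pass. The sample manifest below is a hypothetical fragment for illustration, not a real cf-###.yml:

```shell
# Hypothetical minimal manifest fragment for illustration only.
cat > /tmp/cf-sample.yml <<'EOF'
instance_groups:
- name: diego_database
  instances: 3
- name: router
  instances: 3
EOF

# Rewrite "instances: 3" to "instances: 1" only inside the
# diego_database instance group, leaving other groups untouched.
awk '
  /^- name:/            { in_dd = ($3 == "diego_database") }
  in_dd && /instances:/ { sub(/instances: 3/, "instances: 1") }
  { print }
' /tmp/cf-sample.yml > /tmp/cf-sample-patched.yml
```

In practice you would edit the real manifest in place (after taking a backup) rather than writing to a second file.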
If you have a high availability MySQL cluster in the EAR deployment, the first "deploy" run will fail at the EAR MySQL cluster. The "monit" process startup will fail and resolve to a single "localhost" process. The solution is to combine the "deploy" method with selectively "ignoring" and then "deploying" individual VMs (which creates the VM). The newly created VM will carry the new root CA certificate. This method provides a safety net in case something goes wrong during VM recreation.
Steps:
All 3 should now be in a "failed" state. We need to find the cluster's leader.
All 3 should now be in a "running" state. This can be verified using the same mysql-diag command from earlier. Then re-engage with step 7 from earlier.
Warning: Do not modify the manifest (at any point) to have 1 MySQL VM instead of the high availability 3. This carries much higher risk because it deletes the disks of two VMs prior to performing the recreate. We have seen issues with the IaaS attaching the disk to the first MySQL VM, so we do not recommend this approach. It is mentioned primarily so you know why you do not want to reduce to 1 VM; we prefer to delete disks under controlled circumstances.
Once the EAR deployment completes, we move on to the service instances. This applies to tiles such as MySQL, Redis, RabbitMQ, TKGI, and Spring. There may be other tiles not listed; any tile that requires communication with the EAR deployment will need to be redone.
1. Run Apply Changes and then select each service tile with "Upgrade All On-Demand Service Instances" enabled. Some service tiles have an errand for this, such as MySQL and RabbitMQ. Make sure these errands are selected on all tiles that have a BOSH deployment.
2. The deployment is expected to fail when the NATS CA has expired because the VMs are in an unresponsive state. Follow the same steps as for the EAR deployment: "bosh deploy" with --fix will resolve the unresponsive VMs.
"Upgrade All On-Demand Service Instances" may fail for the same reason on the service instance VMs. Resolve this by using "bosh deploy" with --fix as well.
Finally, turn resurrection back on once the CA is successfully rotated: "bosh update-resurrection on".
If you encounter any problem, please contact Broadcom support by opening a support request.