Understanding BOSH Deployment Convergence

Article ID: 293436

Products

Operations Manager

Issue/Introduction

BOSH deployment convergence is an important topic to understand. This KB aims to clear up what it is and when it occurs.

At a high level, a deployment convergence is BOSH's attempt to bring a deployment back to its intended state.

The official BOSH documentation also covers convergence.

In this KB, we will walk through a demonstration of a failed BOSH deploy scenario and review when a convergence would or would not be triggered.

Resolution

Before we can start talking about what a deployment convergence is or the difference between intended state versus actual state, let us first review how this KB will be structured.

We will use a generic naming convention, with all upper-case characters and dashes, to represent specific items. This helps because the concept of deployment convergence applies to all BOSH deployments that have VMs. For example, a Cloud Foundry deployment, a MySQL service instance deployment, or a ZooKeeper deployment are all susceptible to a deployment convergence. However, in this KB we will not refer to them by those deployment names; instead, we will refer to a deployment as DEPLOYMENT-A.

This naming convention will be leveraged throughout and will apply to other components as well.

For example: VM-A, PROPERTY-A, VALUE-A.

To further help with clarification, refer to the following example for how this will translate.


This:

The pivotal-mysql-86cea2e763e27f1272f3 deployment has a Virtual Machine called dedicated-mysql-broker/1947c757-1e74-4c19-a213-d2b5c45bdb82 with a property called syslog.port that has the value 36969.

We have updated the syslog.port property in the pivotal-mysql-86cea2e763e27f1272f3 deployment's manifest from 36969 to 46969 and deployed the manifest.


Is abstracted to:

The DEPLOYMENT-A deployment has a Virtual Machine called VM-A with a property called PROPERTY-A that has the value VALUE-A.

We have updated the PROPERTY-A property in the DEPLOYMENT-A deployment's manifest from VALUE-A to VALUE-B and deployed the manifest. 

Maybe the update is a syslog port setting, maybe the update is a stemcell upgrade, or maybe the update is the rotation of a certificate. In any case, abstracting away the specifics helps us focus on the important concepts that are globally applicable to the concept of deployment convergence.

Options:
1. If we run the following:
bosh -d DEPLOYMENT-A recreate VM-C
Then the deployment will converge. All the changes that were applied to VM-A, VM-B, and VM-C will be reverted, because the failed deploy of the updated DEPLOYMENT-A-manifest.yml file never updated the intended state.

2. As of BOSH v270.4.0 and BOSH CLI v6.0.0, the start, stop, restart, and recreate commands all support a --no-converge flag. If we run the following:
bosh -d DEPLOYMENT-A recreate VM-C --no-converge
Then the deployment will not converge. All the changes that were applied to VM-A, VM-B, and VM-C will persist and VM-C will be recreated with its actual state (PROPERTY-A with a value of VALUE-B).

3. If we delete VM-C from the IaaS and run the following:
bosh -d DEPLOYMENT-A cck
Then the deployment will not converge. All the changes that were applied to VM-A, VM-B, and VM-C will persist and VM-C will be detected as missing and can be recreated with its actual state (PROPERTY-A with a value of VALUE-B).

4. If we delete VM-C from the IaaS and let the BOSH resurrector detect and fix it, then the deployment will not converge. All the changes that were applied to VM-A, VM-B, and VM-C will persist and VM-C will be detected as missing and will be recreated with its actual state (PROPERTY-A with a value of VALUE-B).
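The four options above can be sketched with a small toy model. This is only an illustration of the behaviour described in this KB, not BOSH's implementation; the recreate function and its return shape are hypothetical names chosen for the example.

```python
def recreate(intended, actual, target, converge=True):
    """Toy model of recreating one VM.

    Returns (new_actual_state, vms_acted_on). Hypothetical helper,
    not a BOSH API.
    """
    if not converge:
        # --no-converge (and, per options 3 and 4, cck and the resurrector):
        # only the target is rebuilt, and it keeps its actual state.
        return dict(actual), [target]
    # Converging: every VM is brought back to the stored intended state.
    touched = [vm for vm in actual if vm == target or actual[vm] != intended]
    return {vm: intended for vm in actual}, touched


intended = "VALUE-A"  # last successfully deployed value
actual = {"VM-A": "VALUE-B", "VM-B": "VALUE-B", "VM-C": "VALUE-B"}

converged_state, converged_touched = recreate(intended, actual, "VM-C")
kept_state, kept_touched = recreate(intended, actual, "VM-C", converge=False)

print(converged_state)    # {'VM-A': 'VALUE-A', 'VM-B': 'VALUE-A', 'VM-C': 'VALUE-A'}
print(converged_touched)  # ['VM-A', 'VM-B', 'VM-C']
print(kept_state)         # {'VM-A': 'VALUE-B', 'VM-B': 'VALUE-B', 'VM-C': 'VALUE-B'}
print(kept_touched)       # ['VM-C']
```

In this model, option 1 corresponds to converge=True, while options 2, 3, and 4 all behave like converge=False: only VM-C is acted on and VALUE-B persists everywhere.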

It is paramount to note that at this point, even if we are able to get VM-C to recreate successfully, the deployment's intended state still has not changed. DEPLOYMENT-A is still susceptible to a deployment convergence. In order to update the intended state of the deployment, a successful deploy needs to occur. In our example, we should re-run the following: 
bosh -d DEPLOYMENT-A deploy DEPLOYMENT-A-manifest.yml
Only after that command is successful will the intended state update in the BOSH database. Then we will no longer be at risk of a deployment convergence for DEPLOYMENT-A.

To reiterate why this is an important concept to understand, let's relate what we just learned to a large Cloud Foundry deployment. There can be hundreds of machines in this type of deployment.

1. A certificate rotation is needed, so the manifest is updated with the new certs and deployed against the Cloud Foundry deployment.

2. 100 machines update successfully, then a failure occurs on the 101st machine due to a firewall issue.

3. In an effort to mitigate, we open the port in the IaaS that is necessary for the 101st machine's update to be successful. 

4. We run bosh -d CF recreate VM-101

5. Our deployment will converge back to the intended state and undo all of the changes for the 100 machines that already updated. 

This can be very time-consuming and undesirable. The correct thing to do in that situation is to first get the 101st machine into a running state without converging the deployment, and then resume deploying the manifest. For example, if we had added the --no-converge flag to the recreate command in step 4, we would have acted only on that target instance, recreating it with its actual state. Once the 101st machine is recreated successfully, we can redeploy the manifest. BOSH will detect the actual state of the 100 VMs, recognize that it does not need to update them again, and continue the deploy from where the initial deploy failed.

Note: The last sentence is only true if the --recreate flag is not present in the deploy command. If the --recreate flag is present, BOSH will recreate the 100 machines even though they are already updated. If you need to avoid recreating the first 100 machines in that case, run bosh ignore on them before the deploy and bosh unignore on them after the deploy succeeds.
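To make the cost difference concrete, here is a rough sketch that counts how many VMs a redeploy would act on in each case. It is a toy model under stated assumptions: the deployment size of 150 VMs, the NEW-CERT and OLD-CERT labels, and the vms_to_update helper are all hypothetical, and the counts are properties of the model rather than measurements.

```python
def vms_to_update(actual, new_value, recreate=False):
    """Toy redeploy: list the VMs the deploy would act on.

    Without recreate, VMs whose actual state already matches the manifest
    are skipped. With recreate=True (modelling deploy --recreate), every
    VM is acted on. Hypothetical helper, not a BOSH API.
    """
    return [vm for vm, value in actual.items() if recreate or value != new_value]


TOTAL = 150  # assumed deployment size for illustration

# After the failed deploy plus `recreate VM-101 --no-converge`:
# VMs 1..101 carry the new certs, the rest were never reached.
after_no_converge = {
    f"VM-{i}": ("NEW-CERT" if i <= 101 else "OLD-CERT") for i in range(1, TOTAL + 1)
}
# After a converging `recreate VM-101`: everything is back at the intended state.
after_converge = {f"VM-{i}": "OLD-CERT" for i in range(1, TOTAL + 1)}

resume = vms_to_update(after_no_converge, "NEW-CERT")
redo = vms_to_update(after_converge, "NEW-CERT")
forced = vms_to_update(after_no_converge, "NEW-CERT", recreate=True)

print(len(resume))  # 49
print(len(redo))    # 150
print(len(forced))  # 150
```

In the model, the non-converging path lets the redeploy pick up with the VMs that were never reached, while the converging path or the --recreate flag makes it act on every VM again.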

Next, we should define what an intended state is and compare it to the actual state.

The intended state for a deployment is what is defined in the manifest that was last successfully deployed for that deployment. When a BOSH deploy occurs, it is supplied a manifest file. Upon successfully deploying the manifest file, BOSH will write the manifest to its internal database and this becomes the intended state for that deployment.

The actual state for a deployment is the intended state plus any changes that a BOSH deploy has made to the deployment, whether or not that deploy succeeded.

To illustrate this, let us consider the following example:

DEPLOYMENT-A has 1 instance group with 3 instances desired. Each instance will be a VM so we will refer to them as VM-A, VM-B, and VM-C.

DEPLOYMENT-A was successfully deployed.

To get the intended state of DEPLOYMENT-A, we run the command:
bosh -d DEPLOYMENT-A manifest > DEPLOYMENT-A-manifest.yml

Let's now consider that we need to update a property in this deployment. To update the property, we make a change to the manifest and deploy it against the deployment.

We edit DEPLOYMENT-A-manifest.yml and change PROPERTY-A from VALUE-A to VALUE-B.

We then run:
bosh -d DEPLOYMENT-A deploy DEPLOYMENT-A-manifest.yml

During this deploy, VM-A updates, VM-B updates, VM-C attempts to update but encounters an error and does not successfully update. Because of this error, the deployment of the manifest file is not considered successful and the deployment's intended state does not change (remember that BOSH only updates its internal database upon successful deploys).

The actual state for the deployment however did change. This is key to understanding deployment convergence. 

In this example, the following is true at this point:

The intended state of DEPLOYMENT-A:
VM-A to have PROPERTY-A with a value of VALUE-A.
VM-B to have PROPERTY-A with a value of VALUE-A.
VM-C to have PROPERTY-A with a value of VALUE-A.

The actual state of DEPLOYMENT-A:
VM-A to have PROPERTY-A with a value of VALUE-B.
VM-B to have PROPERTY-A with a value of VALUE-B.
VM-C to have PROPERTY-A with a value of VALUE-B.

Recall that VM-C had an error during the update. It is important to understand that BOSH did attempt to update it. Therefore VM-C will have the updated VALUE-B for PROPERTY-A in its actual state. 
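The split between intended and actual state in this example can be sketched as a toy model. This is a simplified illustration under stated assumptions, not how the BOSH director is implemented; the Deployment class and its failing_vm parameter are hypothetical names introduced for the example.

```python
class Deployment:
    """Toy model: 'intended' is the last successfully deployed value,
    'actual' is what each VM currently carries."""

    def __init__(self, vms, value):
        self.intended = value                     # updated only on a successful deploy
        self.actual = {vm: value for vm in vms}   # per-VM actual state

    def deploy(self, new_value, failing_vm=None):
        """Apply new_value VM by VM; return True only if every VM updates."""
        for vm in self.actual:
            self.actual[vm] = new_value           # the update is attempted on the VM
            if vm == failing_vm:
                return False                      # deploy aborts; intended is untouched
        self.intended = new_value                 # recorded only after full success
        return True


dep = Deployment(["VM-A", "VM-B", "VM-C"], "VALUE-A")
ok = dep.deploy("VALUE-B", failing_vm="VM-C")

print(ok)            # False
print(dep.intended)  # VALUE-A
print(dep.actual)    # {'VM-A': 'VALUE-B', 'VM-B': 'VALUE-B', 'VM-C': 'VALUE-B'}
```

The failed deploy leaves intended at VALUE-A while all three VMs carry VALUE-B, which is exactly the gap that a later convergence would close by reverting the VMs.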

Now we can start to understand deployment convergence.

Any time a BOSH update command is issued to a deployment, BOSH will try to converge the deployment to its intended state. BOSH update commands include the following:
bosh <start|stop|restart|recreate>

Let's continue this example further to explore different scenarios where a deployment convergence will take place.