Before we discuss what a deployment convergence is, or the difference between intended state and actual state, let us first review how this KB is structured.
We will use a generic naming convention of upper-case characters and dashes to represent specific items. This helps because the concept of deployment convergence applies to every BOSH deployment that has VMs. For example, a Cloud Foundry deployment, a MySQL service instance deployment, or a ZooKeeper deployment are all susceptible to a deployment convergence. However, in this KB we will not refer to deployments by those names; instead we will refer to a deployment as DEPLOYMENT-A.
This naming convention will be leveraged throughout and will apply to other components as well. For example: VM-A, PROPERTY-A, VALUE-A.
To further help with clarification, refer to the following example for how this will translate.
This:
The pivotal-mysql-86cea2e763e27f1272f3 deployment has a Virtual Machine called dedicated-mysql-broker/1947c757-1e74-4c19-a213-d2b5c45bdb82 with a property called syslog.port that has the value 36969.
We have updated the syslog.port property in the pivotal-mysql-86cea2e763e27f1272f3 deployment's manifest from 36969 to 46969 and deployed the manifest.
Is abstracted to:
The DEPLOYMENT-A deployment has a Virtual Machine called VM-A with a property called PROPERTY-A that has the value VALUE-A.
We have updated the PROPERTY-A property in the DEPLOYMENT-A deployment's manifest from VALUE-A to VALUE-B and deployed the manifest.
Maybe the update is a syslog port setting, maybe the update is a stemcell upgrade, or maybe the update is the rotation of a certificate. In any case, abstracting away the specifics helps us focus on the important concepts that are globally applicable to the concept of deployment convergence.
Options:
1. If we run the following:
bosh -d DEPLOYMENT-A recreate VM-C
Then the deployment will converge. All the changes that were applied to VM-A, VM-B, and VM-C will be reverted because the intended state wasn't updated due to the failed deploy of the updated DEPLOYMENT-A-manifest.yml file.
2. As of BOSH v270.4.0 and BOSH CLI v6.0.0, the start, stop, restart, and recreate commands all support a --no-converge flag. If we run the following:
bosh -d DEPLOYMENT-A recreate VM-C --no-converge
Then the deployment will not converge. All the changes that were applied to VM-A, VM-B, and VM-C will persist and VM-C will be recreated with its actual state (PROPERTY-A with a value of VALUE-B).
3. If we delete VM-C from the IaaS and run the following:
bosh -d DEPLOYMENT-A cck
Then the deployment will not converge. All the changes that were applied to VM-A, VM-B, and VM-C will persist and VM-C will be detected as missing and can be recreated with its actual state (PROPERTY-A with a value of VALUE-B).
4. If we delete VM-C from the IaaS and let the BOSH resurrector detect and fix it, then the deployment will not converge. All the changes that were applied to VM-A, VM-B, and VM-C will persist and VM-C will be detected as missing and will be recreated with its actual state (PROPERTY-A with a value of VALUE-B).
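The difference between options 1 and 2 can be sketched with a toy model. This is only an illustration, not how the BOSH Director is implemented: the `recreate` function and the dictionaries below are hypothetical stand-ins for the Director's stored manifest (intended state) and the properties currently on each VM (actual state).

```python
# Toy model only; names and data structures are assumptions for illustration.
def recreate(vm, intended, actual, converge=True):
    """Return the actual state after recreating `vm`."""
    result = {name: dict(props) for name, props in actual.items()}
    if converge:
        # Convergence: every VM is brought back to the intended state.
        for name in result:
            result[name] = dict(intended)
    else:
        # --no-converge: only the target is rebuilt, from its own actual state.
        result[vm] = dict(actual[vm])
    return result

# State after the failed deploy described in this KB.
intended = {"PROPERTY-A": "VALUE-A"}
actual = {vm: {"PROPERTY-A": "VALUE-B"} for vm in ("VM-A", "VM-B", "VM-C")}

converged = recreate("VM-C", intended, actual)                 # option 1
kept = recreate("VM-C", intended, actual, converge=False)      # option 2

print(converged["VM-A"]["PROPERTY-A"])  # VALUE-A (reverted)
print(kept["VM-A"]["PROPERTY-A"])       # VALUE-B (preserved)
```

In the converging case, VM-A and VM-B are reverted even though only VM-C was targeted, which is the behavior the options above warn about.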
It is paramount to note that at this point, even if we are able to get VM-C to recreate successfully, the deployment's intended state still has not changed. DEPLOYMENT-A is still susceptible to a deployment convergence. In order to update the intended state of the deployment, a successful deploy needs to occur. In our example, we should re-run the following:
bosh -d DEPLOYMENT-A deploy DEPLOYMENT-A-manifest.yml
Only after that command is successful will the intended state update in the BOSH database. Then we will no longer be at risk of a deployment convergence for DEPLOYMENT-A.
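One way to picture the risk is as drift between the two states. The sketch below is a simplified assumption, not a BOSH API: `at_risk` is a hypothetical helper that reports whether any VM's actual properties differ from the intended state.

```python
# Toy model only; `at_risk` is a hypothetical helper, not a BOSH command.
def at_risk(intended, actual):
    """True while any VM's actual properties differ from the intended state."""
    return any(props != intended for props in actual.values())

actual = {vm: {"PROPERTY-A": "VALUE-B"} for vm in ("VM-A", "VM-B", "VM-C")}

# After the failed deploy, the intended state is still the old manifest.
intended = {"PROPERTY-A": "VALUE-A"}
before = at_risk(intended, actual)

# A successful deploy records the new manifest as the intended state.
intended = {"PROPERTY-A": "VALUE-B"}
after = at_risk(intended, actual)

print(before, after)  # True False
```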
To reiterate why this is an important concept to understand, let's relate what we just learned to a large Cloud Foundry deployment. There can be hundreds of machines in this type of deployment.
1. A certificate rotation is needed, so the manifest is updated with the new certs and deployed against the Cloud Foundry deployment.
2. 100 machines were updated and a failure occurs on the 101st machine due to a firewall issue.
3. In an effort to mitigate, we open the port in the IaaS that is necessary for the 101st machine's update to be successful.
4. We issue a bosh -d CF recreate VM-101
5. Our deployment will converge back to the intended state and undo all of the changes for the 100 machines that already updated.
This can be very time consuming and undesirable. The correct thing to do in that situation would be to first get the 101st machine into a running state without converging the deployment, and then resume deploying the manifest. For example, if we added the --no-converge flag to the recreate command in step 4, then we would have acted only on that target instance, recreating it with its actual state. Once the 101st machine is recreated successfully, we can redeploy the manifest. BOSH will detect the actual state of the 100 VMs, recognize that it does not need to update them again, and continue the deploy where it left off when the initial deploy failed.
Note: The last sentence is only true if the --recreate flag is not present in the deploy command. If the --recreate flag is present with the deploy command, then BOSH will recreate the 100 machines even though they are already updated. In a case like this, where you need to avoid recreating the first 100 machines, run bosh ignore against them before the deploy and bosh unignore against them after the deploy succeeds.
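The resume behavior and the effect of --recreate and ignore can be sketched as a simple planning function. This is a toy model under stated assumptions: `plan_deploy`, the `CERT` property, and the 105-VM deployment size are all hypothetical, and the real Director's change detection is more involved.

```python
# Toy model only; function name, property name, and VM count are assumptions.
def plan_deploy(manifest, actual, recreate=False, ignored=()):
    """Return the VMs a deploy would touch in this simplified model."""
    touched = []
    for vm, props in actual.items():
        if vm in ignored:
            continue                  # ignored instances are skipped entirely
        if recreate or props != manifest:
            touched.append(vm)        # needs an update, or recreation is forced
    return touched

new = {"CERT": "NEW"}
# VM-1..VM-101 already carry the new cert; VM-102..VM-105 were never reached.
actual = {f"VM-{i}": {"CERT": "NEW" if i <= 101 else "OLD"} for i in range(1, 106)}
first_100 = {f"VM-{i}" for i in range(1, 101)}

resumed = plan_deploy(new, actual)                                    # plain redeploy
forced = plan_deploy(new, actual, recreate=True)                      # deploy --recreate
guarded = plan_deploy(new, actual, recreate=True, ignored=first_100)  # ignore first 100

print(len(resumed), len(forced), len(guarded))  # 4 105 5
```

A plain redeploy touches only the VMs that have not yet been updated, a forced recreate touches everything, and ignoring the first 100 machines keeps them out of the forced recreate.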
Next, we should define what an intended state is and compare it to what actual state is.
The intended state for a deployment is what is defined in the manifest that was last successfully deployed for that deployment. When a BOSH deploy occurs, it is supplied a manifest file. Upon successfully deploying the manifest file, BOSH will write the manifest to its internal database and this becomes the intended state for that deployment.
The actual state for a deployment is the intended state plus any changes to the deployment that were brought about by a BOSH deploy, whether or not that deploy succeeded.
To illustrate this, let us consider the following example:
DEPLOYMENT-A has 1 instance group with 3 instances desired. Each instance will be a VM so we will refer to them as VM-A, VM-B, and VM-C.
DEPLOYMENT-A was successfully deployed.
To get the intended state of DEPLOYMENT-A, we run the command:
bosh -d DEPLOYMENT-A manifest > DEPLOYMENT-A-manifest.yml
Let's now consider that we need to update a property in this deployment. To update the property, we make a change to the manifest and deploy it against the deployment.
We edit DEPLOYMENT-A-manifest.yml and change PROPERTY-A from VALUE-A to VALUE-B.
We then run:
bosh -d DEPLOYMENT-A deploy DEPLOYMENT-A-manifest.yml
During this deploy, VM-A updates, VM-B updates, and VM-C attempts to update but encounters an error and does not successfully update. Because of this error, the deployment of the manifest file is not considered successful and the deployment's intended state does not change (remember that BOSH only updates its internal database upon successful deploys).
The actual state for the deployment however did change. This is key to understanding deployment convergence.
In this example, the following is true at this point:
The intended state of DEPLOYMENT-A:
VM-A to have PROPERTY-A with a value of VALUE-A.
VM-B to have PROPERTY-A with a value of VALUE-A.
VM-C to have PROPERTY-A with a value of VALUE-A.
The actual state of DEPLOYMENT-A:
VM-A to have PROPERTY-A with a value of VALUE-B.
VM-B to have PROPERTY-A with a value of VALUE-B.
VM-C to have PROPERTY-A with a value of VALUE-B.
Recall that VM-C had an error during the update. It is important to understand that BOSH did attempt to update it. Therefore VM-C will have the updated VALUE-B for PROPERTY-A in its actual state.
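The example so far can be expressed as a small toy model. It is a sketch under simplifying assumptions, not the Director's real logic: the `deploy` function and the dictionaries are hypothetical, with `intended` standing in for the manifest stored in the BOSH database and `actual` for the properties on each VM.

```python
# Toy model only; names and data structures are assumptions for illustration.
VMS = ["VM-A", "VM-B", "VM-C"]

def deploy(intended, actual, manifest, failing_vms=()):
    """Apply `manifest` VM by VM; record it as intended only if every VM succeeds."""
    for vm in VMS:
        actual[vm] = dict(manifest)   # the update is attempted on this VM
        if vm in failing_vms:
            return False              # error: intended state is left untouched
    intended.clear()
    intended.update(manifest)         # success: manifest becomes the intended state
    return True

intended = {"PROPERTY-A": "VALUE-A"}
actual = {vm: dict(intended) for vm in VMS}

ok = deploy(intended, actual, {"PROPERTY-A": "VALUE-B"}, failing_vms={"VM-C"})

print(ok)                              # False
print(intended["PROPERTY-A"])          # VALUE-A
print(actual["VM-C"]["PROPERTY-A"])    # VALUE-B
```

The failed deploy leaves the intended state at VALUE-A while all three VMs, including the one that errored, carry VALUE-B in their actual state.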
Now we can start to understand deployment convergence.
Anytime a BOSH update command is issued to a deployment, BOSH will try to converge the deployment to its intended state. BOSH update commands include the following:
bosh <start|stop|restart|recreate>
Let's continue this example to explore different scenarios in which a deployment convergence will take place.