The upgrade of Pivotal Cloud Foundry may fail due to Consul issues.
The upgrade fails with the following error message:
Started updating job consul_server-partition-260de9892e7d24109dfe > consul_server-partition-260de9892e7d24109dfe/0 (canary). Failed: `consul_server-partition-260de9892e7d24109dfe/0' is not running after update (00:05:57)
Error 400007: `consul_server-partition-260de9892e7d24109dfe/0' is not running after update
This error message is generic. It indicates a problem with the software running on the VM. For the purposes of this KB, the VM in question is consul_server, so the message means that the Consul software is failing to start. The message alone does not identify the specific cause; see Debugging Instructions below for details on how to investigate further.
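When triaging a failed upgrade log, it can help to confirm programmatically that the failing job is a consul_server instance before applying the recovery steps in this KB. The following is a minimal sketch, not a Pivotal-supplied tool: the function names `parse_failure` and `is_consul_server_failure` are hypothetical, and the regular expression assumes the log contains the "Error NNNNNN: ... is not running after update" line in the form shown above.

```python
import re

# Matches the "not running after update" error line in the form quoted above.
# The instance name is wrapped in a backtick and a closing apostrophe.
# Assumption: the job name contains no "/" or "'" characters.
FAILURE_RE = re.compile(
    r"Error (?P<code>\d+): `(?P<job>[^/']+)/(?P<index>\d+)' "
    r"is not running after update"
)


def parse_failure(log_text):
    """Return (job, index, error_code) for the first match, or None."""
    match = FAILURE_RE.search(log_text)
    if match is None:
        return None
    return (
        match.group("job"),
        int(match.group("index")),
        int(match.group("code")),
    )


def is_consul_server_failure(log_text):
    """True if the first matching failure names a consul_server job."""
    parsed = parse_failure(log_text)
    return parsed is not None and parsed[0].startswith("consul_server")
```

Passing the error text from this KB to `parse_failure` yields the job name, instance index, and error code. A result of `None` means the log does not contain this particular failure and a different cause should be investigated.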
In many cases, we have found that Consul server failures in PCF can be corrected by wiping the data from the nodes and resetting them. This process essentially gives the cluster a fresh start, and because no persistent data is stored on the Consul server, the operation is harmless.
Because this process is quick, non-destructive and has a high success rate for fixing Consul problems, Pivotal recommends trying this process first, before doing any additional debugging.
To perform this process, follow the instructions in the "Failed Deploys, Upgrades, Split-Brain Scenarios, etc." section at the following link:
https://github.com/cloudfoundry-incubator/consul-release/tree/master#failure-recovery
If you need assistance with these instructions, please open a support ticket. If performing the steps at the link above does not help, please proceed to the next section.
Debugging Instructions
When this problem occurs, you can debug further by performing the following steps:
Once you have captured the information above, you can review the information to better understand the problem or open a support ticket and Pivotal Support will help to diagnose the issue.