Failed Updating Instance diego_cell: Updating certificates with retries

Article ID: 297740

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

Symptoms:

When upgrading Elastic Runtime, the Diego cells fail to update during the rolling deployment:

Error Message:

Failed updating instance diego_cell > diego_cell/80bf75c3-caf9-4087-b0dd-f3b5571324fd (27): Timed out sending 'get_task' to 0f410cca-a50a-42d9-aa4a-a2599ad329e9 after 45 seconds (00:13:33) 
Failed updating instance diego_cell > diego_cell/8c595349-5de1-4c29-8c7f-769c2c8aed65 (63): Action Failed get_task: Task b697cd2d-64c7-41d3-498a-e78584ba38b4 result: Stopping Monitored Services: Stopping services '[consul_agent]' errored (00:11:04) 
Failed updating instance diego_cell > diego_cell/931389ef-3591-46ea-aad9-0a3c2139a6ea (23): Action Failed get_task: Task 533905df-5a62-48b4-586a-da092595ee34 result: Updating certificates with retries (00:13:30)

Environment


Cause

The "Updating certificates with retries" error can result from poor disk performance. Restarting many instances at the same time places too much load on the IaaS.

Resolution

The max-in-flight value in Operations (Ops) Manager needs to be reduced in order to prevent too many Diego cells from being updated simultaneously.

The exact API call for changing max-in-flight is described in the Ops Manager API documentation:

https://[FQDN Ops Manager]/docs#configuring-the-max_in_flight-settings-for-a-product-39-s-jobs

Follow the steps below:

Authenticate & Get Token: https://[FQDN Ops Manager]/docs#authentication

1. Target the UAA endpoint of your Ops Manager:

uaac target https://[FQDN Ops Manager]/uaa

2. Log in to Ops Manager as an admin user, using the client name “opsman”:

uaac token owner get

Client name: opsman
Client secret:
User name: YOUR_USERNAME_HERE
Password: YOUR_PASSWORD_HERE

3. Retrieve your Ops Manager access token from the "access_token:" field of the output:

uaac context

Result
[5]*[https://[FQDN Ops Manager]/uaa]
  skip_ssl_validation: true

  [0]*[admin]
      user_id: 61258ded-24df-4724-b3a2-c88768437864
      client_id: opsman
      access_token: eyJhbGciOiJRUzI1NiIsImprdSI6I....
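
Copying the token by hand from the `uaac context` output is error-prone, so the value can also be captured in a shell variable. The following is a minimal sketch, assuming the output keeps the "access_token: VALUE" layout shown above; `extract_access_token` is a hypothetical helper name, not part of uaac.

```shell
# Hypothetical helper (not a uaac command): print the value of the first
# "access_token:" line read from standard input.
extract_access_token() {
  awk '$1 == "access_token:" { print $2; exit }'
}

# Example use after a successful login (requires uaac):
#   TOKEN="$(uaac context | extract_access_token)"
```

If several contexts are listed, this returns the first token in the output, which may not belong to the active context; check the output manually in that case.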

4. Get the TAS product GUID from the "guid" field; it starts with "cf-...":

curl "https://[FQDN Ops Manager]/api/v0/deployed/products" -k -X GET -H "Authorization: Bearer [Ops Manager access token retrieved at Step 3.]"

Result 
{
"installation_name": "cf-6595dd22a5007e3f6f93",
"guid": "cf-6595dd22a5007e3f6f93",
"type": "cf",
"product_version": "1.10.8-build.7"
}
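
To avoid copying the GUID manually, the response can be filtered for it. This is a rough sketch that relies on plain text matching rather than a JSON parser, and it assumes the response contains a "guid" field whose value starts with "cf-" as in the result above; `extract_cf_guid` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first "guid" value starting with "cf-"
# found in the JSON text on standard input. Text match only, so it assumes
# the value contains no escaped quotes.
extract_cf_guid() {
  grep -o '"guid"[[:space:]]*:[[:space:]]*"cf-[^"]*"' |
    head -n 1 |
    sed 's/.*"\(cf-[^"]*\)"$/\1/'
}

# Example use (TOKEN is assumed to hold the access token from Step 3):
#   PRODUCT_GUID="$(curl -s -k "https://[FQDN Ops Manager]/api/v0/deployed/products" \
#     -H "Authorization: Bearer $TOKEN" | extract_cf_guid)"
```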

5. Get the current Max in Flight values and note the Diego Cell job GUID, which starts with "diego_cell-...":

curl "https://[FQDN Ops Manager]/api/v0/staged/products/[TAS products guid retrieved at Step 4.]/max_in_flight" -k -X GET -H "Authorization: Bearer [Ops Manager access token retrieved at Step 3.]"

Result
{
 "max_in_flight": {
   ...
   "diego_cell-81b4916ae28d873c1988": 10,
   ...
 }
}
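
The Diego Cell job GUID and its current value can be pulled out of this response with a text filter. The sketch below assumes the key starts with "diego_cell-" as in the result above; `extract_diego_cell_entry` is a hypothetical helper name, and the match is textual rather than a real JSON parse.

```shell
# Hypothetical helper: print "<job guid> <current value>" for the first
# key starting with "diego_cell-" in the JSON text on standard input.
# If the value is a quoted string, the quotes are kept in the output.
extract_diego_cell_entry() {
  grep -o '"diego_cell-[^"]*"[[:space:]]*:[[:space:]]*[^,}[:space:]]*' |
    head -n 1 |
    sed 's/^"\([^"]*\)"[[:space:]]*:[[:space:]]*/\1 /'
}

# Example use (RESPONSE is assumed to hold the output of the Step 5 command):
#   printf '%s\n' "$RESPONSE" | extract_diego_cell_entry
```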

6. Set the new Max in Flight value for the Diego Cell job. In the following example, the new value is 4:

curl "https://[FQDN Ops Manager]/api/v0/staged/products/[TAS products guid retrieved at Step 4.]/max_in_flight" -k -X PUT -H "Authorization: Bearer [Ops Manager access token retrieved at Step 3.]" -H "Content-Type: application/json" -d '{"max_in_flight": {"[Diego Cell guid retrieved at Step 5.]": 4 } }'
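
Because the JSON body sits inside single quotes, shell variables are not expanded in it, which makes the command awkward to script. One possible approach is to build the body with a small function, sketched below. `build_max_in_flight_payload` is a hypothetical helper name, and this sketch deliberately accepts only whole numbers, as in the example value of 4.

```shell
# Hypothetical helper: print the JSON body for the PUT request in Step 6.
# Usage: build_max_in_flight_payload JOB_GUID VALUE
build_max_in_flight_payload() {
  case "$2" in
    ''|*[!0-9]*)
      # Reject empty or non-numeric values before they reach the API.
      echo "value must be a whole number: '$2'" >&2
      return 1
      ;;
  esac
  printf '{"max_in_flight": {"%s": %s}}\n' "$1" "$2"
}

# Example use (TOKEN, PRODUCT_GUID and CELL_GUID are assumed to hold the
# values from Steps 3, 4 and 5):
#   curl -k -X PUT "https://[FQDN Ops Manager]/api/v0/staged/products/$PRODUCT_GUID/max_in_flight" \
#     -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
#     -d "$(build_max_in_flight_payload "$CELL_GUID" 4)"
```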

7. Confirm that the Max in Flight value is set as expected by repeating the command from Step 5.