If a tkgi compute-profile task fails with a bosh timeout, subsequent deploys may fail with a database error.
The bosh task shows an error similar to:
Task 6840762 | 07:48:56 | Error: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "link_providers_constraint"
DETAIL: Key (deployment_id, instance_group, name, type)=(32, worker-large-dev, docker, job) already exists.
If a failure occurs while applying the compute profile (for example, because instances were renamed, or because of an unrelated timeout such as an agent timeout or a disk unmount timeout), subsequent deploys also fail until the first failure is fixed.
This is a known issue and currently there is no resolution.
To work around this issue:
Find the deployment ID in the logs (it is part of the error message):
{"time":1632124136,"error":{"code":100,"message":"PG::UniqueViolation: ERROR: duplicate key value violates unique constraint \"link_providers_constraint\"\nDETAIL: Key (deployment_id, instance_group, name, type)=(32, worker-large-dev, docker, job) already exists.\n"}}', "result_output" = '', "context_id" = '225d2ac9-9415-4de4-ba7e-99696ce23484' WHERE ("id" = 6840762)
In the example above, Key (deployment_id, instance_group, name, type)=(32, worker-large-dev, docker, job), so the deployment_id is 32.
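As a minimal sketch, the deployment ID can also be pulled out of the error text with sed instead of reading it by eye. The function name extract_deployment_id is hypothetical and not part of bosh; it assumes the message contains the "Key (deployment_id, ...)=(...)" tuple exactly as shown above.

```shell
# Hypothetical helper (not a bosh command): prints the first value of the
# tuple that follows "Key (deployment_id, ...)=" in the error message.
extract_deployment_id() {
  printf '%s\n' "$1" |
    sed -n 's/.*Key (deployment_id[^)]*)=(\([0-9][0-9]*\),.*/\1/p'
}

# DETAIL line copied from the error shown above.
detail='DETAIL: Key (deployment_id, instance_group, name, type)=(32, worker-large-dev, docker, job) already exists.'

extract_deployment_id "$detail"
```

The function prints nothing if the message does not contain the expected tuple, so an empty result means the ID has to be located manually.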
Next, ssh into the bosh director and become root: sudo su -
Start the director console: /var/vcap/jobs/director/bin/console
Execute the following snippet in the console, replacing 32 with the actual deployment ID found in the first step:
BD::Models::Links::LinkProvider.where(deployment_id: 32).map(&:delete)
BD::Models::Links::LinkConsumer.where(deployment_id: 32).map(&:delete)
exit
Get a copy of the manifest used by the last failed deploy:
bosh task {task_id_of_the_failed_deploy} --debug | awk '/Manifest:/{f=1;next} /D,|I,/{f=0} f' | bosh int - > manifest.yml
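To make the awk filter in the command above easier to follow, here is a small sketch that runs the same awk program on made-up input. The sample text is invented for illustration and is not real bosh debug output; the function name extract_manifest is also hypothetical. The filter starts printing on the line after one that contains "Manifest:" and stops at the next line that contains "D," or "I,".

```shell
# Same awk program as in the command above, wrapped in a hypothetical function.
extract_manifest() {
  awk '/Manifest:/{f=1;next} /D,|I,/{f=0} f'
}

# Made-up sample input (NOT real bosh output), shaped only to exercise the filter.
sample_log='D, debug line ending with Manifest:
---
name: example-deployment
instance_groups: []
I, next log line'

# Prints only the three lines between the "Manifest:" line and the "I," line.
printf '%s\n' "$sample_log" | extract_manifest
```

Because the stop pattern matches "D," or "I," anywhere in a line, a manifest line that happens to contain either string would end the capture early, so it is worth checking that the resulting manifest.yml is complete before deploying.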
Perform a normal bosh deploy with the retrieved manifest (the deployment name below is an example):
bosh deploy -d service-instance_9d44b100-2f34-4ee5-9d96-a9510fec7a55 manifest.yml