(canary)","index":1,"state":"failed","progress":100,"data":{"error":"Action Failed get_task: Task be12941c-6f3a-4768-6560-0b99ccb089da result: 1 of 2 post-start scripts failed. Failed Jobs: harbor. Successful Jobs: bosh-dns."}}
{"time":1623166682,"error":{"code":450001,"message":"Action Failed get_task: Task be12941c-6f3a-4768-6560-0b99ccb089da result: 1 of 2 post-start scripts failed. Failed Jobs: harbor. Successful Jobs: bosh-dns."}}
', "result_output" = '', "context_id" = '' WHERE ("id" = 284743)
D, [2021-06-08T15:38:02.100153 #11106] [task:284743] DEBUG -- DirectorJobRunner: (0.000698s) (conn: 47098782504720) COMMIT
I, [2021-06-08T15:38:02.100334 #11106] [] INFO -- DirectorJobRunner: Task took 2 minutes 6.566789100999998 seconds to process.
Task 284743 error
Capturing task '284743' output:
Expected task '284743' to succeed but state is 'error'

The v2.2.1 post-start.stdout log shows the message below, and all jobs are up and running:
/run/postgresql:5432 - accepting connections

Even so, Apply Changes, run as part of the Harbor tile upgrade, treats post-start as a failure and terminates:
Error: Action Failed get_task: Task be12941c-6f3a-4768-6560-0b99ccb089da result: 1 of 2 post-start scripts failed. Failed Jobs: harbor. Successful Jobs: bosh-dns.
In the event that you hit this error, the best resources to start collecting are:
Reviewing the logs from the time frame of the failure shows a lookup failure in the Antivirus pre-start script:
antivirus/pre-start.stderr.log:2021/06/08 15:37:25 Get "https://antivirus-mirror.service.internal:6501/main.cvd": dial tcp: lookup antivirus-mirror.service.internal on 10.5.0.5:53: no such host
Note: The DNS server being queried, 10.5.0.5, is an external DNS server, yet it is being used to resolve the internal BOSH URL antivirus-mirror.service.internal. This is a good indication that BOSH DNS is not working as expected.
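As a rough aid for spotting this pattern across many log files, the sketch below extracts the hostname and the resolver from a Go-style "no such host" error and flags the case where an internal name was sent to a resolver other than BOSH DNS. The function names are hypothetical, and the BOSH DNS listen address (169.254.0.2) and the ".internal" suffix check are assumptions; adjust both to match your deployment.

```python
import re
from typing import Optional, Tuple

# Matches the resolver error format seen in antivirus/pre-start.stderr.log.
LOOKUP_ERROR = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>\d{1,3}(?:\.\d{1,3}){3}):\d+: no such host"
)

# Assumption: BOSH DNS listens on this address; override it if your
# deployment configures a different one.
BOSH_DNS_ADDRESS = "169.254.0.2"


def find_failed_lookup(line: str) -> Optional[Tuple[str, str]]:
    """Return (hostname, resolver_ip) for a 'no such host' error, else None."""
    match = LOOKUP_ERROR.search(line)
    if match is None:
        return None
    return match.group("host"), match.group("resolver")


def bypassed_bosh_dns(line: str, bosh_dns_address: str = BOSH_DNS_ADDRESS) -> bool:
    """True when an internal name was looked up on a non-BOSH-DNS resolver."""
    found = find_failed_lookup(line)
    if found is None:
        return False
    host, resolver = found
    # Assumption: internal BOSH names end in ".internal".
    return host.endswith(".internal") and resolver != bosh_dns_address


if __name__ == "__main__":
    sample = (
        'antivirus/pre-start.stderr.log:2021/06/08 15:37:25 Get '
        '"https://antivirus-mirror.service.internal:6501/main.cvd": dial tcp: '
        "lookup antivirus-mirror.service.internal on 10.5.0.5:53: no such host"
    )
    print(find_failed_lookup(sample), bypassed_bosh_dns(sample))
```

A True result only suggests that the lookup bypassed BOSH DNS; the BOSH DNS logs are still needed to confirm why.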
The BOSH DNS health logs from the time of failure show that bosh-dns is unhealthy:
bosh_dns_health.stdout.log-[Monitor] 2021-06-08T15:37:19.666425695Z WARN - Agent reports unhealthy: {State:stopped}
bosh_dns_health.stdout.log-[Monitor] 2021-06-08T15:37:19.675714774Z INFO - Initial status: failing
bosh_dns_health.stdout.log:[Monitor] 2021-06-08T15:37:24.675928752Z WARN - Agent reports unhealthy: {State:stopped}
bosh_dns_health.stdout.log:[Monitor] 2021-06-08T15:37:29.676123599Z WARN - Agent reports unhealthy: {State:stopped}
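To check whether the unhealthy window overlaps the failed post-start, it can help to pull the timestamps out of the health log. The following is a minimal sketch that assumes the "[Monitor] ... WARN - Agent reports unhealthy: {State:...}" format shown above; the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches the unhealthy warnings in bosh_dns_health.stdout.log.
# Fractional seconds are matched but dropped when parsing the timestamp.
UNHEALTHY = re.compile(
    r"\[Monitor\] (?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z "
    r"WARN - Agent reports unhealthy: \{State:(?P<state>\w+)\}"
)


def unhealthy_events(log_text: str) -> List[Tuple[datetime, str]]:
    """Return (timestamp, agent_state) for every unhealthy warning found."""
    events = []
    # finditer also copes with several log entries joined on a single line.
    for match in UNHEALTHY.finditer(log_text):
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
        events.append((ts, match.group("state")))
    return events


if __name__ == "__main__":
    sample = (
        "[Monitor] 2021-06-08T15:37:19.666425695Z WARN - Agent reports unhealthy: {State:stopped}\n"
        "[Monitor] 2021-06-08T15:37:19.675714774Z INFO - Initial status: failing\n"
        "[Monitor] 2021-06-08T15:37:24.675928752Z WARN - Agent reports unhealthy: {State:stopped}\n"
    )
    for ts, state in unhealthy_events(sample):
        print(ts.isoformat(), state)
```

Comparing the first and last timestamps with the time of the failed lookup shows whether bosh-dns was down while the pre-start script ran.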
The BOSH DNS logs give more detail on why BOSH DNS is failing.
For example, you may see certificate errors, which require rotation, or upstream recursion failures for that address:
[main] 2021-03-08T16:32:46.625235000Z ERROR - Unable to configure health checker failed to load keypair: certificate has expired: validity ended at 2021-02-13 19:37:15 UTC but current time is 2021-03-08 16:32:46 UTC
-----------------------------------------------------------------------------------------
[ForwardHandler] 2021-06-07T21:39:04.216935000Z ERROR - error recursing to "10.21.0.12:53": received SERVFAIL for portal. from upstream (recursor: 10.21.0.12:53)
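The two error types above call for different fixes, so a quick classification of the log lines can save time. The sketch below assumes the exact message wording shown in these two samples; the function name and the returned labels are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Expired-certificate error, as logged by [main].
EXPIRED = re.compile(
    r"certificate has expired: validity ended at "
    r"(?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC "
    r"but current time is (?P<now>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC"
)

# Upstream recursion failure, as logged by [ForwardHandler].
SERVFAIL = re.compile(
    r'error recursing to "(?P<recursor>[^"]+)": received SERVFAIL for (?P<name>\S+)'
)


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (label, detail) for a known BOSH DNS error line, else None."""
    match = EXPIRED.search(line)
    if match:
        end = datetime.strptime(match.group("end"), TIME_FORMAT)
        now = datetime.strptime(match.group("now"), TIME_FORMAT)
        # Report how long the certificate had been expired when the line was logged.
        return "expired_certificate", f"expired for {(now - end).days} days"
    match = SERVFAIL.search(line)
    if match:
        return "upstream_servfail", f"{match.group('name')} via {match.group('recursor')}"
    return None


if __name__ == "__main__":
    print(classify(
        "[main] 2021-03-08T16:32:46.625235000Z ERROR - Unable to configure health checker "
        "failed to load keypair: certificate has expired: validity ended at "
        "2021-02-13 19:37:15 UTC but current time is 2021-03-08 16:32:46 UTC"
    ))
```

An expired_certificate result points toward certificate rotation, while upstream_servfail points toward checking the external DNS server named as the recursor.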
Validate that the external DNS is working correctly now and was working correctly at the time of the issue.
Running the following command from the Harbor VM may provide some extra insight:
dig +trace DNSAddress
Performing a monit restart or restarting the VM could help clear any instability caused by DNS or upstream issues.
If there is a certificate issue, identify the managed certificates that are expired and follow the documentation, Advanced Certificate Rotation with CredHub Maestro, to rotate these certificates.