"result: 1 of 2 post-start scripts failed. Failed Jobs: harbor." error in post-start script when upgrading Harbor tile
search cancel

"result: 1 of 2 post-start scripts failed. Failed Jobs: harbor." error in post-start script when upgrading Harbor tile

book

Article ID: 298227

calendar_today

Updated On:

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

When you are upgrading the Harbor tile to v2.2.1 in Tanzu Application Service for VMs (TAS for VMs), it fails with this error in the post-start script:
(canary)","index":1,"state":"failed","progress":100,"data":{"error":"Action Failed get_task: Task be12941c-6f3a-4768-6560-0b99ccb089da result: 1 of 2 post-start scripts failed. Failed Jobs: harbor. Successful Jobs: bosh-dns."}} {"time":1623166682,"error":{"code":450001,"message":"Action Failed get_task: Task be12941c-6f3a-4768-6560-0b99ccb089da result: 1 of 2 post-start scripts failed. Failed Jobs: harbor. Successful Jobs: bosh-dns."}} ', "result_output" = '', "context_id" = '' WHERE ("id" = 284743) D, [2021-06-08T15:38:02.100153 #11106] [task:284743] DEBUG -- DirectorJobRunner: (0.000698s) (conn: 47098782504720) COMMIT I, [2021-06-08T15:38:02.100334 #11106] []  INFO -- DirectorJobRunner: Task took 2 minutes 6.566789100999998 seconds to process.   Task 284743 error   Capturing task '284743' output:   Expected task '284743' to succeed but state is 'error' v2.2.1 post-start.stdout log shoes below msg and all jobs are up and running. /run/postgresql:5432 - accepting connections Still apply changes as part of the harbor tile upgrade treats post-start as failure and terminates the apply changes. L Error: Action Failed get_task: Task be12941c-6f3a-4768-6560-0b99ccb089da result: 1 of 2 post-start scripts failed. Failed Jobs: harbor. Successful Jobs: bosh-dns.

In the error above, the most important information is that the harbor job failed during the post-start script execution. 
result: 1 of 2 post-start scripts failed. Failed Jobs: harbor.


Environment

Product Version: 2.10

Resolution

In this event you hit this error, the best resources to start initially collecting are:
  • The Operations Manager (Ops Manager) Support bundle - it can be found and downloaded here: https://<opsmanager-url>/support
  • The version of Harbor you are upgrading from and to.
  • Logs from the Harbor BOSH deployment. To do this, run this command: bosh -d <harbor-deployment-name> logs. This command generates a tarball in the present working directory.

In observing some of the logs within the time frame of the failure, there will be a lookup issue within the Antivirus pre-start script:

antivirus/pre-start.stderr.log:2021/06/08 15:37:25 Get "https://antivirus-mirror.service.internal:6501/main.cvd": dial tcp: lookup antivirus-mirror.service.internal on 10.5.0.5:53: no such host


Note: The DNS address being looked up is 10.5.0.5 is an external DNS being used for the internal BOSH URL antivirus-mirror.service.internal. This is a good indication that BOSH DNS is not working as expected.


Looking at BOSH DNS logs at the time of failure, notice the status of bosh-dns is unhealthy.

bosh_dns_health.stdout.log-[Monitor] 2021-06-08T15:37:19.666425695Z WARN - Agent reports unhealthy: {State:stopped}
bosh_dns_health.stdout.log-[Monitor] 2021-06-08T15:37:19.675714774Z INFO - Initial status: failing
bosh_dns_health.stdout.log:[Monitor] 2021-06-08T15:37:24.675928752Z WARN - Agent reports unhealthy: {State:stopped}
bosh_dns_health.stdout.log:[Monitor] 2021-06-08T15:37:29.676123599Z WARN - Agent reports unhealthy: {State:stopped}


In the BOSH DNS logs there is more detail on why BOSH DNS is failing.

For example, you see errors for certificate issues which require rotation or upstream recursion failures for that address.

[main] 2021-03-08T16:32:46.625235000Z ERROR - Unable to configure health checker failed to load keypair: certificate has expired: validity ended at 2021-02-13 19:37:15 UTC but current time is 2021-03-08 16:32:46 UTC

-----------------------------------------------------------------------------------------

[ForwardHandler] 2021-06-07T21:39:04.216935000Z ERROR - error recursing to "10.21.0.12:53": received SERVFAIL for portal. from upstream (recursor: 10.21.0.12:53)

 

Workaround

Validate the external DNS is or was working correctly during the issue. 

Run the following command from the Harbor VM. This command may help provide some extra insight.

dig +trace DNSAddress


Performing a monit restart or restarting the VM could help address any instability from DNS issues or upstream issues.

If there is a certificate issue, identify the managed certificates that are expired and follow the documentation, Advanced Certificate Rotation with CredHub Maestro, to rotate these certificates.