Incorrect golang package path in compile.env causes some errands to fail

Article ID: 293439

Products

Operations Manager

Issue/Introduction

When a release is uploaded, the BOSH Director attempts to match the packages in that release to "similar" packages that have been previously uploaded. In affected versions, it does this by looking solely at the package fingerprint.


Recently, the golang-release was updated to template its various packages (e.g. golang-1-linux, golang-1.20-linux) and also support prefixes when vendoring its packages into another release. The packaging scripts then replace a string literal with the actual version/prefix in the final compiled package. Because of this templating, the fingerprint for each package is identical, but the compiled bits are different.


Since the Director matches only on fingerprint, a problem arises when two releases vendor different packages from the golang-release and those packages have matching fingerprints. When VMs are deployed, one release receives its expected package bits, while the other receives the bits compiled for the first release's package.


For example, if the golang-1.20-linux package is compiled first, then a release that uses the golang-1-linux package will receive the compiled bits for the golang-1.20-linux package instead. This can be identified by /var/vcap/packages/golang-1-linux/bosh/compile.env referencing golang-1.20-linux instead of golang-1-linux:

  export GOROOT=$(readlink -nf /var/vcap/packages/golang-1.20-linux)
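To check for this mismatch on a VM, the package directory name can be compared with the path referenced in its compile.env. The following is a minimal sketch, not an official tool: the function name check_compile_env is hypothetical, and it assumes compile.env contains a GOROOT line of the form shown above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a golang package's compile.env
# points GOROOT at its own package directory.
# Usage: check_compile_env /var/vcap/packages/golang-1-linux
check_compile_env() {
  pkg_dir=${1%/}                      # strip a trailing slash, if any
  pkg_name=$(basename "$pkg_dir")
  env_file="$pkg_dir/bosh/compile.env"

  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file"
    return 2
  fi

  # Extract the package name from the GOROOT line, e.g.
  #   export GOROOT=$(readlink -nf /var/vcap/packages/golang-1.20-linux)
  referenced=$(sed -n 's|^ *export GOROOT=.*/packages/\([^/) ]*\).*|\1|p' "$env_file" | head -n 1)

  if [ "$referenced" = "$pkg_name" ]; then
    echo "ok: $pkg_name"
    return 0
  fi
  echo "mismatch: $pkg_name references $referenced"
  return 1
}
```

A non-zero return code for a golang package directory suggests that the VM received compiled bits that belong to a different package.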

As a result, errands can fail in several product tiles.

For example, in Scheduler version 1.6.4, the "test-scheduler" errand fails with the following error message:

Instance   test-scheduler/xxxxxx
Exit Code  127
Stdout     /var/vcap/packages/smoke_test /var/vcap/bosh
           /var/vcap/bosh

Stderr     -

1 errand(s)

Errand 'test-scheduler' completed with error (exit code 127)

Similarly, in Redis version 3.2.0, the "run-on-demand-broker-smoke-tests" errand fails with the following error message:

           /var/vcap/packages/cf-redis-smoke-tests/src/github.com/pivotal-cf/cf-redis-smoke-tests /var/vcap/bosh  
           Failed to compile retry  
           Ginkgo ran 1 suite in xxxµs  
           Test Suite Failed 

Theoretically, this could also happen to any unrelated packages that happen to have the same fingerprint. It would also occur if two releases vendored the same package but with different prefixes.


Resolution

To solve this, "similar" packages are now discovered using the combination of the package name and the package fingerprint. The fix is included in BOSH Director 277.3.1, which ships with Ops Manager 2.10.57 and 3.0.9+LTS-T.
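The difference between the two lookup strategies can be illustrated with a toy model. This is only a sketch and not the Director's actual implementation: the record layout, the fingerprint value abc123, the blob names, and both function names are made up for illustration.

```shell
#!/bin/sh
# Toy model of the "similar package" lookup (illustrative only).
# Each record is: <package name> <fingerprint> <compiled blob>
# The fingerprint "abc123" and the blob names are made-up values.
COMPILED_PACKAGES='golang-1.20-linux abc123 blob-for-1.20
golang-1-linux abc123 blob-for-1'

# Old behaviour: the first record with a matching fingerprint wins.
find_by_fingerprint() {
  printf '%s\n' "$COMPILED_PACKAGES" |
    awk -v fp="$1" '$2 == fp { print $3; exit }'
}

# Fixed behaviour: both the name and the fingerprint must match.
find_by_name_and_fingerprint() {
  printf '%s\n' "$COMPILED_PACKAGES" |
    awk -v name="$1" -v fp="$2" '$1 == name && $2 == fp { print $3; exit }'
}

find_by_fingerprint abc123                           # prints blob-for-1.20
find_by_name_and_fingerprint golang-1-linux abc123   # prints blob-for-1
```

In this model, a release that asks for golang-1-linux by fingerprint alone gets the blob compiled for golang-1.20-linux, while the combined lookup returns the expected blob.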

Note: after upgrading, the affected BOSH releases must be re-uploaded to the BOSH Director to resolve the problem.

In the case of the Scheduler "test-scheduler" errand issue, uploading the pcf-scheduler 1.2.293 release from the Scheduler tile and then redeploying Scheduler should fix it.
After unzipping the Scheduler tile (p-scheduler-1.6.4-build.62.pivotal), run:

bosh upload-release p-scheduler-1.6.4-build.62.pivotal/releases/release-pcf-scheduler-1.2.293-ubuntu-xenial-621.488-20230417-220706-695662074.tgz --fix

Then run Apply Changes on the Scheduler tile. The errand should now run successfully.
For all other affected tiles, it is safe to re-upload every release in the affected tile with the --fix flag to resolve the problem.
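Re-uploading every release in a tile can be scripted with a simple loop. The sketch below assumes that the tile has already been unzipped into a directory containing a releases/ subdirectory (as in the Scheduler example) and that the bosh CLI is already targeted at the Director. The function name upload_all_releases and the BOSH_CLI variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Re-upload every release tarball found in an unpacked tile directory.
# BOSH_CLI is a hypothetical override used here so the loop can be
# dry-run with `echo`; by default the real `bosh` CLI is called.
upload_all_releases() {
  tile_dir=$1
  for release in "$tile_dir"/releases/*.tgz; do
    [ -e "$release" ] || continue     # glob matched nothing
    "${BOSH_CLI:-bosh}" upload-release "$release" --fix || return 1
  done
}
```

Setting BOSH_CLI=echo before calling upload_all_releases with the unpacked tile directory prints the commands that would be run without uploading anything.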

In the case of the Redis "run-on-demand-broker-smoke-tests" errand issue, re-uploading the Redis releases from the Redis tile and then running Apply Changes should resolve the issue:

bosh upload-release release-on-demand-service-broker-0.43.2.on-ubuntu-xenial-stemcell.621.655.tgz --fix

bosh upload-release release-redis-backups-8.2.30.on-ubuntu-xenial-stemcell.621.655.tgz --fix

bosh upload-release release-redis-metrics-8.1.25.on-ubuntu-xenial-stemcell.621.655.tgz --fix

bosh upload-release release-redis-service-6.0.44.on-ubuntu-xenial-stemcell.621.655.tgz --fix

bosh upload-release release-redis-service-adapter-8.0.76.on-ubuntu-xenial-stemcell.621.655.tgz --fix