BOSH Director process blobstore_nginx does not start after outage

Article ID: 293483

Products

Operations Manager

Issue/Introduction

Symptoms:
On the BOSH Director VM, the blobstore_nginx process reports `not monitored`.

After a network or infrastructure outage, the BOSH Director does not come back up successfully, even after a restart. `nginx` no longer starts, and the following error appears in `/var/vcap/sys/log/blobstore/error.log`:

2018/08/13 09:18:20 [emerg] 1#0: bind() to unix:/var/vcap/data/blobstore/backend.sock failed (98: Address already in use)
2018/08/13 09:18:20 [emerg] 1#0: bind() to unix:/var/vcap/data/blobstore/backend.sock failed (98: Address already in use)
2018/08/13 09:18:20 [emerg] 1#0: bind() to unix:/var/vcap/data/blobstore/backend.sock failed (98: Address already in use)
2018/08/13 09:18:20 [emerg] 1#0: still could not bind()
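
As a minimal sketch, the check below confirms whether a log file contains the bind failure shown above. The function name `has_bind_error` is hypothetical and is not part of BOSH; only the log path and the message text come from this article.

```shell
# Hypothetical helper (not a BOSH tool): succeeds if the given log file
# contains the bind() failure on backend.sock.
# -F matches the text as a fixed string, so the parentheses are literal.
# -q suppresses output; only the exit status is used.
has_bind_error() {
  grep -qF 'backend.sock failed (98: Address already in use)' "$1"
}
```

On the Director VM this could be called as `has_bind_error /var/vcap/sys/log/blobstore/error.log && echo "stale socket suspected"`.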

Environment

Cause

This issue is caused by a BOSH defect in which the blobstore nginx process cannot restart after an ungraceful shutdown. The defect was introduced with the TLS blobstore feature. If nginx does not exit cleanly, the socket file /var/vcap/data/blobstore/backend.sock is not cleaned up, and on the next start bind() fails with "Address already in use" because the stale file still exists.

Resolution

To resolve the issue, remove the stale backend.sock file and restart blobstore_nginx:

sudo rm /var/vcap/data/blobstore/backend.sock
monit restart blobstore_nginx
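
The removal step can also be wrapped in a small guard. This is a sketch under stated assumptions, not an official procedure: the function name `remove_stale_socket` is hypothetical, and it only verifies that the path is a Unix socket before deleting it. It does not check whether a running process still holds the socket.

```shell
# Hypothetical helper (not a BOSH tool): remove the file only if it is
# a Unix domain socket, so a mistyped path cannot delete a regular file.
remove_stale_socket() {
  sock="$1"
  if [ -S "$sock" ]; then
    # Path exists and is a socket: treat it as stale and remove it.
    rm -f "$sock" && echo "removed stale socket: $sock"
  elif [ -e "$sock" ]; then
    # Path exists but is not a socket: leave it alone and report failure.
    echo "not a socket, leaving in place: $sock" >&2
    return 1
  else
    # Nothing to remove; this is not treated as an error.
    echo "no socket file at: $sock"
  fi
}
```

With root privileges on the Director VM, the two commands above would then become `remove_stale_socket /var/vcap/data/blobstore/backend.sock && monit restart blobstore_nginx`.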

If this does not work, the BOSH Director may have further corruption that needs to be fixed, and it may be necessary to recreate the Director VM. To do so, make a slight change to the size of the BOSH Director's persistent disk and apply the changes.

This issue is fixed in BOSH Director version 267.7. Refer to the release notes for the version of Operations Manager that includes the fix: https://docs.pivotal.io/pivotalcf/2-3/pcf-release-notes/opsmanager-rn.html
