BOSH Director process blobstore_nginx does not start after outage
Article ID: 293483
Products
Operations Manager
Issue/Introduction
Symptoms: On the BOSH Director VM, the `blobstore_nginx` process reports `not monitored`.
After a network or infrastructure outage, the BOSH Director does not come back up successfully, even after a restart. `nginx` no longer starts, and the following error appears in `/var/vcap/sys/log/blobstore/error.log`:
2018/08/13 09:18:20 [emerg] 1#0: bind() to unix:/var/vcap/data/blobstore/backend.sock failed (98: Address already in use)
2018/08/13 09:18:20 [emerg] 1#0: bind() to unix:/var/vcap/data/blobstore/backend.sock failed (98: Address already in use)
2018/08/13 09:18:20 [emerg] 1#0: bind() to unix:/var/vcap/data/blobstore/backend.sock failed (98: Address already in use)
2018/08/13 09:18:20 [emerg] 1#0: still could not bind()
Environment
Cause
This issue is caused by a BOSH defect in which the blobstore nginx cannot restart after an ungraceful shutdown. The problem was introduced with the TLS blobstore feature: the blobstore nginx process cannot recover if it did not exit cleanly. Specifically, nginx cannot restart if `/var/vcap/data/blobstore/backend.sock` was not cleaned up on the prior exit.
Resolution
The solution is to remove the stale `/var/vcap/data/blobstore/backend.sock` file and then restart the `blobstore_nginx` process.
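The two steps above can be sketched as a pair of small shell functions. This is a minimal sketch, not an official procedure: the socket path is taken from the error log above, the function names `clear_stale_socket` and `restart_blobstore_nginx` are hypothetical, and it assumes you are logged in to the Director VM as root with `monit` on the `PATH`.

```shell
# Socket path as it appears in the error log; override SOCK if your layout differs.
SOCK="${SOCK:-/var/vcap/data/blobstore/backend.sock}"

# Remove the leftover socket file if it exists (hypothetical helper name).
clear_stale_socket() {
  sock="$1"
  if [ ! -e "$sock" ]; then
    echo "no socket at $sock, nothing to remove"
    return 0
  fi
  rm -f -- "$sock" && echo "removed $sock"
}

# Ask monit to restart the job (assumes root and monit on PATH).
restart_blobstore_nginx() {
  monit restart blobstore_nginx
}
```

A typical invocation would be `clear_stale_socket "$SOCK" && restart_blobstore_nginx`, followed by `monit summary` to check that `blobstore_nginx` is no longer reported as `not monitored`.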
If this does not work, there may be further corruption on the BOSH Director that requires fixing, and it may be necessary to recreate the Director VM. This can be done by making a slight change to the persistent disk size of the BOSH Director and then applying the changes.
This issue is fixed in BOSH Director version 267.7. Refer to the release notes for the Ops Manager version that includes the fix: https://docs.pivotal.io/pivotalcf/2-3/pcf-release-notes/opsmanager-rn.html