Service Start Stalls Prior to cb-supervisord Causing Timeout
Article ID: 413621
Products
Carbon Black EDR (formerly Cb Response)
Issue/Introduction
The cbcluster or cb-enterprise services time out on startup. Startup stalls before the cb-enterprise service comes up.
Symptoms include:
Minions report that postgres is not reachable in /var/log/cb/datagrid/debug.log (clustered only).
Startup hangs for a few minutes before the cb-supervisord service starts.
Running "journalctl -fexu cb-enterprise" in a separate terminal session shows the service hanging at "session closed for user cb":
runuser[316641]: pam_unix(runuser:session): session closed for user cb
/var/log/messages shows a gap of a few minutes between the startup script header and the start of cb-supervisord:
grep -E 'Carbon Black EDR is a surveillance|Starting cb-supervisord' /var/log/messages
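The gap between the two log lines can also be computed rather than read by eye. The following is a minimal sketch, not part of the product: the function name startup_gap_seconds is hypothetical, and it assumes the traditional syslog timestamp layout ("Mon DD HH:MM:SS host ...") with both lines written on the same day.

```shell
# Print the number of seconds between each startup header line and the
# next "Starting cb-supervisord" line in a syslog-style file.
# Assumption: field 3 of every line is an HH:MM:SS timestamp and both
# lines of a pair fall on the same calendar day.
startup_gap_seconds() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function to_seconds(hms,    parts) {
            split(hms, parts, ":")
            return parts[1] * 3600 + parts[2] * 60 + parts[3]
        }
        /Carbon Black EDR is a surveillance/ { header = to_seconds($3); seen = 1 }
        /Starting cb-supervisord/ && seen    { print to_seconds($3) - header; seen = 0 }
    ' "$1"
}
```

Example usage: startup_gap_seconds /var/log/messages. One number is printed per startup attempt found in the file.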
Environment
Carbon Black EDR Server: All Versions
Cause
One of the pre-startup check scripts is taking too long to complete before services can attempt startup.
Resolution
Run the following command to find which pre-startup script takes the longest to complete.
for i in 'ServerStatusCheck' 'Cleanup' 'UpdateChkConfig' 'UpdateEtcHosts' 'CopyErlangCookie' 'InitModulestore' 'InitLoggers' 'VersionCheck' 'SolrCoreSetup' 'HandleSolrSSLKeyCert' 'HandleSolrSSLCert' 'GenerateRedisConfig' 'HandleRedisSSLCerts' 'PopulateNginxProps' 'ResetFilePermissions' 'GenerateCrontab' 'EnableRabbitMqPlugins' 'UpdateSELinux' 'GenerateRabbitMqClusterConfig' 'UpdateSysctl'; do echo "====Executing $i====" && time /usr/share/cb/virtualenv/bin/python -m cb.maintenance.cbstartup.main --single-action="$i"; done
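When many actions are timed, it can help to rank them by duration. The following is a minimal sketch under stated assumptions: the function time_actions and the variable CB_STARTUP_CMD are hypothetical names introduced here, the default command is the one used in the loop above, and timing is limited to whole seconds.

```shell
# Command used to run a single pre-startup action (copied from the loop above).
# CB_STARTUP_CMD is a hypothetical override so the function can be tried
# with a stub command on a machine without EDR installed.
CB_STARTUP_CMD=${CB_STARTUP_CMD:-"/usr/share/cb/virtualenv/bin/python -m cb.maintenance.cbstartup.main"}

# Run each named action once and print "<seconds> <action>", slowest first.
# Resolution is whole seconds because it relies on date +%s.
time_actions() {
    for action in "$@"; do
        start=$(date +%s)
        # Intentionally unquoted so the command string splits into words.
        $CB_STARTUP_CMD --single-action="$action" >/dev/null 2>&1
        end=$(date +%s)
        echo "$((end - start)) $action"
    done | sort -rn
}
```

Example usage: time_actions UpdateSELinux ResetFilePermissions. Like the loop above, this actually executes each action, so run it only on the affected server.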
Most of these scripts are simple checks. Two of them have been reported to cause issues in some environments:
UpdateSELinux
The following command should complete within a few seconds (roughly 2-6 seconds).
time semanage port -a -t rabbitmq_port_t -p tcp 5004
There are two potential causes if it takes much longer:
Many custom SELinux policies have been added beyond those shipped with the base OS and the product. An admin should review all policies.
Disk throughput is too low to handle the semanage call.
For VMs, make sure resources are not shared.
In cloud environments such as AWS, the instance tier chosen may be the cause. For example, an AWS T series instance has lower EBS bandwidth than an M series instance, and EBS bandwidth determines how fast the instance can access disk storage.
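The semanage timing check can be wrapped so that it reports a pass or fail against the expected 2-6 second window. This is a minimal sketch: the function name within_limit is hypothetical, and elapsed time is measured in whole seconds only.

```shell
# Run a command and report whether it finished within a limit (whole seconds).
# Usage: within_limit <limit-seconds> <command> [args...]
# Returns 0 when the command finished within the limit, 1 when it was slower.
# The exit status of the command itself is not checked.
within_limit() {
    limit=$1
    shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    elapsed=$(( $(date +%s) - start ))
    echo "elapsed=${elapsed}s limit=${limit}s"
    [ "$elapsed" -le "$limit" ]
}
```

Example usage: within_limit 6 semanage port -a -t rabbitmq_port_t -p tcp 5004. A non-zero return suggests reviewing the two causes listed above.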
ResetFilePermissions
This action is slow when there are too many files in one or more of the paths it processes. Use the following command to count files in the common locations and narrow down which one is the problem.
for i in '/var/cb/data/' '/var/cb/data/live-response/' '/var/cb/data/solr/' '/var/cb/data/modulestore/' '/var/log/cb/'; do echo -e "$(find $i -type f | wc -l) \t$i"; done
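The same count can be sorted so the largest location appears first. This is a minimal sketch; the function name count_files is hypothetical, and the paths passed to it are the ones from the command above.

```shell
# Print "<file count><TAB><path>" for each directory, largest count first.
# Paths that do not exist or cannot be read are reported with a count of 0.
count_files() {
    for dir in "$@"; do
        # tr strips the padding some wc implementations add.
        count=$(find "$dir" -type f 2>/dev/null | wc -l | tr -d ' ')
        printf '%d\t%s\n' "$count" "$dir"
    done | sort -rn
}
```

Example usage: count_files /var/cb/data/ /var/cb/data/live-response/ /var/cb/data/solr/ /var/cb/data/modulestore/ /var/log/cb/. Note that /var/cb/data/ includes the counts of its subdirectories, so it will always sort first among the /var/cb/data paths.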
Attach the startup_stage.strace and cbdiag zip to the case.
Additional Information
UpdateSELinux: Uses semanage to update SELinux policies for the EDR services. semanage reads all policies, changes them in memory, then writes them out to temp files.
ResetFilePermissions: Walks all files and updates their permissions to avoid permission-related startup issues and problems accessing the database and log files.