EDR: Large Number of Sensors Offline with HTTP 499 Errors
Article ID: 291176
Products
Carbon Black EDR (formerly Cb Response)
Issue/Introduction
- A large number of sensors are offline.
- /var/log/cb/nginx/access.log shows an increasing number of HTTP 499 errors.
- /var/log/cb/sensorservices shows the following error:
TimeoutError: QueuePool limit of size 5 overflow 20 reached, connection timed out
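To confirm that the number of HTTP 499 responses is actually growing, the access log can be summarized per hour. nginx records status 499 when the client closes the connection before the server has responded. The following is a minimal sketch, not an official tool: the function name count_499_per_hour is hypothetical, and it assumes the log uses nginx's default "combined" layout, where the timestamp is field 4 and the status code is field 9. Adjust the field numbers if the EDR server uses a different log format.

```shell
#!/bin/sh
# Count HTTP 499 responses per hour in an nginx access log.
# Assumption: default "combined" log layout, e.g.
#   1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] "GET /x HTTP/1.1" 499 0 "-" "-"
# so that field 4 is "[dd/Mon/yyyy:HH:MM:SS" and field 9 is the status code.
count_499_per_hour() {
    # $1 = path to the access log
    awk '$9 == 499 {
        # Keep "dd/Mon/yyyy:HH" from "[dd/Mon/yyyy:HH:MM:SS"
        hour = substr($4, 2, 14)
        counts[hour]++
    }
    END { for (h in counts) print h, counts[h] }' "$1" | sort
}
```

Example usage: count_499_per_hour /var/log/cb/nginx/access.log. The output is sorted lexically, which is chronological within a single day.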
Environment
- EDR Server: 6.5.3 and higher
Cause
It is suspected that a large number of sensor check-ins per second overwhelms the server and exhausts the database connection pool (size 5, overflow 20), so check-in requests time out.
Resolution
- Stop the cluster
- Increase the DB pool overflow to 40 by editing /etc/cb/cb.conf on the master node and adding the following parameter:
DatabasePoolOverflow=40
- Set the minimum sensor check-in delay to 60 seconds by adding the following parameter to /etc/cb/cb.conf on both the master and the minion:
MinSensorCheckinDelaySec=60
- Start the cluster, then wait approximately 2 hours for sensors to start checking in.
- Determine how many sensors are checking in per second by running the following command on the master and then on the minion:
sudo /usr/share/cb/cbstats -m Sensor | grep checkins
- Determine the check-in time that enterprised is using by running the following command on the master and then on the minion:
sudo /usr/share/cb/cbdatagrid get sensor_checkin_throttle 0
- Check the number of TCP connections by running the following command on both the master and the minion:
sudo ss -s
- If there are still issues, please collect a new cbdiag and share it with CB Support.
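Before starting the cluster, it can help to confirm that the edited parameters are actually present in /etc/cb/cb.conf on each node. The following is a minimal sketch under stated assumptions: the helper name show_setting is hypothetical, and it only assumes that cb.conf holds one Key=Value pair per line, as in the parameters shown above. It prints every uncommented line that sets the parameter, so accidental duplicates are also visible.

```shell
#!/bin/sh
# Print every uncommented line in a Key=Value config file that sets
# the given parameter, or a notice if the parameter is absent.
show_setting() {
    # $1 = config file, $2 = parameter name
    grep -E "^[[:space:]]*$2[[:space:]]*=" "$1" || echo "$2 is not set in $1"
}
```

Example usage: show_setting /etc/cb/cb.conf DatabasePoolOverflow on the master, and show_setting /etc/cb/cb.conf MinSensorCheckinDelaySec on both the master and the minion.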
Additional Information
Restarting the EDR Server has been known to resolve this issue, particularly when there are no entries in /var/log/cb/sensorservices/sensorservices.log.