EDR: Large Number of Sensors Offline with HTTP 499 Errors

Article ID: 291176

Products

Carbon Black EDR (formerly Cb Response)

Issue/Introduction

  • A large number of sensors are offline.
  • /var/log/cb/nginx/access.log shows an increasing number of HTTP 499 errors.
  • The logs under /var/log/cb/sensorservices show the following error:
    TimeoutError: QueuePool limit of size 5 overflow 20 reached, connection timed out
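
The first two log symptoms above can be quantified with a short shell sketch. nginx records status 499 when the client closes the connection before the server has responded. The helper names `count_499` and `count_pool_timeouts` are hypothetical, and the field position of the status code assumes nginx's default "combined" log format, which may not match the format configured on the EDR server.

```shell
# Count HTTP 499 responses in nginx access-log lines read from stdin.
# Assumption: the status code is the 9th whitespace-separated field,
# as in nginx's default "combined" format; adjust if the format differs.
count_499() {
    awk '$9 == 499 { n++ } END { print n + 0 }'
}

# Count lines read from stdin that contain the QueuePool timeout error.
count_pool_timeouts() {
    awk '/QueuePool limit of size/ { n++ } END { print n + 0 }'
}
```

For example, `sudo cat /var/log/cb/nginx/access.log | count_499` prints the number of 499 responses in the current log. Running it a few minutes apart gives a rough sense of whether the count is growing.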

 

Environment

  • EDR Server: 6.5.3 and higher

Cause

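The suspected cause is that a large number of sensor check-ins per second overwhelms the server, which is consistent with the database connection pool timeout seen in the sensorservices log.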

Resolution

  1. Stop the cluster.
  2. Increase the database pool overflow to 40 by adding the following parameter to /etc/cb/cb.conf on the master node:
    DatabasePoolOverflow=40
  3. Set the minimum sensor check-in delay to 60 seconds by adding the following parameter to /etc/cb/cb.conf on the master and the minion:
    MinSensorCheckinDelaySec=60
  4. Start the cluster, then wait approximately 2 hours for sensors to start checking in.
  5. Determine how many sensors are checking in per second by running the following command on the master, then on the minion:
    sudo /usr/share/cb/cbstats -m Sensor | grep checkins
  6. Determine the check-in time currently in use by running the following command on the master, then on the minion:
    sudo /usr/share/cb/cbdatagrid get sensor_checkin_throttle 0
  7. Check the number of TCP connections by running the following command on both the master and the minion:
    sudo ss -s
  8. If the issue persists, collect a new cbdiag and share it with CB Support.
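
The configuration changes in steps 2 and 3 can be scripted. The following is a minimal sketch, assuming /etc/cb/cb.conf holds one `Key=Value` pair per line; `set_conf_param` is a hypothetical helper, not a Carbon Black tool. It replaces the parameter if it is already present and appends it otherwise, so rerunning it does not create duplicate entries.

```shell
# Set KEY=VALUE in a key=value config file.
# Hypothetical helper: replaces an existing "KEY=" line, or appends one.
# Assumes the file ends with a newline and KEY contains no regex characters.
set_conf_param() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        # Rewrite through a temp file, then copy back so the original
        # file keeps its ownership and permissions.
        tmp=$(mktemp)
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s\n' "${key}=${value}" >> "$file"
    fi
}
```

With the cluster stopped and a backup of cb.conf taken, a root shell on the master could run `set_conf_param /etc/cb/cb.conf DatabasePoolOverflow 40` and `set_conf_param /etc/cb/cb.conf MinSensorCheckinDelaySec 60`. On the minion, only the second call applies.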

Additional Information

Restarting the EDR Server has been known to resolve this issue, particularly when there are no entries in /var/log/cb/sensorservices/sensorservices.log.