EDR: SOLR writing to drives is suddenly slower causing aggregate sensor backlogs to grow

Article ID: 291029

Products

Carbon Black EDR (formerly Cb Response)

Issue/Introduction

  • On the Alliance dashboard, disk operations per second on server nodes suddenly drop from roughly 1500 to about 10-300.
  • The /var/log/cb/nginx/access.log files are filled with "503" errors.
  • The following command reports an incorrect disk type (true = spinning disk, false = SSD). Run it on the primary and all minions (a multi-node sketch follows this list):
    curl -s 'http://localhost:8080/solr/admin/metrics?group=solr.node' | grep spins 
    
    "CONTAINER.fs.coreRoot.spins":true, 
    "CONTAINER.fs.spins":true,


 

Environment

  • EDR Server: All versions
  • EDR Sensor: All versions
  • Disk: SSD

Cause

This often happens with VM instances where the guest OS reports the disk as rotational (spinning) via lsblk, even though the underlying host storage is SSD.
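
To see what the guest OS itself is reporting, check the kernel's rotational flag with lsblk; a ROTA value of 1 means the OS believes the device is a spinning disk. This is only an illustration of the underlying misreport, not a required step.

    # Show the rotational flag for each disk (ROTA: 1 = spinning, 0 = SSD).
    # On affected VMs this often shows 1 even though the host datastore is SSD.
    lsblk -d -o NAME,ROTA,TYPE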

Resolution

Force SOLR to treat the disks as SSD, thereby overriding the lsblk value, as follows (a scripted sketch of these steps appears after the list):
1. Edit /etc/cb/cb.conf on each node in the cluster.
2. Change this line from:
SolrDiskType=auto
to:
SolrDiskType=ssd
3. Restart the cluster.
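
The following is a minimal scripted sketch of the same change. It assumes a clustered installation where /usr/share/cb/cbcluster is used to stop and start the cluster; on a standalone server, restart the cb-enterprise service instead, and adjust paths as needed for your environment.

    # Run on EVERY node in the cluster (primary and all minions).
    # Back up cb.conf, then force SOLR to treat the disks as SSD.
    cp -p /etc/cb/cb.conf /etc/cb/cb.conf.bak
    sed -i 's/^SolrDiskType=auto/SolrDiskType=ssd/' /etc/cb/cb.conf
    grep '^SolrDiskType' /etc/cb/cb.conf   # confirm the new value

    # Then restart the cluster from the primary node (assumption: clustered install).
    /usr/share/cb/cbcluster stop
    /usr/share/cb/cbcluster start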

Additional Information

  • The issue may occur after an OS reboot.
  • A thread dump of Solr shows that committing data to disk takes more than 90 seconds, which causes the watchlist window to miss the event; the overloaded disk I/O stalls the Solr commit rate (see the sketch after this list).
  • The maximum merge thread count and merge count differ by disk type:
    • HDD - Solr will use maxThreadCount=1 and maxMergeCount=6
    • SSD - Solr will use maxThreadCount=4 and maxMergeCount=11
  • Note: The OER recommends SSDs to handle the high volume of data ingestion.
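
After the restart, the checks below can be used to confirm the change took effect and to investigate any remaining backlog. The thread-dump line assumes the standard Solr info/threads endpoint is exposed on the same port as the metrics endpoint used earlier in this article.

    # Re-run the check from the Issue/Introduction section; with SolrDiskType=ssd
    # in cb.conf, SOLR is forced to use the SSD merge settings regardless of what
    # the metric itself reports.
    curl -s 'http://localhost:8080/solr/admin/metrics?group=solr.node' | grep spins

    # If sensor backlogs persist, capture a thread dump and look for commit or merge
    # threads blocked on disk I/O (assumption: standard Solr info/threads endpoint).
    curl -s 'http://localhost:8080/solr/admin/info/threads' > /tmp/solr_threads.json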