EDR: All sensors offline in the UI due to PostgreSQL

Article ID: 288614

Products

Carbon Black EDR (formerly Cb Response)

Issue/Introduction

  • The following error is seen in /var/log/cb/sensorservices/startup.log on the server:
    "InternalError: (psycopg2.InternalError) unexpected chunk number 0 (expected 1) for toast value 1629837 in pg_toast_2619 
    [SQL: 'SELECT sensor_registrations.id AS sensor_registrations_id, sensor_registrations.cookie AS sensor_registrations_cookie, sensor_registrations.registration_time"
  • Errors seen in the Solr logs: 
    [ERROR] - from com.carbonblack.cbfs.solr.CbProcessUpdateRequestProcessorBase in qtp1007251739-79442 
    Insert document exception 
    org.apache.solr.common.SolrException: Exception writing document id 000037dc-0000-1b80-01d3-e4f2dcf331f1-016333b8d8e0 to the index; possible analysis error.
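
To confirm the first symptom quickly, the sensorservices log can be searched for the TOAST error text. The following is a minimal sketch, not an official tool: the function name count_toast_errors is hypothetical, and the log path and search string are taken from the error shown above.

```shell
#!/bin/sh
# Hypothetical helper: count lines in a log file that contain the
# TOAST corruption message quoted in this article.
count_toast_errors() {
    grep -c 'unexpected chunk number' "$1"
}

# Log path as given in this article.
LOG=/var/log/cb/sensorservices/startup.log

if [ -r "$LOG" ]; then
    echo "matches in $LOG: $(count_toast_errors "$LOG")"
else
    echo "cannot read $LOG"
fi
```

A non-zero count suggests the issue described here; a count of zero does not rule out other database problems.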

Environment

  • EDR Server: All supported versions

Cause

This issue occurs due to corruption in the EDR server's PostgreSQL database.

Resolution

A full offline vacuum of the database tables is required:
  • Stop the EDR services.
    Standalone:
    # sudo /usr/share/cb/cbservice cb-enterprise stop

    Cluster:
    # sudo /usr/share/cb/cbcluster stop
  • Start Postgres:
    # sudo /usr/share/cb/cbservice cb-pgsql start
  • Generate a backup of the tables in a location with enough storage:
    pg_dump -C -Fp -f psqldump_full.sql cb -p 5002
  • Generate a backup of the user roles:
    pg_dumpall -p 5002 --roles-only -f psqlroles.sql 
  • Regenerate indexes for all tables and redirect the output to a file (the /tmp/cbbackup directory must already exist):
    # sudo psql -p 5002 -d cb -c "REINDEX DATABASE cb;" 2> /tmp/cbbackup/reindex_output.txt
  • Vacuum old/incorrect data and redirect the output to a file:
    # sudo psql -p 5002 -d cb -c "VACUUM FULL VERBOSE ANALYZE;" 2> /tmp/cbbackup/vacuum_output.txt
  • Stop Postgres:
    # sudo /usr/share/cb/cbservice cb-pgsql stop
  • Start the EDR services.
    Standalone:
    # sudo /usr/share/cb/cbservice cb-enterprise start

    Cluster:
    # sudo /usr/share/cb/cbcluster start
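
Before relying on the result, it can help to scan the REINDEX and VACUUM output files for errors. The following is a minimal sketch under stated assumptions: the function name check_pg_output is hypothetical, the file paths are the ones used in the steps above, and it assumes psql reports failures on lines containing "ERROR:".

```shell
#!/bin/sh
# Hypothetical helper: report whether a psql output file contains ERROR lines.
# Returns 0 when no errors are found, 1 when errors are found or the file
# cannot be read.
check_pg_output() {
    out_file="$1"
    if [ ! -r "$out_file" ]; then
        echo "missing: $out_file"
        return 1
    fi
    if grep -q 'ERROR:' "$out_file"; then
        echo "errors found in $out_file:"
        grep 'ERROR:' "$out_file"
        return 1
    fi
    echo "clean: $out_file"
    return 0
}

# Output files written by the REINDEX and VACUUM steps above.
for f in /tmp/cbbackup/reindex_output.txt /tmp/cbbackup/vacuum_output.txt; do
    check_pg_output "$f" || true
done
```

If either file reports errors, keep the backups in place and investigate further.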

 

Additional Information

  • The PostgreSQL backup may be removed after the services have started successfully.
  • The PostgreSQL backup file can be large and should be written to a location with ample storage available.
  • Database corruption can occur for the following reasons:
    1. OS kernel panics.
    2. Frequent unexpected shutdowns of the server.
    3. Unexpected reboots.
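
Because the backup file can be large, free space in the target location can be checked before running pg_dump. The following is a minimal sketch: the function name has_free_space is hypothetical, and the 10 GB threshold is a placeholder, not a value from this article.

```shell
#!/bin/sh
# Hypothetical helper: succeed when DIR has at least REQUIRED_KB kilobytes free.
has_free_space() {
    required_kb="$1"
    dir="$2"
    # df -P prints POSIX-format output; on line 2, column 4 is the
    # available space, reported in 1024-byte blocks because of -k.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$required_kb" ]
}

# Placeholder threshold: 10 GB, expressed in kilobytes.
REQUIRED_KB=10485760

if has_free_space "$REQUIRED_KB" .; then
    echo "enough space for the backup in the current directory"
else
    echo "not enough space; choose another location"
fi
```

One way to pick a realistic threshold, assuming the database is still readable, is to query its size with PostgreSQL's pg_database_size('cb') function and leave some margin above it.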