How to enable the cron job for automated CbModule purging.
Environment
EDR Server: All Versions
Resolution
NOTE: The naming of these cron jobs is misleading. modulestore_purge purges cbmodules (the binary metadata shown on the Binary Search page). binary_purge is a separate process that purges the physical .zip files in /var/cb/data/modulestore (the physical file downloads).
WARNING: This will irrecoverably remove data from the EDR server. After removing this data, you will receive a 404 page when attempting to view binary details pages older than MaxEventStoreDays and this data will no longer appear in the Binary Search.
Connect to the backend of the Primary server
Determine how many days of binary metadata you would like to keep. Set this value higher than your event retention period, in days.
Default retention is between 20 and 30 days.
You can determine your current retention by running this command on a server that stores events (a minion in a cluster, the primary in a standalone deployment). This query assumes you are not storing cold cores in the same directory.
c=`find /var/cb/data/solr/cbevents/cbevents_* -type f -printf '%T+ %p\n' | sort | head -n 1 | awk '{print$2}'`; echo $(( $(stat -c %Y $c 2>/dev/null | awk -v d="$(date +%s)" 'BEGIN {m=d} $0 < m {m = $0} END {print d - m}') /86400 ))
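The one-liner above can be hard to read, so here is a minimal sketch of the same idea as a shell function: find the oldest file under the given paths and print its age in whole days. The name retention_days is hypothetical and not part of EDR; the sketch assumes GNU find and, like the command above, that cold cores are not stored in the same directory.

```shell
#!/usr/bin/env bash
# Sketch only: retention_days is a hypothetical helper, not an EDR tool.
# Prints the age, in whole days, of the oldest regular file under the given paths.
retention_days() {
    local oldest now
    # %T@ prints each file's modification time as seconds since the epoch (GNU find).
    oldest=$(find "$@" -type f -printf '%T@\n' 2>/dev/null | sort -n | head -n 1)
    # No files found: nothing to report.
    [ -n "$oldest" ] || return 1
    now=$(date +%s)
    # Drop the fractional seconds before doing integer arithmetic.
    echo $(( (now - ${oldest%.*}) / 86400 ))
}

# Example, using the path from the command above:
# retention_days /var/cb/data/solr/cbevents/cbevents_*
```

The function returns a non-zero status and prints nothing when no files match, which makes an empty or mistyped path easier to spot than a misleading number.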
The cron job must be enabled manually in /etc/cb/cron/cb.cron.template by changing the commented-out entry from:
# Run module store purge to remove module docs that are not referenced in any process docs -- once a day at 2am
# Note: Enabling this task can cause loss of data, so please do not enable without consulting CarbonBlack support team first
# 0 2 * * * cb /usr/bin/python -m cb.maintenance.job_runner --master -s modulestore_purge -T 60 -o >> /var/log/cb/job-runner/startup.out 2>&1
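To the new value, where -T <days> is the number of days after the last event for a binary was seen before its metadata is purged:
0 2 * * * cb /usr/bin/python -m cb.maintenance.job_runner --master -s modulestore_purge -T <days> -o >> /var/log/cb/job-runner/startup.out 2>&1
Make the same change in /etc/cron.d/cb so that cron picks it up immediately, without restarting services.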
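As an illustration of the edit described above, the following sketch uncomments the modulestore_purge job line and sets its -T value with sed. The function name enable_modulestore_purge is hypothetical, the sketch assumes GNU sed, and it assumes the job line in the target file has the same form as the commented entry shown above. Try it on a copy of the file first; it keeps a backup of whatever file it edits as <file>.bak.

```shell
#!/usr/bin/env bash
# Sketch only: enable_modulestore_purge is a hypothetical helper, not an EDR tool.
# Usage: enable_modulestore_purge <cron-file> <days>
enable_modulestore_purge() {
    local file=$1 days=$2
    # Refuse empty, non-numeric, or zero values.
    case $days in
        ''|*[!0-9]*|0) echo "days must be a positive whole number" >&2; return 1 ;;
    esac
    # On the commented-out modulestore_purge job line only:
    #   1. strip the leading "# " so cron runs the job
    #   2. replace the value after -T with the chosen number of days
    # Explanatory comment lines are left alone because they do not start
    # with a cron schedule field (a digit or "*") after the "#".
    sed -i.bak -E \
        -e '/^# *[0-9*].*-s modulestore_purge /{' \
        -e 's/^# *//' \
        -e "s/-T [0-9]+/-T ${days}/" \
        -e '}' \
        "$file"
}

# Example, run against a working copy rather than the live file:
# cp /etc/cb/cron/cb.cron.template /tmp/cb.cron.template
# enable_modulestore_purge /tmp/cb.cron.template 45
```

Review the result with diff against the .bak file before applying the same change to the live template and to /etc/cron.d/cb.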
Additional Information
The counter resets if a new event is seen before purging. Purging happens -T days after the last event was seen with that hash.
Sensors that have not seen the binary before, or that have been reinstalled, will be able to resubmit the binary metadata so that it appears in the console and can be alerted on. Sensors keep an internal database of binaries already sent.
Keeping binary metadata on larger instances can cause search performance issues as more is stored. Keep in mind that queries on fields such as filemods or signature status require a join to the binary core.
Most binary metadata will never be seen again. As an example, an OS update can create tens of thousands of new binaries that will never be seen again.
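To make the timing concrete, this sketch adds the -T window to the date a hash was last seen and prints the first date on which its metadata becomes eligible for purging. The name purge_eligible_on is hypothetical and the dates are made up for illustration; the sketch assumes GNU date. Since the job runs once a day at 2am, the actual removal would happen at a scheduled run on or after that date.

```shell
#!/usr/bin/env bash
# Sketch only: purge_eligible_on is a hypothetical helper, not an EDR tool.
# Usage: purge_eligible_on <last-seen YYYY-MM-DD> <days>
purge_eligible_on() {
    local last_seen=$1 days=$2 last_epoch
    # Convert the last-seen date to epoch seconds (UTC avoids DST shifts).
    last_epoch=$(date -u -d "$last_seen" +%s) || return 1
    # Add the -T window and print the resulting calendar date.
    date -u -d "@$(( last_epoch + days * 86400 ))" +%F
}

# A hash last seen on 2024-01-10 with -T 60:
# purge_eligible_on 2024-01-10 60   # prints 2024-03-10
# A new event on 2024-02-01 resets the counter:
# purge_eligible_on 2024-02-01 60   # prints 2024-04-01
```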