When a Data Node runs into disk space issues, any of the following symptoms may appear:
When a Data Node system exceeds the disk usage threshold (80%), Elasticsearch will mark indices as read-only, causing the queues on the Manager to pile up.
On the Manager appliance:
Running the utility lastline_test_appliance will show a failure like the following:
FAILURE: Number of messages in ids_dhcp: 19970 exceeds threshold: 10000
Number of messages in ids_krb: 17175 exceeds threshold: 10000
Number of messages in ids_smb: 19149 exceeds threshold: 10000
Number of messages in ids_tls: 18322 exceeds threshold: 10000
Number of messages in ids_urls: 11903 exceeds threshold: 10000
Number of messages in netflows: 18409 exceeds threshold: 10000
Max total number of messages: 104928 exceeds threshold: 100000
The monitoring logs will show related errors.
Running the command rabbitmqctl -p llq_v1 list_queues | grep -v -w 0$ will show output similar to the following:
Listing queues
db.analysis_completed.3104 11
db.events.3104 2
ids_smb 20000
netflows 19999
db.breach_correlation_rule_runner.3104 7
detection_entity_network_event 19999
ids_tls 20000
ids_urls 20000
ids_krb 20000
ids_dhcp 20000
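To keep an eye on whether the queues keep growing while remediating, the same command can be wrapped in watch (a minimal sketch; the 60-second interval is arbitrary):

# Re-run the queue listing every 60 seconds, hiding empty queues
watch -n 60 'rabbitmqctl -p llq_v1 list_queues | grep -v -w "0$"'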
The NTA Record Counts in the Kibana Dashboard - Home never update, and an error appears when clicking on individual NTA Record tiles.
On the Data Node appliance:
The monitoring logs will show a related warning.
The files /var/log/lastline/appliance_check.log and /var/log/elasticsearch/lldns/curator.log will confirm the read-only state of the indices:
Feb 28 05:58:54 lastline-datanode llanta-storage_llanta-storage-webrequest-dkr_1[9462]: AuthorizationException: AuthorizationException(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')
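The read-only block can also be checked directly against Elasticsearch on the Data Node (a sketch; the setting index.blocks.read_only_allow_delete corresponds to the FORBIDDEN/12 block in the log line above):

# List any indices that currently carry the read-only/allow-delete block
curl -s 'localhost:9200/_all/_settings?pretty' | grep read_only_allow_delete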
Sometimes the indices take up too much disk space. We can confirm this by querying all the indices and their sizes with the command:
curl -s localhost:9200/_cat/indices?v
We can sort the output alphabetically to group the index types together and get a better idea of what is using the most storage by piping the command to sort.
For example:
curl -s localhost:9200/_cat/indices?v | sort
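Depending on the Elasticsearch version on the Data Node, the _cat API can also sort server-side by on-disk size, which puts the largest indices at the top (a sketch using the cat API's s parameter):

# Sort indices by store size, largest first
curl -s 'localhost:9200/_cat/indices?v&s=store.size:desc'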
If adding more disks or Data Nodes is not possible (see the Related Information section below), then:
2. Free up disk space by configuring a shorter data retention policy than the default (32 days) by adding an override to the appliance:
pdns::data_retention::delete_delay: 'N'

Replace 'N' with a number smaller than the default of 32 days for which data will be kept.
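For example, 21 would set data retention to 21 days:

pdns::data_retention::delete_delay: '21'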
Once the override is applied, curator will delete the indices that fall outside the new retention window, as seen in /var/log/elasticsearch/lldns/curator.log:

2024-02-29 23:42:37,613 INFO Preparing Action ID: 1, "delete_indices"
2024-02-29 23:42:37,615 INFO Trying Action ID: 1, "delete_indices": delete_indices indices matching ^dhcp-|^krb-|^netflow-|^pdns-|^rdp-|^smb-|^tls-|^webrequest- and older than 18 days (based on index name)
2024-02-29 23:42:37,772 INFO Deleting 24 selected indices: [u'netflow-20240209', u'rdp-20240209', u'tls-20240209', u'pdns-20240209', u'netflow-20240211', u'netflow-20240210', u'tls-20240211', u'tls-20240210', u'krb-20240209', u'krb-20240211', u'krb-20240210', u'dhcp-20240209', u'dhcp-20240210', u'dhcp-20240211', u'smb-20240210', u'smb-20240211', u'smb-20240209', u'pdns-20240210', u'pdns-20240211', u'webrequest-20240209', u'rdp-20240210', u'rdp-20240211', u'webrequest-20240210', u'webrequest-20240211']
2024-02-29 23:42:37,772 INFO ---deleting index netflow-20240209
2024-02-29 23:42:37,772 INFO ---deleting index rdp-20240209
2024-02-29 23:42:37,772 INFO ---deleting index tls-20240209
2024-02-29 23:42:37,772 INFO ---deleting index pdns-20240209
2024-02-29 23:42:37,773 INFO ---deleting index krb-20240209
2024-02-29 23:42:37,773 INFO ---deleting index dhcp-20240209
2024-02-29 23:42:37,773 INFO ---deleting index smb-20240209
2024-02-29 23:42:37,773 INFO ---deleting index webrequest-20240209
2024-02-29 23:42:38,448 INFO Action ID: 1, "delete_indices" completed.

Then confirm the disk usage:
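For example, with df at the OS level and, on the Data Node, the Elasticsearch allocation API (one way to see per-node disk use; both commands are standard):

# OS-level disk usage
df -h
# Elasticsearch disk usage and shard allocation per node
curl -s localhost:9200/_cat/allocation?v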
4. Restart the Kibana UI on the Manager in an SSH session at the CLI prompt:
sudo service kibana restart
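To verify the service came back up (a quick check, assuming the same service wrapper used above):

sudo service kibana status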
If further assistance is needed, feel free to create a support request using our Customer Connect Portal:
How to file a Support Request in Customer Connect and via Cloud Services Portal (2006985)