NSX Network Detection and Response - Number of messages in db.inserted_pcaps.xxxx: xxxxxxx exceeds threshold
Article ID: 333887

Products

VMware vDefend Network Detection and Response

Issue/Introduction

Symptoms:

  • Running lastline_test_appliance reported the following ERROR:
> HARDWARE: OK
> NETWORK: OK
> SOFTWARE:
>  FAILURE: Number of messages in db.inserted_pcaps.xxxx: 51665 exceeds threshold: 10000
> Max total number of messages: 51665 exceeds threshold: 100000
> Exiting with error-code 3

  • The Manager node's RabbitMQ backlog was piling up when checked with the command "rabbitmqctl -p llq_v1 list_queues | grep -v -w 0$"
Listing queues
db.inserted_pcaps.xxxx 51665
  • Restarting the docker service did not resolve the issue
  • The db.inserted_pcaps queue kept increasing
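The backlog check above can be reproduced with a short awk filter; the sketch below runs it over stand-in sample output (the queue names and counts besides db.inserted_pcaps.xxxx are hypothetical), whereas on an appliance you would pipe the real rabbitmqctl command into the same filter.

```shell
# Stand-in sample of "rabbitmqctl -p llq_v1 list_queues" output;
# db.other_queue and its count are hypothetical illustration values.
sample='db.inserted_pcaps.xxxx 51665
db.other_queue 12'

# Print any queue whose message count exceeds the 10000-message threshold.
echo "$sample" | awk -v limit=10000 '$2 > limit {print $1, "exceeds threshold:", $2}'
```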



Environment

NSX-NDR (Lastline) 

Resolution

To work around the issue, temporarily increase the number of upload workers.

Create (or append to) the override.yaml file with an increased worker count by running the following command:

echo -e 'llupload::workers::inserted_pcaps::instances: 3' >> /etc/appliance-config/override.yaml
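After running the command, /etc/appliance-config/override.yaml should contain the new worker count; a minimal example of the resulting line (the key comes from the command above, and 3 is the example value):

```yaml
# Raises the inserted_pcaps upload worker count to 3.
llupload::workers::inserted_pcaps::instances: 3
```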

 

While this helps reduce the db.inserted_pcaps queue, it may increase the retry.X queues.

Check this with the command "rabbitmqctl -p llq_v1 list_queues | grep -v -w 0$"

Example output:

Listing queues
retry.1 41
retry.3 351
retry.4 740
retry.0 17
retry.2 110
retry.5 2267
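As a quick sanity check on how much work is pending, the retry backlog can be totaled with awk; this sketch uses the sample numbers above as stand-in input, while in production you would pipe the real rabbitmqctl output into the same awk command.

```shell
# Stand-in data copied from the example output above; replace the
# variable with real "rabbitmqctl -p llq_v1 list_queues" output in production.
sample='retry.1 41
retry.3 351
retry.4 740
retry.0 17
retry.2 110
retry.5 2267'

# Sum the second column (message counts) across all retry queues.
echo "$sample" | awk '{sum += $2} END {print sum}'   # 3526
```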

If so, the problem may lie in the PCAP analysis pipeline.

  • Check that the service is started: service uwsgi::app::pcapapi2-uwsgi status
  • Check the pcapapi2 logs: tail -n50 /var/log/pcapapi2/pcapapi_server.log

Log output such as the following indicates a problem with Suricata:

2023-08-22 08:15:29,065 - suricata_uds_client.manager - WARNING - SURICATA socket broken pipe, retry 19
2023-08-22 08:15:30,066 - pcapapi.plugin.suricata - ERROR - SURICATA: communication error (failure to connect to Suricata after 20 retries)
2023-08-22 08:15:30,066 - suricata_uds_client.manager - WARNING - SURICATA socket broken pipe, retry 0
2023-08-22 08:15:31,068 - suricata_uds_client.manager - WARNING - SURICATA socket broken pipe, retry 1
2023-08-22 08:15:32,069 - suricata_uds_client.manager - WARNING - SURICATA socket broken pipe, retry 2
2023-08-22 08:15:33,071 - suricata_uds_client.manager - WARNING - SURICATA socket broken pipe, retry 3
2023-08-22 08:15:34,072 - suricata_uds_client.manager - WARNING - SURICATA socket broken pipe, retry 4
2023-08-22 08:15:35,075 - suricata_uds_client.manager - WARNING - SURICATA socket broken pipe, retry 5
2023-08-22 08:15:36,076 - suricata_uds_client.manager - WARNING - SURICATA socket broken pipe, retry 6
2023-08-22 08:15:37,078 - suricata_uds_client.manager - WARNING - SURICATA socket broken pipe, retry 7

  • Check the status of the service: systemctl status suricata-lastline-unix-socket@suricata_0.service
  • Check the syslog file for entries matching suricata: grep -i 'suricata' /var/log/syslog | tail -n20

Log output such as the following confirms the problem with Suricata:

Aug 22 09:13:10 lastlinemanager systemd[1]: Started suricata-lastline-unix-socket@suricata_0.service.
Aug 22 09:13:11 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Main process exited, code=exited, status=1/FAILURE
Aug 22 09:13:11 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Failed with result 'exit-code'.
Aug 22 09:13:12 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Service hold-off time over, scheduling restart.
Aug 22 09:13:12 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Scheduled restart job, restart counter is at 242549.
Aug 22 09:13:12 lastlinemanager systemd[1]: Stopped suricata-lastline-unix-socket@suricata_0.service.
Aug 22 09:13:12 lastlinemanager systemd[1]: Started suricata-lastline-unix-socket@suricata_0.service.
Aug 22 09:13:12 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Main process exited, code=exited, status=1/FAILURE
Aug 22 09:13:12 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Failed with result 'exit-code'.
Aug 22 09:13:13 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Service hold-off time over, scheduling restart.
Aug 22 09:13:13 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Scheduled restart job, restart counter is at 242550.
Aug 22 09:13:13 lastlinemanager systemd[1]: Stopped suricata-lastline-unix-socket@suricata_0.service.
Aug 22 09:13:13 lastlinemanager systemd[1]: Started suricata-lastline-unix-socket@suricata_0.service.
Aug 22 09:13:13 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Main process exited, code=exited, status=1/FAILURE
Aug 22 09:13:13 lastlinemanager systemd[1]: suricata-lastline-unix-socket@suricata_0.service: Failed with result 'exit-code'.

 

Manually running Suricata from the command line with added verbosity should reveal the underlying problem:

# /usr/bin/suricata -vvvv -c /etc/suricata/suricata-lastline-unix-socket.yaml --unix-socket=/run/suricata_0/socket.sock --pidfile /run/suricata_0/suricata.pid --user=suricata &

22/8/2023 -- 14:09:05 - <Notice> - This is Suricata version 5.0.4 RELEASE running in SYSTEM mode
22/8/2023 -- 14:09:05 - <Info> - CPUs/cores online: 12
22/8/2023 -- 14:09:05 - <Config> - luajit states preallocated: 512
22/8/2023 -- 14:09:05 - <Info> - SSSE3 support not detected, disabling Hyperscan for MPM
22/8/2023 -- 14:09:05 - <Error> - [ERRCODE: SC_ERR_INVALID_YAML_CONF_ENTRY(139)] - Invalid spm algo supplied in the yaml conf file: "hs"

The error indicates that the CPU lacks the SSSE3 instruction set, which the Hyperscan pattern matcher requires; because the configuration specifies "hs" (Hyperscan) as the spm algorithm, Suricata cannot start on this hardware.
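Whether the host CPU exposes SSSE3 can be checked directly from the kernel's flag listing; this is a Linux-only sketch and reports the flag's presence rather than changing any configuration.

```shell
# Read the CPU feature flags exported by the kernel (Linux only).
flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null || true)

# Report whether Hyperscan-based matchers can run on this CPU.
case "$flags" in
  *ssse3*) echo "SSSE3 present: Hyperscan ('hs') matchers can run" ;;
  *)       echo "SSSE3 absent: Hyperscan-based spm/mpm algorithms will fail" ;;
esac
```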