SSP: A large number of events are sent from SSP to the syslog server

Products

VMware vDefend Firewall with Advanced Threat Prevention
VMware vDefend Firewall

Issue/Introduction

A large number of syslog events are sent from SSP to the remote syslog server, which slows down the syslog server.

Environment

SSP 5.0

Cause

There is no log-level-based filtering when logs are forwarded from SSP to the remote syslog server, and each log message carries pod/container metadata. In addition, depending on scale, some components may log very frequently at INFO level.

Resolution

Log in to the SSPI CLI with root credentials. From SSPI, add the following filters and update the remote_syslog configuration in the fluentd configmap:

Back up the current configmap by saving it with the following command:

k -n nsxi-platform get cm fluentd-aggregator-cm -o yaml > fluentd-aggregator-cm_original.yaml

Edit the configmap with the following command:

k -n nsxi-platform edit cm fluentd-aggregator-cm

fluentd-aggregator-cm

apiVersion: v1
data:
  fluentd-inputs.conf: |
...
...
...
  fluentd-output.conf: |
    # Throw the healthcheck to the standard output
    <match fluentd.healthcheck>
      @type stdout
    </match>

    # -------------------- Add the following 2 filters --------------------
    <filter **>
      @type record_transformer
      enable_ruby true
      <record>
        # Remove the timestamp, stream type (stdout/stderr) and full/partial tag (F/P) prefix from each log line
        log ${record["log"] ? record["log"].gsub(/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2}) (?:stdout|stderr) [FP]\s*/, '') : nil}
      </record>
    </filter>

    <filter **>
      @type grep
      <regexp>
        key log
        pattern (?i)\b(INFO|WARN|ERROR|WARNING|FATAL)\b # Keep only records whose log contains INFO, WARN(ING), ERROR or FATAL
      </regexp>
    </filter>
...
...

    # ------ Search for remote_syslog or syslog_tls and add the <format> section to every occurrence ------
      <store>
        @type remote_syslog  # Type is syslog_tls for a TLS syslog server
        host #.#.#.#
        port 514
        protocol tcp
        <format>
          @type single_value  # Send only the log field, without pod/container metadata
          message_key log
        </format>
        hostname "${$.kubernetes.host}"
        <buffer $.kubernetes.host>
        </buffer>
      </store>

    </match>
  fluentd.conf: |
...
...
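To see what the two filters and the single_value formatter do to a single record, the Python sketch below mimics them outside Fluentd. It is an approximation under stated assumptions, not Fluentd's implementation: the function names (record_transformer, grep_keep, single_value, pipeline) and the sample record are hypothetical, Python's re module stands in for Ruby regular expressions, and the prefix pattern accepts both stdout and stderr and both UTC-offset signs.

```python
import re
from typing import Optional

# Prefix pattern modelled on the record_transformer filter; the stream is written as
# (?:stdout|stderr) and the offset sign as [+-] so that every such prefix is stripped.
CRI_PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?"
    r"(?:Z|[+-]\d{2}:\d{2}) (?:stdout|stderr) [FP]\s*"
)

# Pattern from the grep filter: keep a record only if its log field contains
# one of these level words (case-insensitive, whole word).
LEVEL_PATTERN = re.compile(r"(?i)\b(INFO|WARN|ERROR|WARNING|FATAL)\b")


def record_transformer(record: dict) -> dict:
    """Return a copy of the record with the timestamp/stream/F|P prefix removed from 'log'."""
    out = dict(record)
    log = record.get("log")
    # Like the Ruby ternary in the filter, a missing log field becomes None.
    out["log"] = CRI_PREFIX.sub("", log) if log is not None else None
    return out


def grep_keep(record: dict) -> bool:
    """Return True if the record survives the grep filter."""
    # In this sketch a missing log is treated as an empty string, so it never matches.
    return LEVEL_PATTERN.search(record.get("log") or "") is not None


def single_value(record: dict, message_key: str = "log") -> str:
    """Emit only record[message_key]; all other fields (pod/container metadata) are dropped."""
    return str(record[message_key])


def pipeline(record: dict) -> Optional[str]:
    """Run one record through both filters and the formatter; None means it was dropped."""
    transformed = record_transformer(record)
    if not grep_keep(transformed):
        return None
    return single_value(transformed)


if __name__ == "__main__":
    sample = {
        "log": "2024-05-01T10:00:00.123456789Z stdout F 2024-05-01 10:00:00 INFO Service started",
        # Hypothetical Kubernetes metadata attached by the collector.
        "kubernetes": {"host": "node-1", "pod_name": "example-pod"},
    }
    print(pipeline(sample))  # 2024-05-01 10:00:00 INFO Service started
```

In this model a DEBUG line is dropped by the grep step, while an INFO line reaches the syslog server as the bare message text, without the prefix and without the kubernetes fields.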

Restart the fluentd pod with the following command in the SSPI CLI:

k -n nsxi-platform rollout restart statefulset fluentd

After the fluentd pod is up and running, check the logs on the remote syslog server.
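One rough way to do this check is to count the received lines by level. The Python sketch below assumes the syslog server writes received messages to a plain-text file, one message per line; the function name summarize and the counter keys (NO_LEVEL, LEFTOVER_PREFIX) are hypothetical. After the change, the DEBUG, TRACE and NO_LEVEL counts should fall sharply, and LEFTOVER_PREFIX should stay at zero.

```python
import re
from collections import Counter
from typing import Iterable

# Level words to look for; only the first (leftmost) one found in a line is counted.
LEVEL = re.compile(r"\b(TRACE|DEBUG|INFO|WARN|WARNING|ERROR|FATAL)\b", re.IGNORECASE)
# A leftover prefix such as "2024-05-01T10:00:00Z stdout F " means the
# prefix-stripping filter did not clean that line.
LEFTOVER_PREFIX = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\S* (?:stdout|stderr) [FP] ")


def summarize(lines: Iterable[str]) -> Counter:
    """Count received syslog lines per level and flag lines that still carry the prefix."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL.search(line)
        counts[match.group(1).upper() if match else "NO_LEVEL"] += 1
        if LEFTOVER_PREFIX.search(line):
            counts["LEFTOVER_PREFIX"] += 1
    return counts
```

For example, pass an open file handle for the server's received-log file (a placeholder path on your syslog server) to summarize and print its most_common() entries. Level words that appear inside message text can skew the counts, so treat the result as an indication only.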

Note: The above example is from an SSP 5.0 setup with a TCP remote syslog server configured. The same changes work for UDP and TLS remote syslog servers; for TLS, only the Fluentd output plugin differs (syslog_tls instead of remote_syslog).

The following applies if Security Intelligence or NDR is deployed and a large number of syslog events contain this pattern:

... RawflowCorrelationQuery - [CORRELATED/PARTIAL FLOW] ...
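To judge whether these events make up a large share of the syslog traffic, a quick count can help. The Python sketch below is hypothetical (rawflow_share is not an SSP tool) and assumes one message per line. Because the pattern above can be read either as a literal "[CORRELATED/PARTIAL FLOW]" tag or as two alternative tags, the regex matches any tag that starts with CORRELATED or PARTIAL.

```python
import re
from typing import Iterable

# Matches "RawflowCorrelationQuery - [CORRELATED..." and "RawflowCorrelationQuery - [PARTIAL...".
RAWFLOW_MARKER = re.compile(r"RawflowCorrelationQuery - \[(?:CORRELATED|PARTIAL)")


def rawflow_share(lines: Iterable[str]) -> float:
    """Return the fraction of lines that contain the RawflowCorrelationQuery marker."""
    total = hits = 0
    for line in lines:
        total += 1
        if RAWFLOW_MARKER.search(line):
            hits += 1
    # An empty input has no share to report.
    return hits / total if total else 0.0
```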

Raising the log level of the flow ingestion pipeline from INFO to WARN, so that INFO messages are no longer logged, may help mitigate the issue:

From the SSPI CLI, do the following:

1. k -n nsxi-platform edit cm rawflow-log4j-properties
2. Change the log level for 'logger.applogs.level' to warn

logger.applogs.name = com.vmware.nsx.pace
logger.applogs.level = info --------------------> Change from info to warn

3. k -n nsxi-platform delete pod spark-app-rawflow-driver
4. Wait for the spark-app-rawflow-driver and rawflowcorrelator-xxx pods to come up.
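The effect of step 2 follows from Log4j 2 level semantics: a logger emits an event only if the event is at least as severe as the logger's configured level, so moving logger.applogs.level from info to warn suppresses INFO events while keeping WARN, ERROR and FATAL. The Python sketch below models only that rule; the numeric values are illustrative (they are not Log4j's internal values), the helper names are hypothetical, and it assumes, as the steps imply, that the RawflowCorrelationQuery messages are logged at INFO.

```python
from enum import IntEnum


class Level(IntEnum):
    """Standard Log4j 2 levels, from most to least verbose (illustrative values)."""
    TRACE = 0
    DEBUG = 1
    INFO = 2
    WARN = 3
    ERROR = 4
    FATAL = 5


def threshold_from_properties(text: str, logger_id: str = "applogs") -> Level:
    """Read logger.<logger_id>.level from properties text such as rawflow-log4j-properties."""
    key = f"logger.{logger_id}.level"
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return Level[value.strip().upper()]
    raise KeyError(key)


def is_emitted(event_level: Level, threshold: Level) -> bool:
    """An event is logged only if it is at least as severe as the logger's level."""
    return event_level >= threshold


if __name__ == "__main__":
    before = "logger.applogs.name = com.vmware.nsx.pace\nlogger.applogs.level = info\n"
    after = before.replace("level = info", "level = warn")
    print(is_emitted(Level.INFO, threshold_from_properties(before)))  # True
    print(is_emitted(Level.INFO, threshold_from_properties(after)))   # False
```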

Note: This issue is fixed in SSP 5.1.