Excessive Distributed Firewall (DFW) Logging Causes Host Resource or Stability Issues

Article ID: 396587

Updated On:

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

To assist with DFW rule creation and troubleshooting, logging can be enabled at the rule level. When enabled, packet details are written to /var/run/log/dfwpktlogs.log on the ESXi host and can optionally also be sent to an external syslog server.

However, enabling logging on high-traffic rules can generate excessive logs, impacting internal ESXi host processes and leading to issues such as:

  • Loss of connectivity to the NSX Manager and/or vCenter.
  • VMs are removed from the NSX inventory and their tags are lost.
  • "NSX-T Logical Port Operational Status Down on Host" reported by NSX Manager resulting in VM connectivity loss
  • "Control Channel To Transport Node Down" alarm on random Host Transport Nodes
    • In this case, the following logs appear in /var/run/log/nsx-syslog.log on the ESXi host:
      2025-09-13T01:29:04.759Z nsx-proxy[166958067]: NSX 166958067 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="166958171" level="ERROR" errorCode="RPC60"] RpcTransport[0]::RemoteServiceManager Failed to setup forwarding for service 'vmware.nsx.nestdb.NestDb' for RpcConnection[1704 Connected to ssl://10.x.x.10:1235 0]

      and the following logs appear in /var/log/.vmsyslogd.err:
      2025-09-07T20:04:35.824Z vmsyslog.msgQueue        : ERROR   ] 10.x.x.x:514 - lost 30642202 log messages
      2025-09-07T20:04:36.844Z vmsyslog.msgQueue        : ERROR   ] logging_server:514 - lost 25727094 log messages
      2025-09-07T23:47:03.518Z vmsyslog.main            : CRITICAL] Dropping messages due to log stress (qsize = 22500
  • Any of the following NSX services on the ESXi host may be affected:
    • nsx-opsagent
    • nsx-proxy
    • nsx-cfgagent
    • nsx-netopa
    • nsx-exporter
    • nsx-vdpi
    • nsx-nestdb

  • Keep-alive failures between ESXi host processes, logged in /var/run/log/nsx-syslog.log:
    2024-02-16T16:05:52.650Z nsx-opsagent[2107207]: NSX 2107207 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2107362" level="ERROR" errorCode="RPC31"] RpcConnection[465 Connected on tcp://127.0.0.1:4554 0] Keepalive failed - haven't received response in time (last request was sent 60 seconds ago, response received - 239 seconds ago)


  • In /var/run/log/nsx-syslog.log, you may observe socket closures or errors such as: error: 32-Broken pipe, error: 104-Connection reset by peer, or error: 2-End of file

    Service-to-Port Legend
    4096 - nsx-proxy
    4097 - nsx-cfgAgent
    4098 - nsx-opsAgent  
    4554 - nsx-opsAgent  
    9100 - nsx-opsAgent  
    2480 - nestDB-server


    Example of an nsx-proxy (port 4096) socket close due to "Broken pipe":
    2023-05-02T12:37:12.570Z nsx-proxy[5438587]: NSX 5438587 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="5438600" level="INFO"] StreamConnection[222 Closing on tcp://127.0.0.1:4096 sid:222] Closing (reason: network error)

    2023-05-02T12:37:12.571Z nsx-proxy[5438587]: NSX 5438587 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="5438600" level="INFO"] StreamConnection[222 Closed on tcp://127.0.0.1:4096 sid:-1] Closed (reason: network error, error: 32-Broken pipe)
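
To gauge how many messages the syslog daemon reported as lost for each remote target, the "lost N log messages" lines from /var/log/.vmsyslogd.err can be tallied with a short script. The following is a minimal Python sketch, not a Broadcom-provided tool. The function name and the regular expression are illustrative assumptions based only on the sample lines in this article. Because the article does not state whether the reported counts are cumulative, the sketch keeps the largest single count per target instead of summing them.

```python
import re

# Matches e.g. "... : ERROR   ] 10.x.x.x:514 - lost 30642202 log messages"
# (pattern inferred from the sample lines in this article; an assumption).
LOST_RE = re.compile(r"\] (?P<target>\S+) - lost (?P<count>\d+) log messages")


def largest_loss_by_target(lines):
    """Return {syslog target: largest 'lost N log messages' count seen}."""
    largest = {}
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            target = match.group("target")
            count = int(match.group("count"))
            largest[target] = max(largest.get(target, 0), count)
    return largest


SAMPLE = """\
2025-09-07T20:04:35.824Z vmsyslog.msgQueue        : ERROR   ] 10.x.x.x:514 - lost 30642202 log messages
2025-09-07T20:04:36.844Z vmsyslog.msgQueue        : ERROR   ] logging_server:514 - lost 25727094 log messages
2025-09-07T23:47:03.518Z vmsyslog.main            : CRITICAL] Dropping messages due to log stress (qsize = 22500
"""

if __name__ == "__main__":
    for target, count in largest_loss_by_target(SAMPLE.splitlines()).items():
        print(f"{target}: up to {count} messages reported lost")
```

To run it against a real host, replace SAMPLE with the contents of the file copied from the ESXi host.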

Environment

VMware NSX - All Versions

Cause

The ESXi syslog daemon uses a single queue to process syslog messages from all daemons on the host. When vDefend Firewall sends an excessive volume of packet logs to this daemon, the queue becomes congested and messages from every daemon that shares it can no longer be processed in time.
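
The effect of a shared queue can be illustrated with a toy model. The sketch below is not how the ESXi syslog daemon is implemented. It only assumes one bounded FIFO queue, a fixed drain rate, and producers that enqueue in a fixed order each tick. All names and numbers are hypothetical and chosen to make the effect visible.

```python
from collections import deque


def simulate(ticks, capacity, drain_per_tick, rates):
    """Return per-producer drop counts for one shared bounded FIFO queue."""
    queue = deque()
    dropped = {name: 0 for name in rates}
    for _ in range(ticks):
        # Producers enqueue in dict order; a full queue drops the message
        # regardless of which producer sent it.
        for name, rate in rates.items():
            for _ in range(rate):
                if len(queue) < capacity:
                    queue.append(name)
                else:
                    dropped[name] += 1
        # The consumer drains a fixed number of messages per tick.
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()
    return dropped


# Hypothetical rates: "dfw" stands in for firewall packet logs,
# "nsx-proxy" for a low-volume daemon sharing the same queue.
quiet = simulate(10, capacity=50, drain_per_tick=20,
                 rates={"dfw": 5, "nsx-proxy": 2})
flood = simulate(10, capacity=50, drain_per_tick=20,
                 rates={"dfw": 100, "nsx-proxy": 2})

if __name__ == "__main__":
    print("quiet:", quiet)
    print("flood:", flood)
```

In the flooded run of this model, the low-volume producer loses its messages even though its own rate is unchanged, which is the kind of collateral impact described in this section.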

Resolution

For environments using VMware NSX 4.2.1 and prior

Disable vDefend Firewall packet logging on the specific rules that generate the most packet logs.

 

For environments using VMware NSX 4.2.2 and above 

By default, vDefend Firewall packet logging is capped at 10,000 packet logs per second per host.

The following message is written every 30 minutes to vmkernel.log on the ESXi host. When packet logs are dropped because the limit was exceeded, the "Dropped" counter increases.

     VSIP DFW: Log request HWM during 1800 sec period = 32631 LPS. Rate limit = 10000 LPS. Logged = 17500. Dropped = 154686.
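
To track how often the limit is being hit, the fields in this message can be extracted and a drop percentage computed. The following Python sketch is illustrative only. The regular expression is derived from the single sample line above and may not match other releases, and the function name and the dropped_pct field are assumptions introduced here.

```python
import re

# Pattern inferred from the sample vmkernel.log line in this article.
HWM_RE = re.compile(
    r"VSIP DFW: Log request HWM during (?P<period>\d+) sec period = "
    r"(?P<hwm>\d+) LPS\. Rate limit = (?P<limit>\d+) LPS\. "
    r"Logged = (?P<logged>\d+)\. Dropped = (?P<dropped>\d+)\."
)


def parse_dfw_rate_line(line):
    """Return the numeric fields of a DFW rate-limit line, or None."""
    match = HWM_RE.search(line)
    if match is None:
        return None
    stats = {key: int(value) for key, value in match.groupdict().items()}
    total = stats["logged"] + stats["dropped"]
    # Share of requested packet logs that were dropped in this period.
    stats["dropped_pct"] = 100.0 * stats["dropped"] / total if total else 0.0
    return stats


SAMPLE_LINE = (
    "VSIP DFW: Log request HWM during 1800 sec period = 32631 LPS. "
    "Rate limit = 10000 LPS. Logged = 17500. Dropped = 154686."
)

if __name__ == "__main__":
    stats = parse_dfw_rate_line(SAMPLE_LINE)
    print(f"peak requested: {stats['hwm']} LPS, limit: {stats['limit']} LPS, "
          f"dropped: {stats['dropped_pct']:.1f}%")
```

A period in which the high-water mark (HWM) exceeds the rate limit and the drop percentage is high points to rules whose logging should be reviewed.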


NOTE - If the symptoms described in the Issue/Introduction section persist with the rate limit set to 10,000 logs per second, consider lowering the rate limit further. Open a Service Request with Broadcom Support for assistance with changing this setting.



On NSX 4.2.2 and above, the vsipioctl getloglimit command on the ESXi host displays the rate-limiting configuration and statistics.

Sample stats output after generating a burst of flows (on this host the limit is set to 1500 logs/second rather than the default 10,000):

[root:/] vsipioctl getloglimit -v 0
Log Rate Limiting = Enabled
    Limit            = 1500 logs/second
        Window msec  = 1000
        Cur tic      = 65156223085366
        Cycperwin    = 2099999425
    Cur window (1000 msec):
        Logged       = 0
        Dropped      = 0
        Start time   = 2024-10-22T02:33:31.867Z
        Start tic    = 65154569981781
        End tic      = 65156669981206
    Poll period (1800 sec):
        Logged       = 17523
        Dropped      = 277147
        LPS req HWM  = 32118
        Start tic    = 65091491111970
        End tic      = 68871490076618
        Time since poll = 30825 msec
    Total:
        Logged       = 1674245
        Dropped      = 277147
        LPS req HWM  = 34184
        Start tic    = 63020180711244
        End tic      = 0
    History:                                     Logged           Dropped
          0: 2024-10-22T02:33:16.869Z              1023                 0
          1: 2024-10-22T02:33:15.869Z              1500             30618
          2: 2024-10-22T02:33:14.869Z              1500             27205
          3: 2024-10-22T02:33:13.869Z              1500             23974
          4: 2024-10-22T02:33:12.869Z              1500             28272
          5: 2024-10-22T02:33:11.867Z              1500             30415
          6: 2024-10-22T02:33:10.868Z              1500             29029
          7: 2024-10-22T02:33:09.869Z              1500             28756
          8: 2024-10-22T02:33:08.869Z              1500             30133
          9: 2024-10-22T02:33:07.867Z              1500             25115
         10: 2024-10-22T02:33:06.867Z              1500             18926
         11: 2024-10-22T02:33:05.867Z              1500              4704

Usage: vsipioctl getloglimit <options>
    -v <secs>   : show historical logging data
    -z          : zero the poll counters
    -Z          : zero all the counters (poll, total, history)
    -h          : this help message
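
The per-second History rows in this output can be summarized to see how far the requested log rate exceeded the limit. The following Python sketch assumes the row layout shown in the sample output above (index, timestamp, Logged, Dropped). The function names are hypothetical and the layout may differ between releases. It treats Logged plus Dropped as the requested rate for that second, which is consistent with the sample: row 1 gives 1500 + 30618 = 32118, the "LPS req HWM" value reported for the poll period.

```python
import re

# One History row: "<index>: <ISO timestamp>Z  <logged>  <dropped>"
ROW_RE = re.compile(r"^\s*\d+:\s+(\S+Z)\s+(\d+)\s+(\d+)\s*$")


def parse_history(output):
    """Return (timestamp, logged, dropped) tuples from the History section."""
    rows = []
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return rows


def summarize(rows):
    """Aggregate the History rows into a few headline numbers."""
    return {
        "logged": sum(logged for _, logged, _ in rows),
        "dropped": sum(dropped for _, _, dropped in rows),
        "peak_requested": max((l + d for _, l, d in rows), default=0),
        "seconds_with_drops": sum(1 for _, _, d in rows if d > 0),
    }


# A subset of the History rows from the sample output in this article.
SAMPLE = """\
    History:                                     Logged           Dropped
          0: 2024-10-22T02:33:16.869Z              1023                 0
          1: 2024-10-22T02:33:15.869Z              1500             30618
          2: 2024-10-22T02:33:14.869Z              1500             27205
         11: 2024-10-22T02:33:05.867Z              1500              4704
"""

if __name__ == "__main__":
    print(summarize(parse_history(SAMPLE)))
```

To use it with live data, save the output of the vsipioctl getloglimit command with the -v option to a file and pass the file contents to parse_history.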

Additional Information

Impact of Enabling Logging on NSX DFW Rules

Techdoc for dfwpktlogs:  Distributed Firewall Packet Logs