Troubleshoot packet drops by an edge node with NAT enabled

Article ID: 318718

Updated On: 02-06-2025

Products

VMware NSX

Issue/Introduction


  • SSL sessions passing through edge nodes are disconnected.
  • No message is associated with this problem. To identify it, examine the flow cache dump.
  • Poisoned entries occur when the flow cache holds incorrect information about a network flow and its associated NAT action. For a 5-tuple (source IP, destination IP, protocol, source port, destination port) whose flow should receive a particular NAT action, the flow cache entry might lack the required firewall actions (service_output(), stateless_fw(), stateless_dnat(), or stateless_snat()). If the flow should be NATed but the entry is missing the NAT action, packets may not be translated correctly, causing unintended network behavior such as communication failures.
  • For live troubleshooting, run the CLI commands set debug and get dataplane flow-cache file.
  • To analyze the flow cache dump, first decompress the files:
    gunzip flow-cache-dump*
  • Then view the dump on the edge node with the following command:
    /opt/vmware/nsx-edge/bin/edge-cachedump flow-cache-dump-0 | grep 00000000
  • In the output, if a given 5-tuple should have a NAT action but the flow cache contains entries for that 5-tuple without a firewall action such as service_output(), stateless_fw(), stateless_dnat(), or stateless_snat(), packets that match such an entry trigger the issue.
Example of a poisoned entry:
sig=0xaaea00fe, age=5.431s, bytes=0, packets=0, effective_mtu=1500, port_id=1, tep_idx=255 num_teps=4, dl_src=00:00:00:00:00:00, dl_dst=00:00:00:00:00:00, dl_type=0x800, bundle_port=4, vrf_id=9, in_ifuid=375, nw_src=###.##.##.##/32, nw_dst=##.##.##.##/32, nw_proto=6, nw_ttl=62, nw_flags=0x40, bundle_version=0, tp_src=41726/0xffff, tp_dst=443/0xffff, match_type=0x3, actions=service_output(metadata=0xd01000800000000,0x385aadead0009,0x26cddb5302000004), need_frag(lrp_uuid=########-####-####-####-############, mtu=1500, vrf_id=9), IPv4 routing(dl_src=##:##:##:##:##:##, dl_dst=##:##:##:##:##:##, nw_ttl=62), output(port_id=1, vlan_id=123, mtu=9000, tx_cap=0x802d)
  • The same behavior applies to both router tiers (Tier-0 and Tier-1 logical routers), depending on where the NAT takes place.
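Comparing all flow cache entries for one 5-tuple by eye is tedious, so the edge-cachedump text output can be parsed. The Python sketch below assumes each entry is a single line in the key=value format of the example above, with the action list following "actions="; the field names (nw_src, nw_dst, nw_proto, tp_src, tp_dst) come from that example, while FlowEntry, parse_entry(), group_by_five_tuple() and the other names are hypothetical. It groups entries by 5-tuple and reports which of the four actions named in this article each entry carries; it does not by itself decide whether an entry is poisoned.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Actions this article names when describing poisoned entries.
LISTED_ACTIONS = ("service_output", "stateless_fw", "stateless_dnat", "stateless_snat")

# key=value pairs before the action list, e.g. "nw_proto=6" or "tp_dst=443/0xffff".
FIELD_RE = re.compile(r"(\w+)=([^,\s]+)")


@dataclass
class FlowEntry:
    fields: Dict[str, str]  # match fields, e.g. {"nw_proto": "6", ...}
    actions: List[str]      # action names only, e.g. ["service_output", "output"]

    def five_tuple(self) -> Tuple[str, ...]:
        """(nw_src, nw_dst, nw_proto, tp_src, tp_dst), with port masks removed."""
        f = self.fields

        def port(key: str) -> str:
            return f.get(key, "").split("/")[0]  # "443/0xffff" -> "443"

        return (f.get("nw_src", ""), f.get("nw_dst", ""), f.get("nw_proto", ""),
                port("tp_src"), port("tp_dst"))

    def listed_actions(self) -> List[str]:
        """Which of the four actions named in the article this entry carries."""
        return [name for name in self.actions if name in LISTED_ACTIONS]


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not inside parentheses."""
    items: List[str] = []
    depth = start = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            items.append(text[start:i].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        items.append(tail)
    return items


def parse_entry(line: str) -> Optional[FlowEntry]:
    """Parse one dump line; return None if it has no action list."""
    head, sep, actions = line.partition(", actions=")
    if not sep:
        return None
    fields = dict(FIELD_RE.findall(head))
    # "IPv4 routing(dl_src=...)" -> "IPv4 routing"
    names = [item.split("(", 1)[0].strip() for item in split_top_level(actions)]
    return FlowEntry(fields, names)


def group_by_five_tuple(lines: Iterable[str]) -> Dict[Tuple[str, ...], List[FlowEntry]]:
    """Collect parsed entries per 5-tuple so entries for one flow sit together."""
    groups: Dict[Tuple[str, ...], List[FlowEntry]] = defaultdict(list)
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            groups[entry.five_tuple()].append(entry)
    return dict(groups)
```

For example, redirect the edge-cachedump output to a file (edge-cachedump flow-cache-dump-0 > flow-cache.txt), pass its lines to group_by_five_tuple(), and inspect the entries and listed_actions() for the 5-tuple of the failing session.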


Environment

VMware NSX
VMware NSX 4.0.x.x
VMware NSX-T Data Center 4.x

Cause

  • If an edge node receives a TCP RST on a NATed session, it creates a poisoned entry in the flow cache. This entry causes subsequent packets that match it to be dropped.

Resolution

  • Upgrade to one of the fixed software versions listed below.
  • Versions affected by this issue:

All software up to and including the following versions:
For 3.1, up to 3.1.3.7.3
For 3.2, up to 3.2.1.1.1
For 4.0, up to 4.0.0.1

  • Versions where this is fixed:

3.1.3.7.4 
3.1.3.8 
3.2.1.1.2 
3.2.2 
4.0.1.1 
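The affected and fixed lists above can be checked mechanically against an edge node's version string. This Python sketch uses a hypothetical classify() helper; it treats each "up to" bound as inclusive (the first fixed build on each train is the next build after it) and assumes that later builds on the same release train keep the fix. Versions on trains the article does not list, and builds that fall between the last affected and the first fixed build (for example 4.0.1), are reported as "not listed" rather than guessed.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn "3.1.3.7.4" into (3, 1, 3, 7, 4) so tuples compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


# Last affected build on each release train, as listed in this article (inclusive).
AFFECTED_UP_TO: Dict[Version, Version] = {
    (3, 1): parse_version("3.1.3.7.3"),
    (3, 2): parse_version("3.2.1.1.1"),
    (4, 0): parse_version("4.0.0.1"),
}

# Fixed builds listed in this article.
FIXED: List[Version] = [
    parse_version(v) for v in ("3.1.3.7.4", "3.1.3.8", "3.2.1.1.2", "3.2.2", "4.0.1.1")
]


def classify(version_text: str) -> str:
    """Return "affected", "fixed", or "not listed" for an NSX version string."""
    version = parse_version(version_text)
    train = version[:2]  # e.g. (3, 1) for 3.1.x
    ceiling = AFFECTED_UP_TO.get(train)
    if ceiling is not None and version <= ceiling:
        return "affected"
    fixes = [fix for fix in FIXED if fix[:2] == train]
    # Assumption: any later build on the same train still contains the fix.
    if fixes and version >= min(fixes):
        return "fixed"
    return "not listed"
```

For example, classify("3.2.1.1.1") returns "affected" and classify("3.2.2") returns "fixed". Comparing integer tuples rather than strings avoids ordering mistakes such as "3.1.3.10" sorting before "3.1.3.8".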

Additional Information

Impact/Risks:
  • Customers experience application failures.