Shutting down Kubernetes pods causes the edge datapath CPU usage to spike to 100%

Article ID: 410669

Products

VMware NSX

Issue/Introduction

  • Shutting down a large number of Kubernetes pods at once causes the edge datapath CPU usage to spike to 100%, triggering the Edge Datapath CPU Usage alarm.
  • Other alarms, such as SNAT Port Usage High and Edge NIC Out of Buffer, may also be reported.
  • Until the firewall (NAT) sessions are cleared (for example, by disabling and re-enabling the relevant NAT rules), reconnecting to the supervisor node or powering on the pods may fail.

Environment

VMware NSX

Cause

In an NSX-T environment, a Tier-0 Gateway may be configured with multiple uplink interfaces. Workloads, such as Kubernetes clusters or other applications, can generate traffic that enters the Tier-0 through one of these uplinks.

When traffic enters through a specific uplink, it may be subject to DNAT, translating the destination IP address to an address that is reachable through a different uplink interface. During the forwarding of this traffic via the second uplink, SNAT policies may also be applied, translating the source IP address.

As a result, the same flow passes from one uplink interface to another within the Tier-0 Gateway and undergoes both DNAT and SNAT translation, which can create a routing loop.

Resolution

For the affected datapath, review the firewall connection tables, the SNAT and DNAT rules, and any static routes configured for the relevant network. Look for any NSX configuration that could create a routing loop, since such a loop exhausts SNAT ports and drives the edge datapath CPU usage to 100%.
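
As an aid to that review, the following is a minimal Python sketch of one way to flag the pattern described under Cause: a DNAT rule on one uplink whose translated address is routed out of a different uplink that also has an SNAT rule. It does not call any NSX API. The `NatRule` structure, the uplink labels, the rule names, and the addresses (taken from documentation ranges) are all hypothetical, and the rule and route data would have to be transcribed manually from the Tier-0 configuration. A match is only a candidate for closer inspection, not proof of a loop.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NatRule:
    """Simplified, hypothetical view of a Tier-0 NAT rule (not an NSX object)."""
    name: str
    action: str       # "DNAT" or "SNAT"
    match: str        # matched CIDR (informational only in this sketch)
    translated: str   # translated IP address
    uplink: str       # label of the uplink the rule is applied to


def egress_uplink(ip, routes):
    """Return the uplink chosen by longest-prefix match, or None if no route."""
    addr = ip_address(ip)
    best = None
    for cidr, uplink in routes.items():
        net = ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, uplink)
    return best[1] if best else None


def find_cross_uplink_nat(rules, routes):
    """List (dnat, snat, ingress uplink, egress uplink) candidates to review."""
    findings = []
    dnats = [r for r in rules if r.action == "DNAT"]
    snats = [r for r in rules if r.action == "SNAT"]
    for d in dnats:
        out = egress_uplink(d.translated, routes)
        if out is None or out == d.uplink:
            continue  # translated address stays on the ingress uplink
        for s in snats:
            if s.uplink == out:
                findings.append((d.name, s.name, d.uplink, out))
    return findings


# Hypothetical example data, transcribed by hand for illustration only.
rules = [
    NatRule("dnat-k8s-api", "DNAT", "192.0.2.10/32", "198.51.100.20", "uplink-1"),
    NatRule("snat-egress", "SNAT", "10.244.0.0/16", "198.51.100.1", "uplink-2"),
    NatRule("dnat-local", "DNAT", "192.0.2.11/32", "192.0.2.50", "uplink-1"),
]
routes = {
    "192.0.2.0/24": "uplink-1",
    "198.51.100.0/24": "uplink-2",
    "0.0.0.0/0": "uplink-1",
}

findings = find_cross_uplink_nat(rules, routes)
for dnat, snat, ingress, egress in findings:
    print(f"Review: {dnat} (in via {ingress}) with {snat} (out via {egress})")
```

With the sample data, only `dnat-k8s-api` is flagged, because its translated address is routed out of `uplink-2`, where `snat-egress` also applies. `dnat-local` is not flagged because its translated address stays on the ingress uplink.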

For more assistance, please raise a Broadcom Support Case as detailed in Creating and managing Broadcom support cases.