Slow Firewall API processing causes TKGi pods to become stuck in NotReady or ContainerCreating state

Article ID: 370053

Updated On:

Products

VMware NSX
VMware vDefend Firewall

Issue/Introduction

  • NSX 3.2.x and 4.1.0.x
  • Environments using Firewall rules at scale.
  • TKGi pods are stuck in NotReady or ContainerCreating state.
  • TKGi has a liveness probe configured.
  • The API log on the NSX Manager shows increasing processing times for firewall-related API calls, for example:

/var/log/proton/localhost_access_log.txt
2024-06-05T21:01:32.176Z - "POST /nsxapi/api/v1/firewall/sections/<Section UUID>/rules?operation=insert_bottom HTTP/1.1" 200 1292 142 141
2024-06-05T21:01:32.436Z - "POST /nsxapi/api/v1/firewall/sections/<Section UUID>/rules?operation=insert_bottom HTTP/1.1" 200 1292 146 146
2024-06-05T21:01:32.656Z - "POST /nsxapi/api/v1/firewall/sections/<Section UUID>/rules?operation=insert_bottom HTTP/1.1" 200 1294 130 129
2024-06-05T21:01:32.911Z - "POST /nsxapi/api/v1/firewall/sections/<Section UUID>/rules?operation=insert_bottom HTTP/1.1" 200 1294 135 135
2024-06-05T21:01:33.166Z - "POST /nsxapi/api/v1/firewall/sections/<Section UUID>/rules?operation=insert_bottom HTTP/1.1" 200 1292 142 142
2024-06-05T21:05:35.882Z - "POST /nsxapi/api/v1/firewall/sections/<Section UUID>/rules?operation=insert_bottom HTTP/1.1" 200 1294 14095 14095
2024-06-05T21:07:16.331Z - "POST /nsxapi/api/v1/firewall/sections/<Section UUID>/rules?operation=insert_bottom HTTP/1.1" 200 1294 93716 93716
2024-06-05T21:10:55.137Z - "POST /nsxapi/api/v1/firewall/sections/<Section UUID>/rules?operation=insert_bottom HTTP/1.1" 200 1292 174402 174402
2024-06-05T21:32:23.824Z - "POST /nsxapi/api/v1/firewall/sections/<Section UUID>/rules?operation=insert_bottom HTTP/1.1" 400 211 116426 116425

The last value on each line is the time taken to commit the API response, in milliseconds.
Over time the value increases, and the last API call above fails with a 400 (Bad Request) error.
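
To spot this pattern in a large log, the lines can be filtered by their last field. The following is a minimal Python sketch, not a supported tool. It assumes the log lines have the same layout as the excerpt above. The function names and the 10-second default threshold are illustrative choices, not values from NSX.

```python
import re

# Layout assumed from the excerpt above:
#   <timestamp> - "<METHOD> <target> HTTP/x.y" <status> <size> <value> <commit_ms>
# The second-to-last value is not described in this article, so it is skipped.
LINE_RE = re.compile(
    r'^(?P<ts>\S+) - "(?P<method>[A-Z]+) (?P<target>.*?) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) \d+ (?P<commit_ms>\d+)$'
)


def parse_line(line):
    """Parse one access-log line; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match["ts"],
        "method": match["method"],
        "target": match["target"],
        "status": int(match["status"]),
        "commit_ms": int(match["commit_ms"]),
    }


def slow_firewall_calls(lines, threshold_ms=10_000):
    """Return firewall API entries that were slow or returned an error.

    threshold_ms is a hypothetical cut-off chosen for illustration.
    """
    flagged = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or "/firewall/" not in entry["target"]:
            continue
        if entry["commit_ms"] >= threshold_ms or entry["status"] >= 400:
            flagged.append(entry)
    return flagged


# Example use on the NSX Manager log file named above:
#   with open("/var/log/proton/localhost_access_log.txt") as log:
#       for entry in slow_firewall_calls(log):
#           print(entry["timestamp"], entry["status"], entry["commit_ms"])
```

Applied to the excerpt above, this would flag the four entries from 21:05:35 onward, including the final call that returned 400.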

Cause

The Firewall Management Plane (MP) API uses an inefficient algorithm to process configuration updates to the database. This can result in slow processing of the API calls over time. This issue does not impact the Policy API.

Resolution

This issue is resolved in NSX 3.2.3.2, 4.1.1 and above.

Disabling the pods' liveness and readiness probes is the only viable workaround.

Alternatively, upgrade to a fixed version of NSX.
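
This article does not prescribe how the probes should be disabled. As one possible illustration of the workaround, the Python sketch below removes the standard Kubernetes `livenessProbe` and `readinessProbe` fields from a workload manifest. It assumes a Deployment-style manifest that is already loaded as a dictionary. The function name `strip_probes` is hypothetical, and applying the result to the cluster is out of scope.

```python
import copy

# Standard Kubernetes container fields for the two probe types.
PROBE_FIELDS = ("livenessProbe", "readinessProbe")


def strip_probes(workload):
    """Return a copy of a Deployment-style manifest with probes removed.

    Assumes the pod template lives at spec.template.spec, as it does for
    Deployments, StatefulSets and DaemonSets. The input is not modified.
    """
    result = copy.deepcopy(workload)
    pod_spec = result.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        for field in PROBE_FIELDS:
            # pop with a default so containers without probes are left as-is
            container.pop(field, None)
    return result
```

Because the function returns a copy, the original manifest stays available for restoring the probes after the upgrade to a fixed NSX version.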