Tanzu / NSX NCP Integration Removes Existing NAT Rules After Execution of a PowerCLI Script

Article ID: 418435

Products

VMware NSX

Issue/Introduction

  • After a script is run, all NAT rules are missing from the NSX-T UI (Networking --> NAT --> select the gateway for the given Tier-0 from the dropdown box).
  • This results in a dataplane outage for all traffic that relies on those NAT rules.
  • Corresponding log entries are recorded in /var/log/syslog on the NSX-T Manager appliances.

Environment

VMware NSX

VMware Tanzu

Cause

A PowerCLI automation script intended to create a single NAT rule for a specific organization (ORG) within a Tanzu Kubernetes Grid (TKG) / NSX-T NCP–integrated environment instead resulted in:

  • The new NAT rule being created successfully

  • All existing NAT rules for other ORGs being deleted or overwritten

This caused widespread networking impact across multiple Kubernetes namespaces and ORG environments due to lost NAT flows and broken routing.

 

Resolution

Re-create all deleted NAT rules manually to restore production traffic.

Best Practices:

Follow these recommendations before running any script in a production environment:

1. Always Retrieve the Existing NAT Rule Set Before Adding a New Rule

2. Validate Script Logic for Declarative NSX-T Policy API

3. Avoid PUT for Single Rule Creation Where Possible

4. Test All Automation in a Non-Production Environment

5. Add Safety Checks to Automation
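
Recommendations 1 and 5 can be sketched as two small helpers. This is a minimal Python sketch under stated assumptions, not VMware-provided tooling: the function names, the "id" field used as the rule key, and the backup file path are hypothetical, and the step that actually fetches the rule list from NSX is intentionally left out.

```python
import json
from pathlib import Path


def snapshot_rules(rules, backup_path):
    """Write the current NAT rule list to a JSON file before any change.

    Returns the number of rules saved so the caller can log it.
    """
    Path(backup_path).write_text(json.dumps(rules, indent=2))
    return len(rules)


def assert_no_rules_lost(existing_rules, proposed_rules):
    """Abort if the proposed payload would drop any rule that exists today.

    Assumes every rule dict carries a unique "id" key (illustrative schema).
    """
    existing_ids = {rule["id"] for rule in existing_rules}
    proposed_ids = {rule["id"] for rule in proposed_rules}
    missing = sorted(existing_ids - proposed_ids)
    if missing:
        raise RuntimeError(
            f"Refusing to apply: payload would remove NAT rules {missing}"
        )
```

Calling snapshot_rules before the change gives a file to restore from, and calling assert_no_rules_lost on the payload immediately before it is sent stops the script if the payload contains fewer rules than the gateway currently holds.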

Additional Information

The following potential causes can result in all NAT rules being removed.

1a. NAT Rules API Used by the Script Was Performing a Full Object Replace

  • In the NSX-T Policy API, certain endpoints, especially NAT rule configuration endpoints, do not perform patch or append operations.
  • They treat the submitted payload as a declarative “full state” object: a PUT call replaces the entire NAT rule list, not just the single rule being added.
  • If the script submitted only the new NAT rule in its payload, NSX Manager interpreted this as "This is the complete and only desired state of NAT rules for this Tier-1/Tier-0."

As a result:

  • All existing NAT rules were wiped

  • Only the newly submitted rule remained
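
The replace behavior described above can be illustrated with a small in-memory model. This is a sketch only: put_full_replace is a hypothetical stand-in for the replace semantics, not an NSX API call, and the rule IDs and fields are invented for illustration.

```python
def put_full_replace(current_rules, payload):
    """Model of replace semantics: the payload becomes the entire rule list.

    current_rules is accepted only to show that it is ignored.
    """
    return list(payload)


existing = [
    {"id": "org-a-snat", "action": "SNAT"},
    {"id": "org-b-snat", "action": "SNAT"},
]
new_rule = {"id": "org-c-snat", "action": "SNAT"}

# Payload containing only the new rule: every existing rule is dropped.
after_bad_put = put_full_replace(existing, [new_rule])

# Payload containing the existing rules plus the new one: nothing is lost.
after_good_put = put_full_replace(existing, existing + [new_rule])

print([rule["id"] for rule in after_bad_put])   # ['org-c-snat']
print([rule["id"] for rule in after_good_put])  # ['org-a-snat', 'org-b-snat', 'org-c-snat']
```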

1b. Improper Use of PowerCLI / REST Calls

  • The NSX-T Policy API operates declaratively: NAT rules are stored as a single JSON structure, not individual objects.

  • Many destructive automation issues come from misunderstanding the difference between PUT (replace) and PATCH (modify).

  • NSX-T Manager does not maintain implicit rule history; overwritten NAT rules are unrecoverable without API or Manager backups.

  • VMware strongly recommends a "GET → Modify → PUT" workflow for any Policy API automation.

  • When integrating Tanzu/NCP, NAT rules often map to Kubernetes namespaces, pods, load balancer IPs, and org contexts, so an overwrite can cause widespread cluster outages.
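
The "GET → Modify → PUT" workflow can be sketched as follows. This is a minimal Python illustration rather than a definitive implementation: the get_rules and put_rules callables are hypothetical placeholders for whatever client the automation uses, and keying rules by an "id" field is an assumption about the payload shape.

```python
def merge_rule(existing, new_rule):
    """Return a copy of the existing rules with new_rule added.

    If a rule with the same "id" already exists it is updated in place,
    otherwise the new rule is appended. The input list is not modified.
    """
    merged = []
    replaced = False
    for rule in existing:
        if rule["id"] == new_rule["id"]:
            merged.append(dict(new_rule))
            replaced = True
        else:
            merged.append(dict(rule))
    if not replaced:
        merged.append(dict(new_rule))
    return merged


def add_nat_rule(get_rules, put_rules, new_rule):
    """GET the current rules, modify the list locally, then PUT the full set."""
    existing = get_rules()                    # GET: read the complete current state
    payload = merge_rule(existing, new_rule)  # Modify: add or update one rule
    put_rules(payload)                        # PUT: submit the complete desired state
    return payload


# Usage against an in-memory stand-in for the gateway's rule list.
store = {"rules": [{"id": "org-a-snat"}, {"id": "org-b-snat"}]}
add_nat_rule(
    get_rules=lambda: store["rules"],
    put_rules=lambda payload: store.update(rules=payload),
    new_rule={"id": "org-c-snat"},
)
```

Because the payload is always built from the rules that were just read, the PUT carries every existing rule along with the new one.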

 

2. The Script Did Not Retrieve and Re-Submit Existing NAT Rules

 

Please see the following reference documentation:

Broadcom Developer Portal

Install NCP in a Tanzu Application Service Environment

Deploying Elastic Application Runtime with NSX-T Networking