Supervisor workloads experience network outage due to Baseline Security Policy deletion by NCP

Article ID: 403436

Products

VMware vCenter Server 8.0

Issue/Introduction

In environments running vCenter Server 8.0 Update 3 or later with NSX, Supervisor workloads may experience a network outage when the NSX Container Plugin (NCP) deletes the baseline security policy. This can occur when NCP restarts because of unstable communication with NSX, and the impact is most severe in clusters configured with a zero-trust security posture.

NCP Log Snippets (Indicators):
[ncp GreenThread-12 I] nsx_ujo.ncp.main Start NCP License Monitor
[ncp MainThread I] nsx_ujo.ncp.nsx.policy.firewall_service Deleted domain Group dg_domain-xxx...
[ncp GreenThread-54 E] create_security_policy_rule failed, cause: Resource could not be found on backend
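
To check a collected NCP log for these indicators, a minimal Python sketch might look like the following. The search patterns are taken directly from the snippets above; the function name and dictionary keys are illustrative, and the exact log wording may differ between NCP versions.

```python
import re

# Patterns derived from the log snippets in this article.
# Assumption: actual message wording may vary by NCP version.
INDICATORS = {
    "license_monitor_start": re.compile(r"Start NCP License Monitor"),
    "domain_group_deleted": re.compile(r"firewall_service Deleted domain Group"),
    "rule_create_failed": re.compile(r"create_security_policy_rule failed"),
}


def find_indicators(lines):
    """Return {indicator_name: [matching lines]} for an iterable of log lines."""
    hits = {name: [] for name in INDICATORS}
    for line in lines:
        for name, pattern in INDICATORS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    sample = [
        "[ncp GreenThread-12 I] nsx_ujo.ncp.main Start NCP License Monitor",
        "[ncp MainThread I] nsx_ujo.ncp.nsx.policy.firewall_service Deleted domain Group dg_domain-xxx...",
        "[ncp GreenThread-54 E] create_security_policy_rule failed, cause: Resource could not be found on backend",
    ]
    for name, matched in find_indicators(sample).items():
        print(f"{name}: {len(matched)} match(es)")
```

To scan a real file, pass an open file handle instead of the sample list, for example `find_indicators(open(path))`, where `path` points to the collected NCP log.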

Environment

  • vCenter Server 8.0 Update 3 and above
  • NSX environments with NCP integration

Cause

NCP includes a separate thread to validate the presence of a Distributed Firewall (DFW) license in NSX. If the connection to NSX is unstable or delayed, the NCP initialization thread may mistakenly assume the license is missing and proceed to delete the baseline security policy.

The baseline security policy contains critical rules, such as:

Environment Category:

  1. Ingress allow rule from Supervisor control plane VM to all segments used by the cluster.

Application Category:

  1. Allow rules for each Supervisor Namespace (pods, VMs, VKS clusters).
  2. DHCP allow rule for DHCP segment.
  3. Allow rules for Supervisor control plane VMs.
  4. Allow all egress rule.
  5. Deny all ingress rule.

If a user-defined zero-trust security policy is present (e.g., default deny ALL), deletion of these baseline allow rules will block all network traffic, leading to workload connectivity loss.
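
The effect can be illustrated with a toy first-match rule evaluator. This is a simplified model and not how NSX evaluates the Distributed Firewall internally; the rule names and group labels are hypothetical, and the model assumes the baseline allow rules are evaluated before the user-defined default deny rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    source: str       # group label, or "ANY"
    destination: str  # group label, or "ANY"
    action: str       # "ALLOW" or "DROP"


def evaluate(rules, source, destination):
    """Return (action, rule_name) for the first rule matching the flow, or None."""
    for rule in rules:
        if rule.source in ("ANY", source) and rule.destination in ("ANY", destination):
            return rule.action, rule.name
    return None


# Hypothetical stand-ins for baseline rules created by NCP.
baseline = [
    Rule("baseline-cp-ingress", "supervisor-cp", "ns-segment", "ALLOW"),
    Rule("baseline-ns-allow", "ns-segment", "ns-segment", "ALLOW"),
]
# Hypothetical user-defined zero-trust rule.
zero_trust = [Rule("user-default-deny", "ANY", "ANY", "DROP")]

if __name__ == "__main__":
    # With the baseline policy present, control plane traffic hits an allow rule.
    print(evaluate(baseline + zero_trust, "supervisor-cp", "ns-segment"))
    # With the baseline policy deleted, the same flow falls through to the deny rule.
    print(evaluate(zero_trust, "supervisor-cp", "ns-segment"))
```

In this model, removing the baseline rules leaves the default deny rule as the only match for every flow, which corresponds to the connectivity loss described above.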

Resolution

Option A – Immediate Recovery:

  • Restart the NCP pod. This typically triggers re-creation of the baseline security policy.
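
One way to restart NCP is a rolling restart of its deployment with kubectl. The sketch below only builds the command. The default namespace `vmware-system-nsx` and deployment name `nsx-ncp` are assumptions that should be verified in your environment (for example with `kubectl get deployments -A`) before running anything.

```python
import shlex


def build_ncp_restart_command(namespace="vmware-system-nsx", deployment="nsx-ncp"):
    """Build a kubectl argv list that triggers a rolling restart of a deployment.

    The default namespace and deployment name are assumptions, not values
    confirmed by this article.
    """
    return ["kubectl", "rollout", "restart", f"deployment/{deployment}", "-n", namespace]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(build_ncp_restart_command()))
```

After reviewing the printed command, it can be run from a session that has kubectl access to the Supervisor control plane, for example with `subprocess.run(cmd, check=True)`.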

Option B – Proactive Prevention:

  • Manually duplicate the baseline security policy and its rules to ensure they persist through NCP restarts.
    • Note: Manually created rules tied to Supervisor Namespaces must be updated if segments change during lifecycle operations (e.g., pod/VM/VKS deployments).
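
As a rough sketch of the duplication step, the following Python function prepares a copy of a security policy that has been exported as JSON from the NSX Policy API. It assumes that underscore-prefixed fields and the fields listed in `SERVER_MANAGED_KEYS` are server-managed and should not be submitted with the copy; that list is an assumption and should be checked against the NSX API reference for your version. The function and variable names are illustrative.

```python
import copy

# Assumption: these fields are populated by NSX and should be dropped from a copy.
SERVER_MANAGED_KEYS = {
    "path",
    "parent_path",
    "relative_path",
    "unique_id",
    "realization_id",
    "marked_for_delete",
    "overridden",
}


def _strip_server_fields(obj):
    """Drop underscore-prefixed and assumed server-managed keys from a dict."""
    return {
        key: value
        for key, value in obj.items()
        if not key.startswith("_") and key not in SERVER_MANAGED_KEYS
    }


def clone_policy_body(policy, new_id, new_display_name=None):
    """Return a request body for a copy of `policy` under a new identifier."""
    clone = _strip_server_fields(copy.deepcopy(policy))
    clone["id"] = new_id
    clone["display_name"] = new_display_name or new_id
    # Rules keep their own ids and group references; only server fields are removed.
    clone["rules"] = [_strip_server_fields(rule) for rule in clone.get("rules", [])]
    return clone
```

The returned dictionary could then be submitted to NSX as a new security policy under the chosen identifier. Because the copied rules still reference the original groups and segments, the note above about updating them during lifecycle operations continues to apply.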