vCenter Crash and VM Network Outage due to Duplicate Distributed Virtual Port Groups (DVPG) after Certificate Replacement

Article ID: 426322


Products

VMware NSX
VMware vCenter Server

Issue/Introduction

After expired certificates are replaced in vCenter, environments integrated with NSX may lose network connectivity across all workloads and see repeated crashes of the vCenter Server service (VPXD). The issue is characterized by the sudden appearance of duplicate Distributed Virtual Port Groups (DVPGs) and by the inadvertent deletion of Distributed Virtual Switch (DVS) objects during remediation attempts.

Symptoms

  • Duplicate DVPGs: Additional unexpected port groups appear in the vCenter inventory.

  • vCenter Service Crashes: The VPXD service starts but does not stay running because of database inconsistencies involving duplicate records.

  • Total Network Outage: VMs lose all network connectivity.

  • Firewall Policy Loss: Firewall configurations are wiped, reverting to a "Default Block All" state.

Environment

VMware NSX
VMware vCenter Server

Cause

The outage is the result of three interconnected failures triggered by a full synchronization between NSX and vCenter:

  1. NSX Sync Logic Error: After a certificate change or connection reset, NSX erroneously identifies segments in a "Failed" state. This triggers a full sync that automatically creates redundant port groups in vCenter.

  2. vCenter API Character Limit: NSX uses a vCenter API to create these port groups. If a port group name exceeds 80 characters, the API returns an error but still saves corrupted or duplicate entries to the vCenter database. This corruption causes the VPXD service to crash during startup.

  3. DVS Object Deletion & Cleanup: During manual database remediation, if an active DVS record is deleted from the vCenter database, NSX initiates an automated cleanup. This removes all logical switches, ports, and firewall policies. Without these policies, the system defaults to a "Deny All" security posture, blocking all VM traffic.
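
The name-length and duplicate conditions described in cause 2 can be checked against an exported list of port group names. The following is a minimal sketch, not a supported tool: the function name `find_suspect_port_groups` and the sample names are hypothetical, and it assumes the names have already been exported from the inventory by some other means. The 80-character limit is the one cited above.

```python
from collections import Counter

# Assumption: 80 is the name-length limit cited in this article.
NAME_LIMIT = 80


def find_suspect_port_groups(names, limit=NAME_LIMIT):
    """Return names longer than `limit` and names that occur more than once."""
    counts = Counter(names)
    # Use a set so a long name that is also duplicated is listed only once.
    too_long = sorted({name for name in names if len(name) > limit})
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    return {"too_long": too_long, "duplicates": duplicates}


if __name__ == "__main__":
    # Hypothetical export; replace with the real port group names.
    sample = ["seg-web", "seg-web", "seg-" + "x" * 90]
    report = find_suspect_port_groups(sample)
    print("Duplicates:", report["duplicates"])
    print("Over limit:", report["too_long"])
```

A name that appears in both lists is the most likely candidate for the corrupted entries described in cause 2.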

Resolution

Permanent Fix

To fully resolve these issues and prevent recurrence, upgrade to the following versions:

  • vCenter Server: Upgrade to 8.0 Update 3 (8.0.3) or later (addresses cause 2).

  • NSX: Upgrade to 4.2.1.2 or later (addresses causes 1 and 3, the latter regarding vdsIdPortGroupStateMap cleanup).
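
To check whether a deployment already meets these minimums, dotted version strings can be compared numerically. This is a sketch under stated assumptions: the helper names `parse_version` and `meets_minimum` are hypothetical, the version strings are assumed to contain only dot-separated integers, and obtaining the installed version is left to the operator.

```python
# Minimum versions taken from the list above.
MINIMUMS = {"vcenter": "8.0.3", "nsx": "4.2.1.2"}


def parse_version(text):
    """Convert a dotted version such as '4.2.1.2' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(current, minimum):
    """Return True if `current` is at or above `minimum`."""
    cur, low = parse_version(current), parse_version(minimum)
    # Pad the shorter tuple with zeros so '8.0.3' and '8.0.3.0' compare equal.
    width = max(len(cur), len(low))
    cur += (0,) * (width - len(cur))
    low += (0,) * (width - len(low))
    return cur >= low


if __name__ == "__main__":
    # Hypothetical installed versions; replace with the real values.
    installed = {"vcenter": "8.0.2", "nsx": "4.2.1.2"}
    for product, version in installed.items():
        status = "ok" if meets_minimum(version, MINIMUMS[product]) else "upgrade needed"
        print(product, version, status)
```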

Workaround / Recovery Steps

If immediate upgrade is not possible, follow these steps to restore connectivity:

  1. Restore vCenter: Revert vCenter to a snapshot taken prior to any manual database deletions.

  2. Database Repair:

    • Identify all duplicate DVPGs in the vCenter database whose names exceed the 80-character limit, and rename them.

    • Restart the vCenter Server services.

  3. Restore VM Connectivity:

    • To bypass the corrupted firewall state, "flap" the network configuration for affected VMs.

    • Swap the VM network assignment to a dummy DVPG, then immediately back to the intended DVPG. This re-initializes the port without a blocked firewall state.

  4. Preventive Cleanup:

    • Manually repair the underlying segment conditions in NSX to stop the "Failed" state triggers.

    • Clean up the vCenter database to ensure it reflects the original, non-duplicated state.
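
The network "flap" in step 3 is an ordered pair of moves per VM: first to the dummy DVPG, then back to the intended one. The sketch below only builds that ordered plan so it can be reviewed before anything is changed. The function name `build_flap_plan` and all VM and port group names are hypothetical, and applying the moves in vCenter is deliberately left out.

```python
def build_flap_plan(vm_to_port_group, dummy_port_group):
    """Return ordered (vm, target_port_group) moves: dummy first, then original."""
    plan = []
    for vm, intended in vm_to_port_group.items():
        if intended == dummy_port_group:
            # Moving a VM to the port group it is already on would not re-initialize the port.
            raise ValueError(f"{vm} is already assigned to the dummy port group")
        plan.append((vm, dummy_port_group))
        plan.append((vm, intended))
    return plan


if __name__ == "__main__":
    # Hypothetical assignments; replace with the affected VMs and their DVPGs.
    affected = {"app-vm-01": "pg-app", "db-vm-01": "pg-db"}
    for vm, target in build_flap_plan(affected, "pg-dummy"):
        print(f"move {vm} -> {target}")
```

Keeping the two moves for each VM adjacent in the plan matches the instruction to swap back immediately, which limits how long each VM sits on the dummy DVPG.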