Network Outage When Enabling LAG on Distributed Switch

Article ID: 418491

Products

VMware vCenter Server, VMware vSphere ESXi

Issue/Introduction

  • When configuring Link Aggregation Groups (LAG) on a vSphere Distributed Switch, enabling the LAG in the teaming policy affects all hosts connected to the distributed switch simultaneously, not one host at a time. If the physical switch LACP/port-channel configuration has been completed for only a subset of hosts, the remaining hosts experience a network outage.
  • Possible symptoms observed when LAG is partially configured:
    • Complete loss of network connectivity for VMs on affected hosts
    • vCenter management interface becomes unreachable
    • Network outage on all DVPortgroups using the LAG
    • ESXi host management may remain accessible if the management VMkernel adapter is on a separate vSphere Standard Switch

  • After vCenter connectivity is recovered, attempting to delete the misconfigured LAG fails with the following error:
    The resource '<lag-name>' is in use. Uplink or Link Aggregation group name <lag-name> is in use by the teaming policy defined at DVPortgroup dvpg-######

Environment

VMware vCenter
VMware ESXi
vSphere Distributed Switch

Cause

LAG configuration on a vSphere Distributed Switch is a global setting that applies to all hosts connected to the distributed switch. When the LAG is set as Active in the teaming and failover policy of distributed port groups, all hosts attempt to use the LAG configuration immediately. If the corresponding LACP/port-channel configuration has not been completed on the physical switch for all host ports, hosts without proper physical switch configuration lose network connectivity.

The distributed switch propagates the LAG configuration to all member hosts regardless of their individual physical switch readiness, causing a configuration mismatch between the virtual and physical network layers.
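To make the propagation behavior concrete, the following Python sketch models a distributed switch as a single switch-wide LAG state shared by every member host. It is a simplified illustration, not a vSphere API call: the host names and the port_channel_ready flag are hypothetical, and the flag stands in for whether LACP/port-channel has been configured on that host's physical switch ports.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    # Hypothetical flag: True if LACP/port-channel is configured on the
    # physical switch ports connected to this host.
    port_channel_ready: bool


def mismatched_hosts(hosts, lag_active):
    """Return hosts whose physical switch state differs from the DVS-wide LAG state.

    The LAG state is one switch-wide setting, so the same lag_active value
    applies to every host; this model has no per-host override.
    """
    return [h.name for h in hosts if h.port_channel_ready != lag_active]


hosts = [
    Host("esxi-01", port_channel_ready=True),
    Host("esxi-02", port_channel_ready=True),
    Host("esxi-03", port_channel_ready=False),  # physical switch not yet configured
]

# Enabling the LAG applies to all three hosts at once; esxi-03 is left mismatched.
print(mismatched_hosts(hosts, lag_active=True))  # ['esxi-03']
```

In this model, enabling the LAG leaves no host mismatched only when every host has port_channel_ready=True, which is the condition the Resolution sets before any change is made in vCenter.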

Resolution

To properly implement LAG configuration on a vSphere Distributed Switch:

Correct Implementation Process:

  1. Schedule a maintenance window for all hosts on the distributed switch
  2. Configure physical switches first:
    • Configure LACP/port-channel on ALL physical switch ports for ALL hosts
    • Verify that the VLAN and MTU settings on the port-channels match the current configuration
    • Ensure all host-connected switch ports are properly configured before proceeding
  3. Configure vCenter only after the physical switches are ready:
    • Create the LAG on the Distributed Switch
    • Initially set the LAG to Standby in the teaming and failover policy of the distributed port groups
    • Migrate physical NICs to LAG ports
    • Move the LAG from Standby to Active in the teaming and failover policy
    • Move standalone uplinks from Active to Unused
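The ordering of the vCenter steps above can be sketched as changes to one port group's teaming and failover order. In this Python model the order is a plain dictionary with active, standby, and unused lists; the uplink names, the LAG name lag1, and the assumption that a newly created LAG starts in Unused are illustrative, and the code does not call any vSphere API.

```python
def move(order, name, src, dst):
    """Move an uplink or LAG from one list of a teaming order to another."""
    order[src].remove(name)  # raises ValueError if name is not in src
    order[dst].append(name)


def enable_lag(order, lag, uplinks):
    """Apply the sequence above and return a copy of the order after each step."""
    steps = []
    # Set the LAG to Standby (assumed to start in Unused after creation).
    move(order, lag, "unused", "standby")
    steps.append({k: list(v) for k, v in order.items()})
    # Physical NICs are migrated to the LAG ports at this point; not modeled.
    # Move the LAG to Active and the standalone uplinks to Unused.
    move(order, lag, "standby", "active")
    for uplink in uplinks:
        move(order, uplink, "active", "unused")
    steps.append({k: list(v) for k, v in order.items()})
    return steps


order = {"active": ["Uplink 1", "Uplink 2"], "standby": [], "unused": ["lag1"]}
for step in enable_lag(order, "lag1", ["Uplink 1", "Uplink 2"]):
    print(step)
```

Each move raises ValueError if the item is not in the list the sequence expects, so applying the steps out of order (for example, promoting the LAG to Active before it is in Standby) fails in the model rather than silently producing a different policy.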

Recovery from Partial Configuration:

If the LAG was enabled before the physical switch configuration was completed for all hosts:

  1. Access affected ESXi hosts directly (not through vCenter)
  2. Restore vCenter connectivity
  3. Modify the Distributed Switch configuration in vCenter:
    • Navigate to each affected DVPortgroup
    • Go to Configure > Policies > Teaming and Failover
    • Move the LAG from Active to Unused
    • Move the standalone uplinks from Unused to Active
    • Note: Deleting the LAG directly fails with the "resource in use" error while the LAG is active in a teaming policy
  4. Verify that connectivity is restored for all hosts and VMs
  5. Complete the physical switch configuration for all hosts before re-enabling the LAG
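The teaming changes in recovery step 3, and the reason the LAG cannot be deleted beforehand, can be sketched the same way. In this Python model the port group names, uplink names, and lag1 are hypothetical. portgroups_blocking_deletion only checks the condition stated in the note in step 3 (the LAG being Active in a teaming policy); it is an illustration, not how vCenter itself evaluates the "resource in use" error.

```python
def revert_portgroup(order, lag, uplinks):
    """Move the LAG from Active to Unused and the standalone uplinks from Unused to Active."""
    order["active"].remove(lag)
    order["unused"].append(lag)
    for uplink in uplinks:
        order["unused"].remove(uplink)
        order["active"].append(uplink)


def portgroups_blocking_deletion(portgroups, lag):
    """Return the port groups whose teaming policy still has the LAG Active."""
    return sorted(name for name, order in portgroups.items() if lag in order["active"])


# Hypothetical port groups left in the misconfigured state.
portgroups = {
    "dvpg-vm": {"active": ["lag1"], "standby": [], "unused": ["Uplink 1", "Uplink 2"]},
    "dvpg-mgmt": {"active": ["lag1"], "standby": [], "unused": ["Uplink 1", "Uplink 2"]},
}

print(portgroups_blocking_deletion(portgroups, "lag1"))  # ['dvpg-mgmt', 'dvpg-vm']
for order in portgroups.values():
    revert_portgroup(order, "lag1", ["Uplink 1", "Uplink 2"])
print(portgroups_blocking_deletion(portgroups, "lag1"))  # []
```

The revert has to be applied to every affected port group; in the model, a single port group left with the LAG Active is enough to keep it in the blocking list.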

Important considerations:

  • LAG configuration cannot be applied to individual hosts when using a Distributed Switch
  • Physical switch configuration (LACP/port-channel, VLAN, and MTU) must be completed and matching for ALL hosts before the LAG is enabled in vCenter
  • A maintenance window is required for the initial LAG implementation
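A pre-change check along these lines can make the "all hosts first" rule explicit before the teaming policy is touched. The Python sketch assumes the physical switch settings for each host have already been collected (by hand or from the switch vendor's tooling) into a dictionary; the field names, host names, VLAN IDs, and MTU values are all hypothetical.

```python
def lag_enable_blockers(hosts, expected_vlans, expected_mtu):
    """Return reasons the LAG must not be enabled yet; an empty list means all hosts are ready."""
    blockers = []
    for name, port in sorted(hosts.items()):
        if not port["port_channel"]:
            blockers.append(f"{name}: LACP/port-channel not configured")
        if set(port["vlans"]) != set(expected_vlans):
            blockers.append(f"{name}: VLANs {sorted(port['vlans'])} do not match {sorted(expected_vlans)}")
        if port["mtu"] != expected_mtu:
            blockers.append(f"{name}: MTU {port['mtu']} does not match {expected_mtu}")
    return blockers


# Hypothetical physical switch settings, one entry per host on the distributed switch.
hosts = {
    "esxi-01": {"port_channel": True, "vlans": [10, 20], "mtu": 9000},
    "esxi-02": {"port_channel": False, "vlans": [10, 20], "mtu": 1500},
}

for reason in lag_enable_blockers(hosts, expected_vlans=[10, 20], expected_mtu=9000):
    print(reason)
# esxi-02: LACP/port-channel not configured
# esxi-02: MTU 1500 does not match 9000
```

Under this model, only an empty result indicates that every host is ready and the LAG can be moved to Active in vCenter.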