VM connectivity loss on a vSphere Distributed Switch after vCenter rebuild

Article ID: 435164

Products

  • VMware vCenter Server
  • VMware vSphere ESXi

Issue/Introduction

After rebuilding vCenter Server and recreating a vSphere Distributed Switch (vDS), virtual machines (VMs) on specific VLANs lose network connectivity. The previous vDS configuration is unknown due to the vCenter crash, and the physical switch configuration is also unknown. The uplinks are assigned as individual vmnics with standard teaming policies, but VMs cannot reach their gateways. Troubleshooting requires reviewing the physical switch configuration to determine whether LACP port channels, specific trunk VLAN lists, or native VLAN settings are in use — and then rebuilding the vDS configuration to match.

Symptoms include:

  • VMs on the vDS have no network connectivity after the vCenter rebuild
  • Individual vmnic uplinks show link up but VMs cannot reach their gateways
  • Some VLANs work on one ESXi host but not another
  • ARP requests are seen leaving the VM but no replies return
  • CDP or LLDP neighbor information returns null on certain uplinks
  • The original vDS configuration is unavailable due to the vCenter failure

Environment

  • VMware vCenter Server
  • VMware vSphere ESXi
  • vSphere Distributed Switch recreated after vCenter rebuild

Cause

When vCenter Server is rebuilt and the vDS is recreated without knowledge of the original configuration, the uplinks may be assigned as individual vmnics without a LAG. If the physical switch ports are configured in LACP port channels, the ESXi uplinks cannot negotiate LACP and traffic does not pass.

A secondary cause occurs when ESXi hosts connect to different physical switches that handle VLAN tagging differently. If one switch trunks a VLAN as tagged and another configures the same VLAN as the native (untagged) VLAN, a single vDS port group cannot serve both hosts for that VLAN. The port group with a VLAN ID set expects tagged frames and does not match untagged native VLAN traffic.

Resolution

Identify the Physical Switch Configuration

Since the original vDS configuration is unknown, start by identifying what the physical switches expect. Run CDP/LLDP queries from each ESXi host to identify the connected switch. For more information on CDP/LLDP discovery, see Cisco Discovery Protocol (CDP) network information (324750):

vim-cmd hostsvc/net/query_networkhint --pnic-name=<vmnic>

Repeat for each vmnic. Note the switch name, port ID, and any VLAN information returned. CDP/LLDP neighbor details can also be viewed in the vSphere Client: navigate to the ESXi host → Configure → Networking → Physical adapters, select the vmnic, and open the CDP/LLDP tab. For more detail on this approach, see Testing VMkernel network connectivity with the vmkping command (344313).

If CDP/LLDP returns null, coordinate with the network team to trace the physical cabling and identify the switch ports.
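To avoid running the query by hand for every uplink, the vmnic names can be taken from the esxcfg-nics -l listing. The following is a minimal sketch rather than part of the documented procedure. The function name list_vmnics is hypothetical, and it assumes each physical NIC line in the listing starts with a name of the form vmnicN.

```shell
# Print the name of every physical NIC found in `esxcfg-nics -l` output.
# The listing is read from stdin, so the function can be checked off-host.
# Assumption: NIC lines begin with a name such as vmnic0, vmnic1, ...
list_vmnics() {
    awk '$1 ~ /^vmnic[0-9]+$/ { print $1 }'
}

# Hypothetical usage on an ESXi host: query the network hint for each vmnic.
# esxcfg-nics -l | list_vmnics | while read -r nic; do
#     echo "=== $nic ==="
#     vim-cmd hostsvc/net/query_networkhint --pnic-name="$nic"
# done
```

Any header line or other text in the listing is skipped because it does not match the vmnicN pattern.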

Providing MAC addresses to the network team

If CDP/LLDP is not returning neighbor information, the network team may need MAC addresses to locate the ESXi host ports on the physical switch using their ARP or MAC address tables.

VM MAC addresses can be found in vCenter by right-clicking the VM, selecting Edit Settings, and expanding the network adapter — the MAC address is listed there. From the ESXi CLI, run the following to list each DVPort with the connected client name. For more information on viewing vSwitch configuration via CLI, see Configuring Standard vSwitch (vSS) or virtual Distributed Switch (vDS) from the command line in ESXi (326175):

esxcfg-vswitch -l

To see the MAC address for a specific VM network adapter, first find the VM ID with vim-cmd vmsvc/getallvms, then run:

vim-cmd vmsvc/get.guest <vmid> | grep -A2 macAddress

Physical adapter (vmnic) MAC addresses can be found by running the following. For more information on listing physical NICs and their properties, see Identifying and listing Network Interface Cards (NICs) on VMware ESXi (412312):

esxcfg-nics -l

This lists every vmnic with its MAC address, link speed, and driver. The network team can use these MAC addresses to search their switch ARP or MAC address tables and identify which physical switch ports the ESXi uplinks are connected to.
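Network teams often search MAC address tables in the switch vendor's own notation. As an illustrative sketch, the helper below reads the esxcfg-nics -l listing and prints each vmnic with its MAC address in both colon-separated form and the dotted form (aabb.ccdd.eeff) used by Cisco IOS and NX-OS. The function name vmnic_macs is hypothetical, and the dotted format is an assumption about the switch platform; other vendors may expect a different notation.

```shell
# Read `esxcfg-nics -l` output on stdin and print "<vmnic> <mac> <dotted-mac>".
# Assumption: the switch uses Cisco-style dotted MAC notation (aabb.ccdd.eeff).
vmnic_macs() {
    awk '
        $1 ~ /^vmnic[0-9]+$/ {
            # Scan the remaining fields for the first one shaped like a MAC.
            for (i = 2; i <= NF; i++) {
                if (length($i) == 17 &&
                    $i ~ /^[0-9A-Fa-f][0-9A-Fa-f](:[0-9A-Fa-f][0-9A-Fa-f])+$/) {
                    mac = tolower($i)
                    hex = mac
                    gsub(/:/, "", hex)          # strip the colons
                    printf "%s %s %s.%s.%s\n", $1, mac,
                        substr(hex, 1, 4), substr(hex, 5, 4), substr(hex, 9, 4)
                    break
                }
            }
        }
    '
}

# Hypothetical usage on an ESXi host:
# esxcfg-nics -l | vmnic_macs
```

For example, a vmnic with MAC address 00:50:56:ab:cd:ef is printed with the dotted form 0050.56ab.cdef.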

Review the physical switch configuration

Coordinate with the network team to review the physical switch configuration for each port connected to the ESXi uplinks. Identify the following:

  • Whether the ports are in an LACP port channel
  • Load balancing algorithm in use on the port channel
  • LACP mode (active or passive)
  • LACP timeout (fast or slow)
  • Trunk allowed VLAN list
  • Native VLAN setting
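Once the network team supplies the trunk allowed VLAN list for a port, it can be checked against the VLAN IDs used by the port groups. The sketch below assumes the list is given as comma-separated IDs and ranges (for example 10,20,100-110), which is a common switch notation but not something this article specifies. The function name vlan_in_list is hypothetical.

```shell
# Return success (0) if VLAN $1 is covered by the allowed list in $2.
# Assumption: $2 is comma-separated IDs and ranges, e.g. "10,20,100-110".
vlan_in_list() {
    vlan=$1
    old_ifs=$IFS
    IFS=,
    for entry in $2; do
        case $entry in
            *-*) lo=${entry%-*}; hi=${entry#*-} ;;   # range such as 100-110
            *)   lo=$entry; hi=$entry ;;             # single VLAN ID
        esac
        if [ "$vlan" -ge "$lo" ] && [ "$vlan" -le "$hi" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example: vlan_in_list 105 "10,20,100-110" && echo "VLAN 105 is allowed"
```

A VLAN that is missing from the allowed list on one host's trunk but present on another's would explain why the same port group works on only some hosts.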

If LACP port channels are configured on the physical switch, a corresponding LAG must be created on the vDS. For LACP requirements and limitations in ESXi, see Host requirements for link aggregation (etherchannel, port channel, or LACP) in ESXi (324555).

Create the LAG on the vDS

Follow the procedure outlined in Configuring a LAG on a vSphere Distributed Switch Port Group when using LACP (312554). For definitive reference on LAG configuration, see LACP Support on a vSphere Distributed Switch (TechDocs).

  1. In the vSphere Client, navigate to the Distributed Switch and select Configure > Settings > LACP > New.
  2. Configure the LAG:
    • Name the LAG.
    • Set the number of ports to match the number of uplinks that will be in the LAG.
    • Set Mode to Active (at least one side of the LACP negotiation must be Active).
    • Set Load balancing mode to match the physical switch configuration.
    • Set Timeout to match the physical switch (Slow sends LACPDUs every 30 seconds, Fast sends them every 1 second). For more information on timeout matching, see LACP Timeout mode parameter mismatch between the ESXi host and physical switches (376191).

Assign Uplinks to the LAG

Migrate uplinks one at a time to avoid connectivity loss:

  1. Right-click the vDS and select Add and Manage Hosts > Manage host networking.
  2. Select the ESXi host.
  3. On the Manage physical adapters page, select the first vmnic and click Assign Adapter to assign it to one of the LAG uplink slots.
  4. Coordinate with the network team to confirm LACP negotiation completes on the physical switch before assigning the next vmnic.
  5. Repeat for the remaining vmnic(s).

The same LAG definition on the vDS serves all hosts — each host assigns its own local vmnics to the LAG uplink slots. Each host will have its own port channel on the physical switch, but all map to the same LAG on the vDS.

Verify LACP Negotiation

Run the following on the ESXi host:

esxcli network vswitch dvs vmware lacp status get

Confirm the following for each NIC in the LAG:

  • State: Bundled — LACP negotiation succeeded.
  • Flags: SA — Slow timeout, Active mode (or FA for Fast timeout, Active mode).
  • Port State: ACT,AGG,SYN,COL,DIST — All LACP state flags are set, indicating the ports are aggregating, synchronized, collecting, and distributing traffic.
  • Partner Information is present with a valid Device ID — the physical switch is responding.
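On hosts with several uplinks, the State check above can be summarized with a small filter. This is a sketch under the assumption that the status output contains one "State:" line per NIC, as the list above implies; the exact output layout may differ between ESXi versions. The function name lacp_all_bundled is hypothetical.

```shell
# Read `lacp status get` output on stdin, count per-NIC "State:" lines,
# and succeed only if at least one exists and all of them are "Bundled".
# Lines such as "Port State:" are ignored because they do not start with "State:".
lacp_all_bundled() {
    awk '
        /^[[:space:]]*State:/ {
            total++
            if ($2 == "Bundled") bundled++
        }
        END {
            printf "%d of %d NICs bundled\n", bundled, total
            exit !(total > 0 && bundled == total)
        }
    '
}

# Hypothetical usage on an ESXi host:
# esxcli network vswitch dvs vmware lacp status get | lacp_all_bundled
```

A non-zero exit status means at least one NIC is not bundled, or no State lines were found, and the full output should be reviewed by hand.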

To verify the LAG configuration settings, run:

esxcli network vswitch dvs vmware lacp config get

For more information on interpreting these outputs, see LACP Timeout mode parameter mismatch between the ESXi host and physical switches (376191).

Also coordinate with the network team to verify the port channel status from the physical switch side, confirming both ports are bundled and in use.

Update Port Group Teaming and Failover

  1. Edit each VM port group on the vDS.
  2. Under Teaming and failover, move the LAG to Active uplinks and move all individual uplinks to Unused uplinks.
  3. Set Load balancing to match the physical switch hashing algorithm. Do not use "Route based on originating virtual port" when LACP is in use.

Test VM Connectivity per VLAN

Test each VLAN by pinging the gateway from a VM on that VLAN. If all VLANs are working, the resolution is complete. If a specific VLAN fails while others succeed, continue to the next section.

Troubleshooting Individual VLAN Failures

Identify native VLAN mismatches across physical switches

When ESXi hosts connect to different physical switches, verify whether each switch handles the failing VLAN the same way. Compare the physical switch configurations identified earlier. Check whether the failing VLAN is configured as the native VLAN on one switch but trunked as tagged on another. If so, the vDS port group with VLAN type "VLAN" and the corresponding VLAN ID only accepts tagged frames and does not match the untagged native VLAN traffic.

If CDP/LLDP returned null for certain uplinks but LACP is bundled, use the LACP partner Device ID from the esxcli network vswitch dvs vmware lacp status get output to identify which physical switch the host is connected to.
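The comparison across switches can be written down as a simple table and checked mechanically. The sketch below assumes a hand-collected list with one line per physical switch in the form "<switch-name> <native-vlan-id>", and that the failing VLAN is in the allowed list of every trunk. Both the input format and the function name native_vlan_report are hypothetical.

```shell
# stdin: one line per switch, "<switch-name> <native-vlan-id>" (hand-collected).
# $1:    the failing VLAN ID.
# Assumption: the VLAN is allowed on every trunk, so it is tagged wherever
# it is not the native VLAN.
native_vlan_report() {
    awk -v vlan="$1" '
        NF >= 2 {
            if ($2 == vlan) { untagged++; print $1 ": untagged (native)" }
            else            { tagged++;   print $1 ": tagged" }
        }
        END {
            if (untagged > 0 && tagged > 0) { print "MISMATCH"; exit 1 }
            print "consistent"
        }
    '
}

# Example with made-up switch names:
# printf 'sw-a 1\nsw-b 20\n' | native_vlan_report 20
```

A MISMATCH result means some switches send the VLAN tagged and others send it untagged, which is the condition a single VLAN-tagged port group cannot serve.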

Capture traffic to confirm the tagging mismatch

Run a packet capture filtered for the failing VLAN on the uplink. For full details on pktcap-uw syntax and capture points, see Packet capture on ESXi using the pktcap-uw tool (341568):

pktcap-uw --uplink <vmnic> --vlan <VLAN_ID> -c 20 -o - | tcpdump-uw -enr -

If no traffic appears in the filtered capture, capture at the VM switchport to confirm the VM is sending traffic:

pktcap-uw --switchport <DVPort_ID> --capture VnicTx,VnicRx -o - | tcpdump-uw -r - -enn

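If traffic appears in the unfiltered switchport capture but not in the VLAN-filtered uplink capture, the VM is transmitting but no frames tagged with that VLAN ID are crossing the uplink. When the physical switch configuration shows this VLAN as the native VLAN on that trunk, this indicates the traffic is arriving from the physical switch untagged, which is the native VLAN mismatch.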
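Untagged arrival can also be checked more directly by capturing on the uplink without the VLAN filter and counting tagged versus untagged frames. The following sketch assumes the usual tcpdump link-level line format, in which tagged frames contain "802.1Q" and "vlan <ID>," while untagged frames contain neither. The function name count_vlan_tags is hypothetical.

```shell
# stdin: text output of tcpdump-uw with link-level headers (-e).
# $1:    VLAN ID of interest.
# Frames tagged with a different VLAN ID are ignored.
count_vlan_tags() {
    awk -v vlan="$1" '
        /ethertype/ {
            if (index($0, "vlan " vlan ",") > 0) tagged++
            else if ($0 !~ /802\.1Q/)            untagged++
        }
        END { printf "tagged=%d untagged=%d\n", tagged, untagged }
    '
}

# Hypothetical usage on an ESXi host (same capture flags as above, no --vlan):
# pktcap-uw --uplink <vmnic> -c 100 -o - | tcpdump-uw -enr - | count_vlan_tags <VLAN_ID>
```

A result with tagged=0 and a non-zero untagged count is consistent with the switch delivering traffic on its native VLAN, although the untagged frames still need to be inspected to confirm they belong to the failing subnet.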

Create a separate port group for the native VLAN

If the network team is unable to change the physical switch to trunk the VLAN as tagged, create a separate port group to accept untagged traffic:

  1. On the vDS, create a new Distributed Port Group.
  2. Set the VLAN type to None — this accepts untagged frames.
  3. Under Teaming and failover, set the LAG as Active uplink and individual uplinks as Unused.
  4. Move the affected VMs on the host with the native VLAN configuration to this new port group.
  5. Leave the original VLAN-tagged port group in place for hosts connected to switches that trunk the VLAN as tagged.

Note: This configuration creates an inconsistency between hosts where the same VLAN is handled by different port groups depending on the physical switch configuration. Document this for future reference. The recommended fix is to have the network team tag the VLAN on the physical switch trunk to maintain consistency across all hosts.

Verify Final Connectivity

Test the affected VMs by pinging the gateway. Run a packet capture at both the switchport and uplink to confirm traffic is flowing in both directions:

pktcap-uw --switchport <DVPort_ID> --capture VnicTx,VnicRx -o - | tcpdump-uw -r - -enn
pktcap-uw --uplink <vmnic> --capture UplinkSndKernel,UplinkRcvKernel -o - | tcpdump-uw -r - -enn

If the issue persists after following these steps, contact Broadcom Support for further assistance.

Provide the following information when opening a support request:

  • Output of esxcli network vswitch dvs vmware lacp status get and esxcli network vswitch dvs vmware lacp config get from each affected ESXi host
  • Output of esxcfg-vswitch -l from each affected ESXi host
  • Output of esxcfg-nics -l from each affected ESXi host
  • Output of vim-cmd hostsvc/net/query_networkhint for each vmnic
  • Physical switch port channel configuration including trunk allowed VLANs and native VLAN settings
  • Packet captures from both the switchport and uplink capture points for the failing VLAN
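The command outputs listed above can be gathered into one directory per host before opening the request. This is a minimal sketch, not an official collection tool. The helper name run_and_save and the output directory are hypothetical; the commands in the usage comments are the ones already used in this article.

```shell
# Run a command and save its stdout and stderr to <outdir>/<label>.txt.
run_and_save() {
    outdir=$1
    label=$2
    shift 2
    mkdir -p "$outdir" || return 1
    "$@" > "$outdir/$label.txt" 2>&1
}

# Hypothetical usage on an ESXi host (the directory name is only an example):
# out=/tmp/vds-support
# run_and_save "$out" lacp-status esxcli network vswitch dvs vmware lacp status get
# run_and_save "$out" lacp-config esxcli network vswitch dvs vmware lacp config get
# run_and_save "$out" vswitch     esxcfg-vswitch -l
# run_and_save "$out" nics        esxcfg-nics -l
# for nic in $(esxcfg-nics -l | awk '$1 ~ /^vmnic[0-9]+$/ { print $1 }'); do
#     run_and_save "$out" "hint-$nic" vim-cmd hostsvc/net/query_networkhint --pnic-name="$nic"
# done
```

The physical switch configuration and the packet captures still need to be collected separately and attached alongside these files.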