Recover Inaccessible vCenter: vSAN Cluster Partition after vSphere Distributed Switch Changes

Article ID: 439748

Products

VMware vSAN VMware vSphere ESXi VMware vCenter Server

Issue/Introduction

  • vCenter Server is inaccessible via the vSphere Client and fails to respond to pings.
  • Because the vCenter VM resides on the vSAN datastore, its status displays as Invalid, Unknown, or Disconnected in the ESXi Host Client.
  • The vSAN datastore appears empty or reports 0B capacity on the partitioned hosts.
  • Running esxcli vsan cluster get on each ESXi host reports a Sub-Cluster Member Count of 1, meaning every host sees only itself.
  • Since vCenter is down, vSAN Skyline Health cannot be viewed in the UI. However, running localcli vsan health cluster list -w directly on the ESXi host reports the following errors:
    • Overall health findings: red (Network misconfiguration)
    • Network: red
    • vSAN cluster partition: red
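
The member-count symptom above can be checked with a small script when several hosts need to be inspected. The following is a minimal sketch, not part of the original procedure. The function names vsan_member_count and vsan_host_is_isolated are hypothetical, and the parsing assumes the field is printed as "Sub-Cluster Member Count: <n>", as the symptom above describes.

```shell
# Hypothetical helper: read `esxcli vsan cluster get` output on stdin and
# print the value of the "Sub-Cluster Member Count" field.
vsan_member_count() {
    awk -F': *' '/Sub-Cluster Member Count/ {
        gsub(/[ \t\r]+$/, "", $2)   # drop trailing whitespace
        print $2
        exit
    }'
}

# Hypothetical helper: succeed (exit status 0) when the count read on stdin
# is 1. In a multi-host cluster this means the host sees only itself.
vsan_host_is_isolated() {
    [ "$(vsan_member_count)" = "1" ]
}

# Usage on an ESXi host (not executed here):
#   esxcli vsan cluster get | vsan_member_count
#   esxcli vsan cluster get | vsan_host_is_isolated && echo "host is partitioned"
```

The helpers only read text from stdin, so they can also be run against saved command output collected from each host.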

Environment

  • VMware vSAN 8.0.x, 9.x
  • VMware vSphere ESXi 8.0.x, 9.x
  • VMware vCenter Server 8.0.x, 9.x

Cause

A configuration change to the vSphere Distributed Switch (vDS) results in a network partition, isolating the ESXi hosts on the vSAN network. Common triggers include the removal or change of vDS uplinks or modifications to port group VLAN tags.

Because the vSAN cluster is fully partitioned, the vSAN datastore is inaccessible and vCenter goes offline. This creates a circular dependency: vCenter is required to manage and correct the vDS, but vCenter cannot be brought online until the vSAN network is repaired locally on every host to restore access to the vSAN datastore.

Resolution

Prerequisites Before Recovery:

  • There should be at least two vmnics assigned to the vSAN network. One vmnic should be removed from the vDS vSAN port group and used for the temporary Standard Switch (vSS) that will be created to restore vSAN network connectivity, bring the vSAN datastore back online, and restore production services.
  • Identify the correct VLAN tag for the vSAN VMkernel adapter and the uplink vmnic before removing the vmnic from the vDS.



Step 1: Establish Temporary vSAN Networking on all ESXi hosts

  1. Connect to the partitioned ESXi hosts via SSH.
  2. Identify the vSAN vDS Port IDs for the physical NICs using the following command:
    esxcli network vswitch dvs vmware list

    Sample Output:
    Name: vDSName
    VDS ID: ########
    Class: vswitch
    Num Ports: ####
    Used Ports: ##
    Configured Ports: ##
    MTU: 9000/1500
    CDP Status: listen
    Beacon Timeout: -/+#
    Uplinks: vmnic1, vmnic2
    VMware Branded: true
    DVPort:
           Client: vmnic1
           DVPortgroup ID: dvportgroup-###
           In Use: true
           Port ID: 12
  3. Remove a physical uplink (vmnic) from the vDS to free it up for the Standard Switch:
    esxcfg-vswitch -Q <vmnicX> -V <Port_ID> <vDS_Name>

    Sample Command:
    esxcfg-vswitch -Q vmnic1 -V 12 ProdSwitchvDS
  4. Create a temporary Standard Switch (vSS) and a port group, then add the vmnic:

    1. Create a Standard Switch:
      esxcli network vswitch standard add --vswitch-name=<vSwitchName>

    2. Create a port group:
      esxcli network vswitch standard portgroup add --portgroup-name=<PortgroupName> --vswitch-name=<vSwitchName>

    3. Add the vmnic to the Standard Switch:
      esxcli network vswitch standard uplink add --uplink-name=<vmnic#> --vswitch-name=<vSwitchName>

    4. Set the VLAN ID on the port group. If the physical switch ports behind the vmnics are VLAN trunks, apply the vSAN VLAN ID to the standard port group. If they are configured as access ports, set the VLAN ID to 0:
      esxcli network vswitch standard portgroup set --portgroup-name=<PortgroupName> --vlan-id=<VLAN>
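
Looking up the Port ID for a given vmnic by eye is error-prone when the vDS has many ports. The following is a minimal sketch of how that lookup could be scripted. The function name dvs_port_id_for is hypothetical, and the parsing assumes the "Client:" and "Port ID:" layout shown in the sample output above.

```shell
# Hypothetical helper: read `esxcli network vswitch dvs vmware list` output on
# stdin and print the "Port ID" listed under the DVPort entry whose "Client:"
# is the given vmnic.
dvs_port_id_for() {
    awk -v nic="$1" '
        # A "Client:" line starts a DVPort entry; remember whether it matches.
        $1 == "Client:" { found = ($2 == nic) }
        # The first "Port ID:" line after a matching client is the answer.
        found && $1 == "Port" && $2 == "ID:" { print $3; exit }
    '
}

# Usage on an ESXi host (not executed here; vmnic1 and ProdSwitchvDS are the
# example names from the sample command above):
#   port_id=$(esxcli network vswitch dvs vmware list | dvs_port_id_for vmnic1)
#   echo "esxcfg-vswitch -Q vmnic1 -V $port_id ProdSwitchvDS"   # review before running
```

The usage lines print the removal command instead of running it, so the Port ID can be reviewed before the uplink is detached.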

Step 2: Restore vSAN Datastore Accessibility

  1. Capture IP Information: Note the current IP address and subnet mask of the original vDS vSAN VMkernel adapter (e.g., vmk2):
    esxcli network ip interface ipv4 get --interface-name=<old_vmk#>
  2. Clear Old vDS Tags: Remove the IP address and the vSAN tag from the original vDS VMkernel adapter to prevent conflicts:
    esxcli network ip interface ipv4 set --interface-name=<old_vmk#> --type=none
    esxcli network ip interface tag remove --interface-name=<old_vmk#> --tagname=VSAN
  3. Create the new vSAN VMkernel adapter on the vSS port group:
    esxcli network ip interface add --interface-name=<new_vmk#> --portgroup-name=<PortGroup_Name>
  4. Configure the static IPv4 parameters:
    esxcli network ip interface ipv4 set --interface-name=<new_vmk#> --ipv4=<IP_ADDRESS> --netmask=<SUBNET_MASK> --type=static
  5. Tag the new interface for vSAN traffic:
    esxcli network ip interface tag add --interface-name=<new_vmk#> --tagname=VSAN
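
Because the old adapter's IP address is cleared in this step, the address and netmask should be recorded first. The following sketch shows one way to capture them and to print the Step 2 commands in order for review. The function names vmk_ipv4 and step2_plan are hypothetical. The parser assumes that esxcli network ip interface ipv4 get prints a table whose first three columns are the adapter name, the IPv4 address, and the netmask, which should be verified against the host's actual output.

```shell
# Hypothetical helper: read `esxcli network ip interface ipv4 get` output on
# stdin and print "<ip> <netmask>" for the named VMkernel adapter.
# Assumes columns 1-3 are name, IPv4 address, netmask.
vmk_ipv4() {
    awk -v vmk="$1" '$1 == vmk { print $2, $3; exit }'
}

# Hypothetical helper: print the Step 2 commands in order WITHOUT running them.
# Arguments: old vmk, new vmk, vSS port group, IP address, netmask.
# Assumes the port group name contains no spaces.
step2_plan() {
    old_vmk=$1; new_vmk=$2; pg=$3; ip=$4; mask=$5
    printf '%s\n' \
        "esxcli network ip interface ipv4 set --interface-name=$old_vmk --type=none" \
        "esxcli network ip interface tag remove --interface-name=$old_vmk --tagname=VSAN" \
        "esxcli network ip interface add --interface-name=$new_vmk --portgroup-name=$pg" \
        "esxcli network ip interface ipv4 set --interface-name=$new_vmk --ipv4=$ip --netmask=$mask --type=static" \
        "esxcli network ip interface tag add --interface-name=$new_vmk --tagname=VSAN"
}

# Usage on an ESXi host (not executed here; vmk2, vmk5 and vSAN-temp are
# placeholder names):
#   set -- $(esxcli network ip interface ipv4 get --interface-name=vmk2 | vmk_ipv4 vmk2)
#   step2_plan vmk2 vmk5 vSAN-temp "$1" "$2"
```

Once the commands have been applied on every host, connectivity can typically be confirmed with vmkping -I <new_vmk#> <peer_vSAN_IP> and by checking that esxcli vsan cluster get reports the expected member count.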

Step 3: Recover the vCenter VM

  1. If the vCenter VM is reported as invalid in the ESXi Host Client, re-register it.
  2. Power on the vCenter VM:
    vim-cmd vmsvc/power.on <vCenter_VM_ID>
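
The power-on command needs the VM ID of the vCenter VM, which vim-cmd vmsvc/getallvms lists. The following is a minimal sketch of that lookup. The function name vm_id_by_name is hypothetical, and it assumes that Vmid and Name are the first two columns of the listing and that the VM name contains no spaces.

```shell
# Hypothetical helper: read `vim-cmd vmsvc/getallvms` output on stdin and print
# the Vmid of the VM whose Name column equals the given name.
# Rows that do not start with a numeric Vmid (header, warnings) are skipped.
vm_id_by_name() {
    awk -v name="$1" '$1 ~ /^[0-9]+$/ && $2 == name { print $1; exit }'
}

# Usage on an ESXi host (not executed here; "vcsa01" is a placeholder name):
#   vmid=$(vim-cmd vmsvc/getallvms | vm_id_by_name vcsa01)
#   vim-cmd vmsvc/power.getstate "$vmid"
#   vim-cmd vmsvc/power.on "$vmid"
```

If the lookup prints nothing, the VM is probably not registered on that host, or its name differs from the one supplied.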

Step 4: Final Migration

  1. Once vCenter is online, log into the vSphere Client and correct the vDS settings.
  2. Migrate vmnics and VMkernel adapters back to the vDS and delete the temporary vSS.

If any of these steps fail or require guided intervention, contact Broadcom Support for further assistance.
