Network partition observed in vSAN cluster

Article ID: 393677

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • Multiple virtual machines in a vSAN-enabled cluster appear in an "Inaccessible" or "Reduced availability" state in vCenter Server.
  • Checking object health from the CLI with the following command reports inaccessible objects:
    esxcli vsan debug object health summary get
    Health Status                                              Number Of Objects
    ---------------------------------------------------------  -----------------
    remoteAccessible                                                           0
    inaccessible                                                             260
  • Running esxcli vsan cluster get from an SSH session on the affected host shows a Sub-Cluster Member Count of 1, which indicates that the host is network-partitioned from the rest of the cluster. The host has also elected itself MASTER of its own one-node sub-cluster.

     [root@ESX2:~] esxcli vsan cluster get
     Cluster Information
     Enabled: true
     Current Local Time: 2025-02-07T00:16:32Z
     Local Node UUID: ########-####-####-####-########
     Local Node Type: NORMAL
     Local Node State: MASTER
     Local Node Health State: HEALTHY
     Sub-Cluster Master UUID: ########-####-####-####-########
     Sub-Cluster Backup UUID:
     Sub-Cluster UUID: ########-####-####-####-########
     Sub-Cluster Membership Entry Revision: 2
     Sub-Cluster Member Count: 1
     Sub-Cluster Member UUIDs: ########-####-####-####-########
     Sub-Cluster Member HostNames: Hostname###
     Sub-Cluster Membership UUID: ########-####-####-####-########
     Unicast Mode Enabled: true
     Maintenance Mode State: OFF
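When many hosts need to be triaged, the two checks above can be scripted. The following is a minimal POSIX sh sketch, not a VMware-provided tool: it parses the text output of esxcli vsan debug object health summary get and esxcli vsan cluster get from stdin. The function names and the expected-host-count argument are hypothetical, and the sketch assumes the field labels shown above are the same on your vSAN 7.x or 8.x build.

```shell
#!/bin/sh
# Hypothetical helpers for reading esxcli text output from stdin.

# Print the value of the "Sub-Cluster Member Count" field from
# `esxcli vsan cluster get` output.
member_count() {
    awk -F': *' '/Sub-Cluster Member Count:/ { print $2 }'
}

# Print the number of objects in the "inaccessible" row of
# `esxcli vsan debug object health summary get` output.
inaccessible_count() {
    awk '$1 == "inaccessible" { print $2 }'
}

# Succeed (exit status 0) when the member count is lower than the
# expected number of hosts passed as the first argument (hypothetical).
is_partitioned() {
    expected="$1"
    count=$(member_count)
    [ -n "$count" ] && [ "$count" -lt "$expected" ]
}
```

On a host it could be used as, for example, esxcli vsan cluster get | is_partitioned 4 && echo "possible partition", where 4 is the number of hosts you expect in the cluster (for a stretched or two-node cluster, include the witness host in that count).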

Environment

VMware vSAN 7.x

VMware vSAN 8.x

Cause

Communication from the partitioned host to the other vSAN hosts is failing.

  • A vmkping from the partitioned host's vSAN VMkernel interface to another node fails with "Host is down", which points to a network communication issue:

        vmkping -I vmk1 ##.#.##.###
        PING ##.#.##.### (##.#.##.###): 56 data bytes
        sendto() failed (Host is down)

  • The /var/run/log/vmkernel.log file on the partitioned host shows CMMDS failing to send to its unicast peers with "Host is down":
    YYYY-MM-DDTHH:MM.SSSZ In(182) vmkernel: cpu42:2119777)CMMDSNet: CMMDSNetSendtoUnicastChannels:1665: Throttled: ########-####-####-####-############: Failed to send to unicast host '##.###.###.###;12321' on iface '##.###.###.###': Host is down.
    YYYY-MM-DDTHH:MM.SSSZ In(182) vmkernel: cpu41:2119777)CMMDSNet: CMMDSNetSendtoUnicastChannels:1665: Throttled: ########-####-####-####-############: Failed to send to unicast host '##.###.###.###;12321' on iface '##.###.###.###': Host is down.
    YYYY-MM-DDTHH:MM.SSSZ In(182) vmkernel: cpu41:2119777)CMMDSNet: CMMDSNetSendtoUnicastChannels:1665: Throttled: ########-####-####-####-############: Failed to send to unicast host '##.###.###.###;12321' on iface '##.###.###.###': Host is down.
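To see which peers the partitioned host cannot reach, the distinct target addresses can be pulled out of those CMMDSNet messages. This is a minimal sketch, assuming the message format shown above and the default log location /var/run/log/vmkernel.log; the function name unreachable_peers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct unicast peer addresses that
# CMMDS failed to send to, based on lines such as
#   ... Failed to send to unicast host '<ip>;<port>' on iface ...
# Takes an optional log file path; defaults to the ESXi vmkernel log.
unreachable_peers() {
    log="${1:-/var/run/log/vmkernel.log}"
    grep 'CMMDSNet' "$log" \
        | grep 'Failed to send to unicast host' \
        | sed -n "s/.*unicast host '\([^;']*\);[0-9]*'.*/\1/p" \
        | sort -u
}
```

Each address it prints can then be tested with vmkping -I vmk1 <address>, substituting the host's actual vSAN VMkernel interface for vmk1.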

Resolution

  • Engage the networking team to resolve the underlying network issue.
  • Alternatively, if a secondary NIC on the vSAN network is functional, set it as the active uplink for vSAN traffic until the issue with the primary NIC is resolved.

Additional Information