How to fix vSAN Cluster partition in case of ongoing network issues on vSAN vmkernel uplink


Article ID: 388590


Products

VMware vSAN

Issue/Introduction

Symptoms:

  • A vSAN cluster network partition alarm is raised in Skyline Health.
  • All vSAN nodes are in a network-partitioned state.
  • vmkping between vSAN data nodes over the vSAN vmkernel interface fails, which indicates an ongoing vSAN network communication issue.

    Syntax : vmkping -I vmkX x.x.x.x

    Example : 

    [root@test:/tmp] vmkping -I vmk3 10.x.x.23
    PING 10.x.x.23 (10.x.x.23): 56 data bytes
    --- 10.x.x.23 ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss
    [root@test:/tmp] 
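  • Verify vSAN connectivity to the other vSAN nodes by running the vmkping tests below. The -d option sets the don't-fragment bit and -s sets the ICMP payload size, so the test fails if the path cannot carry the full MTU.
    To test 1500 MTU, run the command: vmkping -I vmkX x.x.x.x -d -s 1472
    To test 9000 MTU, run the command: vmkping -I vmkX x.x.x.x -d -s 8972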
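
The payload sizes 1472 and 8972 are the MTU minus 28 bytes of headers (20 bytes for the IPv4 header without options plus 8 bytes for the ICMP header). The following is a minimal sketch of that arithmetic for other MTU values. The function names vsan_ping_payload and vsan_mtu_test_cmd are hypothetical helpers, not part of ESXi, and the second function only prints the command rather than running it.

```shell
# Hypothetical helper: ICMP payload size for a don't-fragment vmkping test.
# Assumes IPv4 without IP options: 20-byte IP header + 8-byte ICMP header.
vsan_ping_payload() {
    mtu="$1"
    echo $((mtu - 28))
}

# Hypothetical helper: print (do not run) the vmkping test command.
# $1 = vmkernel interface, $2 = peer IP, $3 = MTU to test.
vsan_mtu_test_cmd() {
    echo "vmkping -I $1 $2 -d -s $(vsan_ping_payload "$3")"
}

# Reproduces the two commands listed above.
vsan_mtu_test_cmd vmkX x.x.x.x 1500
vsan_mtu_test_cmd vmkX x.x.x.x 9000
```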

Environment

VMware vSAN 8.0.x
VMware vSAN 7.0.x

Cause

A transient network condition, a firewall issue, or any other ongoing network problem on the vSAN vmkernel uplink can prevent the vSAN nodes from communicating with each other, which results in a vSAN network partition.

Resolution

As a workaround, check the vMotion network or management network connectivity between the vSAN data nodes and enable vSAN traffic on one of the working networks.

Example:

In this example the vSAN nodes can reach each other over the management network, which is validated with the vmkping command.

vmkping -I vmkX x.x.x.x
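
To repeat this check against every peer before moving vSAN traffic, the command can be wrapped in a small loop. This is a sketch only: vsan_check_peers is a hypothetical helper name, and it assumes that vmkping returns a non-zero exit status when no replies are received, as ping does. Verify that assumption on your ESXi build before relying on the result.

```shell
# Hypothetical helper: ping every peer through one vmkernel interface.
# $1 = vmkernel interface (for example vmk2); remaining arguments = peer IPs.
# Assumption: vmkping exits non-zero when the peer does not reply.
# Returns non-zero if any peer is unreachable.
vsan_check_peers() {
    vmk="$1"
    shift
    failed=0
    for peer in "$@"; do
        # -c 3 sends three echo requests per peer.
        if vmkping -I "$vmk" -c 3 "$peer" >/dev/null 2>&1; then
            echo "OK   $peer via $vmk"
        else
            echo "FAIL $peer via $vmk"
            failed=1
        fi
    done
    return "$failed"
}

# Example call (replace with the real peer addresses):
# vsan_check_peers vmk2 10.x.x.22 10.x.x.23
```

Only move vSAN traffic to the interface if every peer is reported as OK from every node.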

Now we can enable the vSAN traffic on the management network via the vSphere web client or the command line. 

Enabling vSAN Traffic via vSphere UI :

  •  Open the vSphere Web Client.
  •  Go to the Hosts and Clusters view.
  •  Select the host you want to configure for vSAN.
  •  Click the Configure tab.
  •  Click Networking.
  •  Select VMkernel Adapters.
  •  Edit vmk2 and select the vSAN check box to enable the vSAN service on vmk2.

Enabling vSAN traffic on the management network via the command line:

In this example, vSAN traffic was enabled on the management network (vmk2) on all the data nodes and the unicast table was updated on each of them, which resolved the vSAN cluster partition.

[root@test1:~] localcli vsan network ipv4 add -i vmk2
[root@test1:~]

[root@test2:~] localcli vsan network ipv4 add -i vmk2
[root@test2:~]

[root@test3:~] localcli vsan network ipv4 add -i vmk2
[root@test3:~]

Then manually add a unicast agent entry for each of the other vSAN data nodes on every vSAN node.

esxcli vsan cluster unicastagent add -t node -u <Host_UUID> -U true -a <Host_VSAN_IP> -p 12321

Note: A vSAN node must not have its own IP address in its unicast agent list.

Example: the host test must not have its own vmk2 (management network) IP address in its unicast list, but it should have the vmk2 management IP addresses of the other vSAN nodes in that list.

localcli vsan cluster unicastagent add -t node -u xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -U true -a 10.x.x.22 -p 12321;
localcli vsan cluster unicastagent add -t node -u xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -U true -a 10.x.x.23 -p 12321;
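
Because each node needs one entry per peer but none for itself, it can help to generate the commands from a single list of nodes. The sketch below is one way to do that. vsan_unicast_cmds is a hypothetical helper name, the UUIDs and addresses in the example call are placeholders, and the function only prints the commands so that they can be reviewed before being run. The local node UUID is reported by esxcli vsan cluster get.

```shell
# Hypothetical helper: print (do not run) the unicastagent add commands
# for one node, skipping that node's own entry.
# $1    = UUID of the local node
# stdin = one "UUID IP" pair per line, covering every node in the cluster
vsan_unicast_cmds() {
    local_uuid="$1"
    while read -r uuid ip; do
        # Skip blank lines.
        [ -z "$uuid" ] && continue
        # A node must not list its own address.
        [ "$uuid" = "$local_uuid" ] && continue
        echo "localcli vsan cluster unicastagent add -t node -u $uuid -U true -a $ip -p 12321"
    done
}

# Example call with placeholder values, run for the node whose UUID is uuid-1:
printf '%s\n' 'uuid-1 10.x.x.21' 'uuid-2 10.x.x.22' 'uuid-3 10.x.x.23' |
    vsan_unicast_cmds uuid-1
```

For the node with UUID uuid-1 this prints two commands, one for 10.x.x.22 and one for 10.x.x.23, matching the pattern of the two commands above.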

Note: Once the underlying network issue is fixed, revert the changes made in the steps above and move vSAN traffic back to the original vmkernel interface.

Before removing vSAN traffic from the management network on the vSAN nodes, ensure that connectivity over the original vSAN vmkernel network is working for all vSAN nodes.

Example:

Connect to node 1 over SSH (for example, with PuTTY) and run a ping test using the command below. Repeat the test from the other nodes.
 
vmkping -I vmkX x.x.x.x

where x.x.x.x is the hostname or IP address of the server that you want to ping and vmkX is the vmkernel interface to ping out of.

Once the network connectivity issue on the vSAN vmkernel interface is fixed, re-enable vSAN traffic on the original vSAN vmkernel interface.

The following steps need to be performed on all vSAN nodes.

Enabling vSAN services on the original vmkernel Using vSphere UI : 

  •  Open the vSphere Web Client.
  •  Go to the Hosts and Clusters view.
  •  Select the host you want to configure for vSAN.
  •  Click the Configure tab.
  •  Click Networking.
  •  Select VMkernel Adapters.
  •  Edit vmk3 and select the vSAN check box to enable the vSAN service on vmk3.

Removing the vSAN services on the management network Using vSphere UI : 

  •  Open the vSphere Web Client.
  •  Go to the Hosts and Clusters view.
  •  Select the host you want to configure for vSAN.
  •  Click the Configure tab.
  •  Click Networking.
  •  Select VMkernel Adapters.
  •  Edit vmk2 and clear the vSAN check box under enabled services for vmk2.

Removing the vSAN services on the management network using the command line:

Connect to each vSAN node over SSH (for example, with PuTTY) and run the command below to remove vSAN traffic from vmk2 (management network).

localcli vsan network remove -i vmk2