After LAG/LACP is enabled on a vSphere Distributed Switch (DVS), the vSAN cluster hosts fail to communicate with each other, which causes a vSAN cluster partition.
VMware vSAN 7.x
VMware vSAN 8.x
VMware vSAN 9.x
If the backend physical network is not fully prepared for LACP/LAG on the vSphere DVS, enabling the LAG causes communication issues on the individual network cards.
This breaks vSAN network communication, partitions the cluster, and makes the VMs inaccessible.
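One way to confirm the partition from a host is to compare the member count reported by `esxcli vsan cluster get` with the number of nodes the cluster should contain. The following is a minimal sketch, assuming the command output has been captured as text. The function names, the `expected_hosts` parameter, and the abbreviated sample output are illustrative and not part of the original article.

```python
import re


def sub_cluster_member_count(cluster_get_output: str) -> int:
    """Return the 'Sub-Cluster Member Count' value from captured
    `esxcli vsan cluster get` output."""
    match = re.search(
        r"^\s*Sub-Cluster Member Count:\s*(\d+)\s*$",
        cluster_get_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("Sub-Cluster Member Count not found in output")
    return int(match.group(1))


def is_partitioned(cluster_get_output: str, expected_hosts: int) -> bool:
    """Treat the cluster as partitioned when this host sees fewer
    members than expected. `expected_hosts` is supplied by the
    administrator (hypothetical parameter)."""
    return sub_cluster_member_count(cluster_get_output) < expected_hosts


# Abbreviated, illustrative output; real output contains more fields.
SAMPLE_OUTPUT = """\
Cluster Information
   Enabled: true
   Local Node State: MASTER
   Sub-Cluster Member Count: 1
"""

if __name__ == "__main__":
    print(is_partitioned(SAMPLE_OUTPUT, expected_hosts=4))
```

A host that reports a member count lower than the expected node count is isolated from at least part of the cluster. Running the same check on every host shows how the cluster has split.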
To resolve the issue, configure the backend physical switch ports to meet the requirements for LACP/LAG. Refer to the articles below for more information on preparing to enable LAG/LACP on the DVS.
Configuring a LAG on a vSphere Distributed Switch Port Group when using LACP
Example Configuration of LACP on VMware, Cisco, HP, Dell switches
As a workaround, if two network cards are used for the vSAN network, bringing down one NIC on every host restores communication and lets vSAN return to a healthy state on a single network card. Once the cluster is stable, fix the underlying configuration issues.
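The workaround can be prepared as a per-host command list before anything is run. The sketch below only builds the `esxcli network nic down -n <vmnic>` command strings and does not execute them. The host names, the choice of uplink, and the helper name `nic_down_commands` are assumptions for illustration; verify which vmnic carries vSAN traffic on each host before using the commands.

```python
import re


def nic_down_commands(nic_by_host: dict[str, str]) -> list[tuple[str, str]]:
    """Build (host, command) pairs that bring down one uplink per host.

    `nic_by_host` maps a host name to the vmnic to disable on that
    host. Both are supplied by the administrator (hypothetical values).
    """
    commands = []
    for host, nic in sorted(nic_by_host.items()):
        # Reject anything that does not look like an ESXi uplink name.
        if not re.fullmatch(r"vmnic\d+", nic):
            raise ValueError(f"unexpected NIC name for {host}: {nic!r}")
        commands.append((host, f"esxcli network nic down -n {nic}"))
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and uplink; replace with the real inventory.
    plan = nic_down_commands({"esx01": "vmnic1", "esx02": "vmnic1"})
    for host, command in plan:
        print(f"{host}: {command}")
```

After the physical switch configuration has been corrected, the same uplinks can be re-enabled with `esxcli network nic up -n <vmnic>`.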