Enabling vSAN File Service fails with "operation failed", and "vsanfs.mgmt.log" shows:
2021-10-31T13:43:08.378Z error [EndpointController-1] [_CreateContainers] CONT: Failed to create container 10.2.33.4
Traceback (most recent call last):
  File "/usr/lib/vmware/vsan/perfsvc/VDFSEndpointController.py", line 1683, in _CreateContainers
  File "/usr/lib/vmware/vsan/perfsvc/VDFSEndpointProtoUtil.py", line 575, in WaitForProtoService
  File "/usr/lib/vmware/vsan/perfsvc/VDFSEndpointProtoUtil.py", line 684, in _WaitForContainerStartup
Exception: Timed out when waiting for container (vsanfs-04) startup
The container journal (/scratch/log/vdfs_support/containers/fsvm_logs/journal) shows that the container tried to join the AD domain but could not reach the AD domain controller:
Nov 01 08:22:21 localhost vsfs-vsanfs-04[1429]: Directory setup
Nov 01 08:22:22 localhost vsfs-vsanfs-04[1429]: Update configuration
Nov 01 08:22:22 localhost vsfs-vsanfs-04[1429]: Update krb5.conf
Nov 01 08:22:22 localhost vsfs-vsanfs-04[1429]: Doing time sync with the provided NTP server or AD: domain.###
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Error resolving united-imaging.com: Name or service not known (-2)
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: 1 Nov 08:22:42 ntpdate[42]: Can't find host domain.xxx: Name or service not known (-2)
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: 1 Nov 08:22:42 ntpdate[42]: no servers can be used, exiting
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: System has not been booted with systemd as init system (PID 1). Can't operate.
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Failed to connect to bus: Host is down
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Could not sync to provided ntp server united-imaging.com, will still try domain join
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Starting Winbind services:
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Waiting for winbindd to join domain...
Nov 01 08:22:46 localhost vsfs-vsanfs-04[1429]: ERROR: BUILTIN : active connection
Nov 01 08:22:46 localhost vsfs-vsanfs-04[1429]: VSANFS-04 : active connection
Nov 01 08:22:46 localhost vsfs-vsanfs-04[1429]: xxxx.xxxx : no active connection
Nov 01 08:22:51 localhost vsfs-vsanfs-04[1429]: ERROR: BUILTIN : active connection
Nov 01 08:22:51 localhost vsfs-vsanfs-04[1429]: VSANFS-04 : active connection
Nov 01 08:22:51 localhost vsfs-vsanfs-04[1429]: xxxx.xxxx : no active connection
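The two signatures in the journal excerpt above (failed name resolution and no active connection to the domain) can be counted with a small filter. This is a minimal sketch, not a VMware tool: journal_symptoms is a hypothetical helper name, and it assumes the journal text is supplied as plain text on stdin.

```shell
#!/bin/sh
# Hypothetical helper: count the two symptom signatures in FSVM journal text.
# Reads journal lines on stdin and prints one counter per signature.
journal_symptoms() {
    awk '
        /Name or service not known/ { dns++ }   # DNS/NTP host lookup failed
        /no active connection/      { ad++ }    # winbind has no link to the domain
        END {
            printf "dns_failures=%d\n", dns
            printf "ad_no_connection=%d\n", ad
        }
    '
}
```

Non-zero values for both counters match the pattern in this article, where the container cannot reach anything outside the host. The journal text is assumed to come from /scratch/log/vdfs_support/containers/fsvm_logs/journal.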
vSAN File Service containers use the 'MACVLAN' network driver to expose their MAC addresses to the external network.
For this to work, the vDS port group used for vSAN File Service must have "MAC-Learning=TRUE".
The vSAN health plugin sets this automatically when you enable vSAN File Service, but in rare situations the setting remains "FALSE" and the File Service container then fails to start.
To check the current setting on an ESXi host, find the DVPort ID of the File Service node and query MAC learning for that port:
esxcfg-vswitch -l | grep "DVPort ID\|File Service Node"
netdbg vswitch mac-learning port get --dvport DVPortID --dvs-alias DVSName
[root@esxi05:~] esxcfg-vswitch -l | grep "DVPort ID\|File Service Node"
DVPort ID In Use Client
40 1 vSAN File Service Node (4).eth0
[root@esxi05:~] esxcfg-vswitch -l
DVS Name Num Ports Used Ports Configured Ports MTU Uplinks
GargantuaDSwitch 1940 10 512 1500 vmnic3,vmnic1,vmnic2
DVPort ID In Use Client
16 1 vmnic1
17 1 vmnic2
12 1 vmk1
4 1 vmk2
38 1 vmnic3
40 1 vSAN File Service Node (4).eth0
Working state (MAC learning enabled):
[root@esxi05:~] netdbg vswitch mac-learning port get --dvport 40 --dvs-alias GargantuaDSwitch
MAC Learning: True
Unknown Unicast Flooding: True
MAC Limit: 64
MAC Limit Policy: DROP
Problem state (MAC learning disabled):
[root@esxi05:~] netdbg vswitch mac-learning port get --dvport 40 --dvs-alias GargantuaDSwitch
MAC Learning: False
Unknown Unicast Flooding: False
MAC Limit: 4096
MAC Limit Policy: ALLOWED
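The two commands above can be chained so that every File Service node port on a host is checked in one pass. The following is a sketch under assumptions: find_fs_dvports and mac_learning_enabled are hypothetical helper names, the parsing relies on the esxcfg-vswitch -l layout shown in the example output, and DVS names are assumed to contain no spaces. The final loop only runs when esxcfg-vswitch and netdbg are present on the system.

```shell
#!/bin/sh
# Hypothetical helper: read `esxcfg-vswitch -l` output on stdin and print
# "<DVS name> <DVPort ID>" for each port used by a vSAN File Service node.
find_fs_dvports() {
    awk '
        /^[ \t]*DVS Name/ { want_dvs = 1; next }          # header row; DVS name follows
        want_dvs && NF    { dvs = $1; want_dvs = 0; next } # first field is the DVS name
        /File Service Node/ && $1 ~ /^[0-9]+$/ { print dvs, $1 }
    '
}

# Hypothetical helper: read `netdbg vswitch mac-learning port get` output on
# stdin; exit status 0 only if it reports "MAC Learning: True".
mac_learning_enabled() {
    grep -q '^[[:space:]]*MAC Learning:[[:space:]]*True'
}

# Run the check only where the ESXi tools exist.
if command -v esxcfg-vswitch >/dev/null 2>&1 && command -v netdbg >/dev/null 2>&1; then
    esxcfg-vswitch -l | find_fs_dvports | while read -r dvs port; do
        if netdbg vswitch mac-learning port get --dvport "$port" --dvs-alias "$dvs" \
            | mac_learning_enabled; then
            echo "OK: MAC learning enabled on $dvs port $port"
        else
            echo "WARN: MAC learning not enabled on $dvs port $port"
        fi
    done
fi
```

A WARN line for a File Service node port corresponds to the "MAC Learning: False" output shown above.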
Ref: Networking Considerations for vSAN File Service
Note: If using an NSX-based network, ensure that MacLearning is enabled for the provided network entity from the NSX admin console, and that all hosts and File Service nodes are connected to the desired NSX-T network.