Failed to create container while enabling vSAN file services.

Article ID: 326631

Updated On: 11-28-2024

Products

VMware vSAN

Issue/Introduction

Symptoms:

Enabling vSAN File Service fails with "operation failed", and "vsanfs.mgmt.log" shows an error similar to the following:

2021-10-31T13:43:08.378Z error [EndpointController-1] [_CreateContainers] CONT: Failed to create container 10.2.33.4
Traceback (most recent call last):
  File "/usr/lib/vmware/vsan/perfsvc/VDFSEndpointController.py", line 1683, in _CreateContainers
  File "/usr/lib/vmware/vsan/perfsvc/VDFSEndpointProtoUtil.py", line 575, in WaitForProtoService
  File "/usr/lib/vmware/vsan/perfsvc/VDFSEndpointProtoUtil.py", line 684, in _WaitForContainerStartup
Exception: Timed out when waiting for container (vsanfs-04) startup

The FSVM journal at /scratch/log/vdfs_support/containers/fsvm_logs/journal shows that the container tried to join the AD domain but could not reach the AD domain controller:

Nov 01 08:22:21 localhost vsfs-vsanfs-04[1429]: Directory setup
Nov 01 08:22:22 localhost vsfs-vsanfs-04[1429]: Update configuration
Nov 01 08:22:22 localhost vsfs-vsanfs-04[1429]: Update krb5.conf
Nov 01 08:22:22 localhost vsfs-vsanfs-04[1429]: Doing time sync with the provided NTP server or AD: domain.###
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Error resolving united-imaging.com: Name or service not known (-2)
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: 1 Nov 08:22:42 ntpdate[42]: Can't find host domain.xxx: Name or service not known (-2)
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: 1 Nov 08:22:42 ntpdate[42]: no servers can be used, exiting
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: System has not been booted with systemd as init system (PID 1). Can't operate.
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Failed to connect to bus: Host is down
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Could not sync to provided ntp server united-imaging.com, will still try domain join
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Starting Winbind services:
Nov 01 08:22:42 localhost vsfs-vsanfs-04[1429]: Waiting for winbindd to join domain...
Nov 01 08:22:46 localhost vsfs-vsanfs-04[1429]: ERROR: BUILTIN : active connection
Nov 01 08:22:46 localhost vsfs-vsanfs-04[1429]: VSANFS-04 : active connection
Nov 01 08:22:46 localhost vsfs-vsanfs-04[1429]: xxxx.xxxx : no active connection
Nov 01 08:22:51 localhost vsfs-vsanfs-04[1429]: ERROR: BUILTIN : active connection
Nov 01 08:22:51 localhost vsfs-vsanfs-04[1429]: VSANFS-04 : active connection
Nov 01 08:22:51 localhost vsfs-vsanfs-04[1429]: xxxx.xxxx : no active connection
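
To pull the relevant lines out of a long journal, a filter along the following lines may help. This is a minimal sketch, not a VMware-provided tool: the function name scan_fsvm_journal is hypothetical, and the patterns are taken only from the sample log above, so other failure modes will not match.

```shell
# Hypothetical helper (not part of ESXi or vSAN).
# Reads journal text on stdin and prints only the lines matching the
# failure signatures seen in the sample log: DNS resolution errors,
# NTP sync failures, and a missing domain connection.
scan_fsvm_journal() {
    grep -E 'Name or service not known|no servers can be used|Could not sync to provided ntp server|no active connection'
}
```

Assuming the journal path shown above, it could be run as: scan_fsvm_journal < /scratch/log/vdfs_support/containers/fsvm_logs/journal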

Environment

VMware vSAN 7.0.x

Cause

vSAN File Service containers use the MACVLAN network driver to expose their MAC addresses to the external network.

For this to work, MAC Learning must be enabled ("MAC-Learning=TRUE") on the vDS port group used for vSAN File Service.

The vSAN health plugin normally sets this automatically when vSAN File Service is enabled. In rare situations it remains set to "FALSE", and the file service containers then fail to start.

Resolution

  • Log in to the host over SSH and identify the FSVM's "DVPort ID" and "DVS Name":

esxcfg-vswitch -l | grep "DVPort ID\|File Service Node"

  • Then run the following command, substituting the corresponding "DVPort ID" and "DVS Name":

netdbg vswitch mac-learning port get --dvport DVPortID --dvs-alias DVSName

  • Example: get the DVPort ID of the FSVM running on the host:

[root@esxi05:~] esxcfg-vswitch -l | grep "DVPort ID\|File Service Node"
  DVPort ID In Use Client
  40  1 vSAN File Service Node (4).eth0

  • Confirm the DVS Name of the distributed switch to which this FSVM is connected:

[root@esxi05:~] esxcfg-vswitch -l
DVS Name         Num Ports   Used Ports  Configured Ports  MTU     Uplinks
GargantuaDSwitch  1940        10          512               1500    vmnic3,vmnic1,vmnic2

  DVPort ID                               In Use      Client
  16                                      1           vmnic1
  17                                      1           vmnic2
  12                                      1           vmk1
  4                                       1           vmk2
  38                                      1           vmnic3
  40                                      1           vSAN File Service Node (4).eth0

  • Now confirm the "MAC-Learning" status of the FSVM's DVPort:

[root@esxi05:~] netdbg vswitch mac-learning port get --dvport 40 --dvs-alias GargantuaDSwitch
MAC Learning:                   True
Unknown Unicast Flooding:       True

MAC Limit:                      64
MAC Limit Policy:               DROP

  • In a failing environment, the output looks like this:

[root@esxi05:~] netdbg vswitch mac-learning port get --dvport 40 --dvs-alias GargantuaDSwitch
MAC Learning:                   False
Unknown Unicast Flooding:       False

MAC Limit:                      4096
MAC Limit Policy:               ALLOWED

  • If MAC Learning on the FSVM DVPort is not True, edit the settings of the vDS port group used for vSAN File Service and enable MAC Learning under "Security".
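
The checks above can be sketched as two small filters for the ESXi shell. This is an illustrative sketch under assumptions: the function names fsvm_dvports and mac_learning_state are hypothetical, awk is assumed to be available in the host shell, and the parsing relies only on the output layout shown in the examples above, which may differ between releases.

```shell
# Hypothetical helpers (not part of ESXi); both read command output on stdin.

# Given the output of `esxcfg-vswitch -l`, print the DVPort ID of every
# port whose client is a vSAN File Service Node.
fsvm_dvports() {
    awk '/vSAN File Service Node/ { print $1 }'
}

# Given the output of `netdbg vswitch mac-learning port get ...`,
# print the value of the "MAC Learning:" field (True or False).
mac_learning_state() {
    awk -F': *' '$1 == "MAC Learning" { print $2 }'
}
```

With the example values from this article, `esxcfg-vswitch -l | fsvm_dvports` would be used to list the FSVM port IDs, and `netdbg vswitch mac-learning port get --dvport 40 --dvs-alias GargantuaDSwitch | mac_learning_state` to report the MAC Learning state for that port. Reading from stdin keeps the parsing separate from the host commands, so the filters can be tried against saved output first.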




 

Additional Information

Ref: Networking Considerations for vSAN File Service

Note: If using an NSX-based network, ensure that MAC Learning is enabled for the provided network entity from the NSX admin console, and that all the hosts and File Services nodes are connected to the desired NSX-T network.