A vSAN file server (container) restarts continuously on different FSVMs

Article ID: 433357

Products

VMware vSAN

Issue/Introduction

  • vSAN Skyline Health reports the warning "File server is (re)starting." under File Server Connectivity for one of the file servers.

  • One of the file servers (example: vsanfs01), which runs as a container, restarts continuously and may appear on a different host each time, because a different FSVM is chosen to host it.

  • The container is reported as starting and, after a waiting period, is marked for failover. This repeats every one to two minutes.

  • In the vCenter UI, vSAN health shows a file service health warning for the affected file server. This article uses the following example names:

    • Example NFS Server name: vsanfs01

    • Example NFS share: nfs-share

  • The log /var/run/log/vsanfs.mgmt.log on the host shows events similar to the following:
2026-02-25T08:18:46.553Z In(14) vsanfs.mgmt[2930670] [EndpointController-2] [VDFSFSVMStub::SetupContainerConfig] CONT: start setup container config: vsanfs01
2026-02-25T08:18:46.656Z In(14) vsanfs.mgmt[2930670] [EndpointController-2] [VDFSEndpointContainerDocker::Create] CONT: starting FSContainer vsanfs01
2026-02-25T08:18:46.857Z In(14) vsanfs.mgmt[2930670] [EndpointController-2] [VDFSEndpointContainerDocker::Create] CONT: created FSContainer vsanfs01
2026-02-25T08:18:46.858Z In(14) vsanfs.mgmt[2930670] [EndpointController-2] [VDFSEndpointContainerDocker::WaitForStartup] CONT: waiting 305s for container vsanfs01 startup ...
  • The file server configuration shows the container with "failoverState": "FAILING_OVER" and a "failoverWaiter" entry:
{"host-fscontainer-map": {"612c8a17-d057-d61a-ab1d-########": ["10.X.X.9"], "612c8a30-d622-c6e4-a3d4-####": ["10.X.X.7"], "612c8a28-6327-a596-997f-########": ["10.X.X.8", "10.X.X.6"]}, "fscontainer-properties": {"10.X.X.6": {"fscontainerState": "READY", "affinityLocation": "None", "fscontainerWaiter": "612c8a17-d057-d61a-ab1d-####", "failoverState": "FAILING_OVER", "failoverWaiter":
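To gauge how often a container is being restarted, the start events in vsanfs.mgmt.log can be counted per container. The following is a minimal Python sketch, not a VMware tool: the function name count_container_starts is hypothetical, and it assumes the "CONT: starting FSContainer <name>" wording shown in the log excerpt above.

```python
import re
from collections import Counter

# Assumes the message wording seen in the vsanfs.mgmt.log excerpt above.
START_RE = re.compile(r"CONT: starting FSContainer (\S+)")


def count_container_starts(lines):
    """Count 'starting FSContainer' events per container name."""
    counts = Counter()
    for line in lines:
        match = START_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the excerpt in this article.
sample = [
    "2026-02-25T08:18:46.656Z In(14) vsanfs.mgmt[2930670] [EndpointController-2] "
    "[VDFSEndpointContainerDocker::Create] CONT: starting FSContainer vsanfs01",
    "2026-02-25T08:18:46.857Z In(14) vsanfs.mgmt[2930670] [EndpointController-2] "
    "[VDFSEndpointContainerDocker::Create] CONT: created FSContainer vsanfs01",
]

print(dict(count_container_starts(sample)))  # {'vsanfs01': 1}
```

On a copy of the real log, an open file object can be passed in place of the sample list. A count that keeps growing for a single container is consistent with the restart loop described above.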

Environment

  • VMware vSAN 8.x

  • vSAN File Services 8.x

Cause

  • The file server (container) restart is caused by a missing DOM object that was previously a root object and is still associated with one of the file shares.

  • The DOM object no longer exists at the vSAN layer, but it is still referenced in the configuration file (ganesha.conf) by an NFS file share created earlier.

  • To confirm this, check /var/run/log/vdfsd-proxy.log on the ESXi host for events similar to the following:

2026-03-13T02:57:44.711Z|f-7-004145610|DISCO: No(29) vdfsd-proxy[2105044] 81825161-ba8d-f7a4-e3bd-########: Failed to find DOM_OBJECT: Not found

  • In the NFS server configuration at /vmfs/volumes/vdfsDatastore/XXX/volumes/YYY/default/ZZZ/<NFS server with the issue, as shown in vSAN health>/etc/ganesha.conf, the missing DOM object may still be referenced.

  • Open the ganesha.conf file on one of the ESXi hosts and review the configuration.

NFS_CORE_PARAM { MNT_Port = 20048; Rquota_Port = 875; Plugins_Dir = /usr/lib/ganesha; Enable_FULLV3_Stats = true; Enable_FULLV4_Stats = true; Dbus_Name_Prefix = vsanfs01; mount_path_pseudo = true; NLM_Port = 32803; Enable_NLM = true; Bind_Addr = 10.xx.xx.6; RPC_Ioq_ThrdMax = 100; }
EXPORT_DEFAULTS { Delegations = None; Protocols = 3,4; SecType = sys; }
NFSV4 { Delegations = false; Minor_Versions = 1; Lease_Lifetime = 60; Grace_Period = 90; IdmapConf = /etc/idmapd.conf; }
NFS_KRB5 { PrincipalName = "[email protected]"; KeytabPath = /etc/krb5.keytab; Active_krb5 = YES; CCacheDir = /var/run/ganesha; }
MDCACHE { LRU_Run_Interval = 60; FD_Limit_Percent = 7; FD_HWMark_Percent = 1; FD_LWMark_Percent = 0; Entries_HWMark = 102400; Reaper_Work_Per_Lane = 200; }
VDFS { Superuser = "administrator@domain"; }

EXPORT { Export_Id = 100; Path = "/vsfs/81825161-ba8d-f7a4-e3bd-########/volumes/5aa19700-e24c-8c35-2aa1-####/default"; Pseudo = "/nfs-share"; Access_Type = None; FSAL { Name = VDFS; } Protocols = 3; SecType = sys; CLIENT { Clients = 10.X.X.0/22; Access_Type = RW; Squash = None;  }  }
EXPORT { Export_Id = 101; Path = "/vdfs_rootfs_mnt/16815161-5473-7b62-b295-########/volumes/34aa6b03-badd-8e17-8a7c-########/default/2f7c6fb1-86c7-4bcd-8fcf-########/referrals/nfs-share"; Pseudo = "/vsanfs/nfs-share"; Access_Type = RO; FSAL { Name = VDFS; } Protocols = 4; SecType = sys,krb5,krb5i,krb5p;  }
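The cross-check between the proxy log and ganesha.conf can be sketched in Python as follows. This is an illustrative helper under stated assumptions, not part of vSAN: the function names are hypothetical, the log wording is taken from the excerpt above, and the object IDs are masked exactly as they are in this article.

```python
import re

# Assumes the message wording seen in the proxy log excerpt above.
MISSING_RE = re.compile(r"(\S+): Failed to find DOM_OBJECT: Not found")


def missing_dom_objects(log_lines):
    """Return the object IDs reported as missing, in first-seen order."""
    seen = []
    for line in log_lines:
        match = MISSING_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def config_references(conf_text, object_ids):
    """Map each object ID to the 1-based config line numbers that mention it."""
    hits = {}
    for number, line in enumerate(conf_text.splitlines(), start=1):
        for object_id in object_ids:
            if object_id in line:
                hits.setdefault(object_id, []).append(number)
    return hits


# Log line and EXPORT entries copied from this article (IDs masked as published).
LOG = [
    "2026-03-13T02:57:44.711Z|f-7-004145610|DISCO: No(29) vdfsd-proxy[2105044] "
    "81825161-ba8d-f7a4-e3bd-########: Failed to find DOM_OBJECT: Not found",
]
CONF = "\n".join([
    'EXPORT { Export_Id = 100; Path = "/vsfs/81825161-ba8d-f7a4-e3bd-########/volumes/5aa19700-e24c-8c35-2aa1-####/default"; Pseudo = "/nfs-share"; Access_Type = None; FSAL { Name = VDFS; } Protocols = 3; SecType = sys; CLIENT { Clients = 10.X.X.0/22; Access_Type = RW; Squash = None;  }  }',
    'EXPORT { Export_Id = 101; Path = "/vdfs_rootfs_mnt/16815161-5473-7b62-b295-########/volumes/34aa6b03-badd-8e17-8a7c-########/default/2f7c6fb1-86c7-4bcd-8fcf-########/referrals/nfs-share"; Pseudo = "/vsanfs/nfs-share"; Access_Type = RO; FSAL { Name = VDFS; } Protocols = 4; SecType = sys,krb5,krb5i,krb5p;  }',
])

missing = missing_dom_objects(LOG)
print(missing)                           # ['81825161-ba8d-f7a4-e3bd-########']
print(config_references(CONF, missing))  # {'81825161-ba8d-f7a4-e3bd-########': [1]}
```

Only the vsfs entry (Export_Id 100) contains the missing object ID. The vdfs_rootfs_mnt referral entry (Export_Id 101) belongs to the same share but is tied to it by the share name nfs-share, so it has to be identified by name rather than by object ID.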

Resolution

  • Validate the file share health. If the object has already been deleted, the file share is either no longer in use or is already inaccessible.

  • Once the file share health is validated, remove the entries for the share with the missing DOM object from the NFS server's ganesha.conf.

NOTICE: Back up the ganesha.conf file to a persistent location before modifying the original configuration file.
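One way to take a timestamped backup is sketched below in Python. The helper name backup_config and the backup file naming are assumptions for illustration only. The backup directory is supplied by the caller and should be on persistent storage.

```python
import shutil
import time
from pathlib import Path


def backup_config(conf_path, backup_dir):
    """Copy conf_path into backup_dir with a timestamp suffix; return the copy's path."""
    conf_path = Path(conf_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{conf_path.name}.{stamp}.bak"
    shutil.copy2(conf_path, target)  # copy2 also preserves timestamps and mode bits
    return target
```

Calling backup_config with the path to ganesha.conf and a directory on a persistent datastore returns the location of the copy, which can be restored if the edited file needs to be rolled back.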

  • Example entries to be removed: both the vsfs and the vdfs_rootfs_mnt entries for the file share nfs-share, as shown below.

EXPORT { Export_Id = 100; Path = "/vsfs/81825161-ba8d-f7a4-e3bd-########/volumes/5aa19700-e24c-8c35-2aa1-####/default"; Pseudo = "/nfs-share"; Access_Type = None; FSAL { Name = VDFS; } Protocols = 3; SecType = sys; CLIENT { Clients = 10.X.X.0/22; Access_Type = RW; Squash = None;  }  }
EXPORT { Export_Id = 101; Path = "/vdfs_rootfs_mnt/16815161-5473-7b62-b295-########/volumes/34aa6b03-badd-8e17-8a7c-########/default/2f7c6fb1-86c7-4bcd-8fcf-########/referrals/nfs-share"; Pseudo = "/vsanfs/nfs-share"; Access_Type = RO; FSAL { Name = VDFS; } Protocols = 4; SecType = sys,krb5,krb5i,krb5p;  }

  • Once the configuration is modified, restart the vsanmgmtd service on the host where the container/file server shows a restarting state. Command to restart vsanmgmtd:
/etc/init.d/vsanmgmtd restart
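The removal of the share's EXPORT entries, which is done before the vsanmgmtd restart, can be previewed on a copy of ganesha.conf with a sketch like the following. It is a hedged illustration rather than a supported procedure: the function names are hypothetical, and it assumes that the last path component of Pseudo is the share name (as in the example entries above) and that quoted values contain no braces.

```python
import re

EXPORT_START = re.compile(r"\bEXPORT\s*\{")   # does not match EXPORT_DEFAULTS
PSEUDO_RE = re.compile(r'Pseudo\s*=\s*"([^"]*)"')


def find_export_blocks(text):
    """Yield (start, end) offsets of each EXPORT { ... } block via brace matching."""
    pos = 0
    while True:
        match = EXPORT_START.search(text, pos)
        if not match:
            return
        depth = 0
        for i in range(match.end() - 1, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    yield match.start(), i + 1
                    pos = i + 1
                    break
        else:
            raise ValueError("unbalanced braces in EXPORT block")


def remove_share_exports(text, share):
    """Return (new_text, removed_blocks) with every EXPORT for `share` dropped."""
    kept, removed, last = [], [], 0
    for start, end in find_export_blocks(text):
        block = text[start:end]
        match = PSEUDO_RE.search(block)
        pseudo = match.group(1) if match else ""
        if pseudo.rstrip("/").split("/")[-1] == share:
            kept.append(text[last:start])
            removed.append(block)
            last = end
    kept.append(text[last:])
    return "".join(kept), removed


# Entries copied from this article (IDs masked as published).
SAMPLE = """EXPORT_DEFAULTS { Delegations = None; Protocols = 3,4; SecType = sys; }
EXPORT { Export_Id = 100; Path = "/vsfs/81825161-ba8d-f7a4-e3bd-########/volumes/5aa19700-e24c-8c35-2aa1-####/default"; Pseudo = "/nfs-share"; Access_Type = None; FSAL { Name = VDFS; } Protocols = 3; SecType = sys; CLIENT { Clients = 10.X.X.0/22; Access_Type = RW; Squash = None;  }  }
EXPORT { Export_Id = 101; Path = "/vdfs_rootfs_mnt/16815161-5473-7b62-b295-########/volumes/34aa6b03-badd-8e17-8a7c-########/default/2f7c6fb1-86c7-4bcd-8fcf-########/referrals/nfs-share"; Pseudo = "/vsanfs/nfs-share"; Access_Type = RO; FSAL { Name = VDFS; } Protocols = 4; SecType = sys,krb5,krb5i,krb5p;  }
"""

new_text, removed = remove_share_exports(SAMPLE, "nfs-share")
print(len(removed))                   # 2
print("EXPORT_DEFAULTS" in new_text)  # True
```

The function returns the removed blocks alongside the new text so they can be compared with the intended entries before anything is written back. EXPORT_DEFAULTS and exports for other shares are left untouched.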