Handling of vSAN File Services VM failures

Article ID: 427521

Updated On:

Products

VMware vSAN

Issue/Introduction

This article discusses the following: 

  • How client VMs maintain connectivity to a vSAN file share if a File Services VM (FSVM) fails.
  • Temporary loss of access or "Server not responding" errors when an ESXi host running vSAN File Services is rebooted or fails.
  • The difference in failover behavior between the NFS 3, NFS 4.1, and SMB protocols during an outage.

Environment

  • VMware vSAN 7.x
  • VMware vSAN 8.x

Resolution

  • The exact behavior differs slightly depending on the protocol in use:

    • For NFS 3 shares, there is no "referral" mechanism, unlike NFS 4.1 and SMB. Each file share must be given a specific IP address that belongs to a specific File Services VM (FSVM). If that node fails, the FSVM is restarted on another host using the same IP address. Clients using a soft mount may experience a timeout (a typical default is 60 seconds) and may need to remount the share manually from the guest. Clients using a hard mount retry indefinitely, so the share should become available again once the FSVM is restarted.

    • For NFS 4.1/SMB shares, if an FSVM or the ESXi host it runs on fails, the client can reference an NFS attribute containing the list of alternate IP addresses and then try to connect to one of the other FSVMs. The failed FSVM is still restarted, but the referral allows more immediate failover.


  • In all cases, the FSVMs should be load balanced automatically (every 30 minutes), so after an ESXi host failure, one of the FSVMs is migrated back to that host once it is back in service.
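
The soft and hard mount behavior described for NFS 3 is selected on the client side at mount time. The following is a minimal sketch, assuming a Linux guest using the standard nfs(5) mount options (`vers`, `hard`/`soft`, `timeo` in tenths of a second, `retrans`). The helper name `nfs_mount_command`, the IP address, the export path, and the mount point are hypothetical placeholders, and the option values are illustrative rather than recommended settings.

```python
import shlex


def nfs_mount_command(server_ip, export_path, mount_point,
                      version="3", hard=True, timeo_ds=600, retrans=2):
    """Build (but do not run) a Linux `mount -t nfs` command line.

    hard=True  -> the client retries indefinitely while the FSVM restarts.
    hard=False -> soft mount: I/O can fail with an error after the timeout,
                  and the share may then need to be remounted manually.
    timeo_ds is in tenths of a second, so 600 corresponds to 60 seconds.
    """
    if version not in ("3", "4.1"):
        raise ValueError("this sketch only covers NFS 3 and NFS 4.1")
    options = [
        f"vers={version}",
        "hard" if hard else "soft",
        f"timeo={timeo_ds}",
        f"retrans={retrans}",
    ]
    return ["mount", "-t", "nfs", "-o", ",".join(options),
            f"{server_ip}:{export_path}", mount_point]


# Hypothetical FSVM IP address and share path, for illustration only.
cmd = nfs_mount_command("192.0.2.10", "/vsanfs/share01", "/mnt/share01")
print(shlex.join(cmd))
```

With a hard mount, applications block during the FSVM restart instead of receiving I/O errors, which matches the retry-indefinitely behavior above. Passing `hard=False` produces the soft-mount variant, where the timeout applies.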
 

Additional Information