Application using vSAN NFS share may start up slowly with specific actions

Article ID: 395131

Products

VMware vSAN

Issue/Introduction

An application using a vSAN NFS share may experience extremely slow startup times or performance degradation. This typically occurs when the application resides on or accesses a share containing a large number of files and performs operations such as ACCESS, GETATTR, or READDIR.

Symptoms

  • Extreme delays when listing files in directories (e.g., du -sh or ls commands taking over 15 minutes).

  • Slow application startup times when the application resides on or accesses a vSAN NFS share.

  • Latency observed during basic file operations such as RENAME or SAVEFH.

  • The issue is most prominent in directories containing a large number of files (e.g., 500,000+ files).
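To put a number on the directory listing symptom, the enumeration can be timed from a client that has the share mounted. The following is a minimal sketch, not a supported diagnostic tool; the function name time_directory_listing is hypothetical, and the path passed to it is assumed to be a directory on the mounted vSAN NFS share.

```python
import os
import time


def time_directory_listing(path):
    """Enumerate `path` once and return (entry_count, elapsed_seconds).

    os.scandir() asks the operating system for the directory entries,
    which on an NFS mount is served by READDIR calls to the server
    (client-side caching may hide some of them on repeated runs).
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for _ in entries:
            count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed
```

For example, count, seconds = time_directory_listing("/mnt/share/bigdir") returns the number of entries and the wall-clock time of one full listing; the mount point shown here is a placeholder.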

Environment

VMware vSAN 8.x 

VMware vSAN OSA

Cause

This behavior is a known product limitation in vSAN File Services 8.x.

  • Protocol Limitation: vSAN 8.x utilizes the 9P protocol for internal file service communication. This protocol has architectural bottlenecks regarding metadata processing.

  • READDIR Bottleneck: Each READDIR response is limited (typically to a maximum of 50 entries per client call), so listing a large directory requires many round-trips to the vSAN Distributed File System (VDFS), which manifests as high latency.

  • Metadata Latency: Operations such as RENAME and SAVEFH can show average latencies of ~18ms with peaks up to 1 second under high load.

Justification

  • READDIR Latency:
    • Average latency: ~51.8 ms
    • Maximum latency: ~358 ms

  • Metadata Operation Latency:
    • Operations such as RENAME and SAVEFH show average latency of ~18 ms, with peaks up to ~1 second

High latency may be consistently observed across metadata-intensive operations, which will directly impact file listing and directory traversal.
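The combined effect of the 50-entry limit and the average READDIR latency can be estimated with simple arithmetic. The sketch below is a rough back-of-the-envelope calculation, assuming the calls are issued one after another and each takes the average latency reported in this article; the function name and its defaults are illustrative, and real listings will vary with load and client caching.

```python
import math


def estimate_listing_seconds(num_files, entries_per_call=50,
                             avg_call_latency_s=0.051859):
    """Estimate READDIR calls and total time to list `num_files` entries.

    Assumes serialized calls, each returning at most `entries_per_call`
    entries and each taking `avg_call_latency_s` seconds on average.
    """
    calls = math.ceil(num_files / entries_per_call)
    return calls, calls * avg_call_latency_s


calls, seconds = estimate_listing_seconds(500_000)
print(f"{calls} calls, about {seconds:.1f} s ({seconds / 60:.1f} min)")
# 10000 calls, about 518.6 s (8.6 min)
```

Under these assumptions, a directory of 500,000 files spends several minutes in READDIR alone, before any per-file GETATTR or ACCESS calls are counted.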

  • Evidence:

NFSv4 Service Response Time Statistics - slow_nfs.pcap:
Procedure  Index  Calls  Min SRT (s)  Max SRT (s)  Avg SRT (s)  Sum SRT (s)
---------------------------------------------------------------------------
NFSv4 Operations

READDIR       26   5857     0.000064     0.358828     0.051859   303.736266
RENAME        29   2134     0.002340     1.009166     0.018448    39.368048
SAVEFH        32   2134     0.002340     1.009166     0.018448    39.368048
WRITE         38   2729     0.001004     0.166386     0.004284    11.691867

NFSv4 Main Operation
READDIR       26   5857     0.000064     0.358828     0.051859   303.736266
RENAME        29   2134     0.002340     1.009166     0.018448    39.368048
WRITE         38   2729     0.001004     0.166386     0.004284    11.691867
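To see how strongly READDIR dominates the capture, the rows under "NFSv4 Main Operation" can be parsed and compared by their Sum SRT values. This is a minimal sketch: the helper names parse_srt_rows and share_of_total are hypothetical, the rows are copied from the table above, and the shares are relative to these three listed operations only, not to the whole capture.

```python
SRT_ROWS = """\
READDIR       26   5857     0.000064     0.358828     0.051859   303.736266
RENAME        29   2134     0.002340     1.009166     0.018448    39.368048
WRITE         38   2729     0.001004     0.166386     0.004284    11.691867
"""


def parse_srt_rows(text):
    """Parse whitespace-separated SRT rows into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7:
            continue  # skip blank lines and anything that is not a data row
        min_s, max_s, avg_s, sum_s = map(float, parts[3:])
        rows.append({
            "procedure": parts[0],
            "index": int(parts[1]),
            "calls": int(parts[2]),
            "min_s": min_s,
            "max_s": max_s,
            "avg_s": avg_s,
            "sum_s": sum_s,
        })
    return rows


def share_of_total(rows):
    """Return each procedure's fraction of the summed response time."""
    total = sum(r["sum_s"] for r in rows)
    return {r["procedure"]: r["sum_s"] / total for r in rows}


rows = parse_srt_rows(SRT_ROWS)
for name, share in share_of_total(rows).items():
    print(f"{name:8s} {share:6.1%}")
```

With the values above, READDIR accounts for roughly 86% of the summed response time of the three operations, which is consistent with directory listing being the main source of delay.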

Resolution

  • READDIR performance is significantly improved in 10P, implemented in 9.1.
  • The limit on the number of entries per READDIR response is addressed as part of the Ganesha upgrade, which is to be implemented in a future 9.1 release. With that change, the number of entries returned will be a function of the buffer size advertised by the client in the READDIR call.

Additional Information

  • Version 9.0 (10P) introduces notable improvements over the 8.x branch, but certain scenarios may still require further optimization.
  • Version 9.1 will include an upgraded Ganesha component, which is expected to provide additional improvements in this area. Some READDIR-related behaviors may continue to be refined beyond that release.