High latency and metadata overhead on global NFS datastores due to multiple stale PING-GID Folders from vSphere Replication


Article ID: 424758


Updated On:

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  • High latency and significant metadata overhead are observed on shared NFS volumes that are mounted by multiple vCenter Server instances in geographically distant locations.

  • Accessing the volume from the local site (where the storage resides) is fast, but accessing it from remote global sites results in extremely slow folder/file listing.

  • The impacted NFS datastore contains numerous empty folders whose names begin with the prefix PING-GID.

  • Folders appear to be generated once daily and are not automatically removed, leading to hundreds of stale directories over time.

    The creation timestamps of these folders show that they are all generated at about the same time every day.
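The symptoms above can be checked from any host that mounts the NFS export. The following is a minimal sketch, not a VMware-provided tool. It assumes the export is mounted on a Linux administration host, that the PING-GID folders sit at the top level of the datastore, and that the folder modification time is an acceptable stand-in for the creation time. The mount point `/mnt/nfs_datastore` and the function name `find_ping_folders` are hypothetical.

```python
import os
from datetime import datetime, timezone

PREFIX = "PING-GID"  # folder-name prefix described in this article


def find_ping_folders(datastore_root, prefix=PREFIX):
    """Return (name, modified_time_utc, is_empty) for each matching top-level folder."""
    results = []
    for name in os.listdir(datastore_root):
        path = os.path.join(datastore_root, name)
        # Only real directories whose name starts with the prefix are considered.
        if not name.startswith(prefix) or os.path.islink(path) or not os.path.isdir(path):
            continue
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        is_empty = not os.listdir(path)
        results.append((name, mtime, is_empty))
    # Oldest first, so the daily pattern is easy to see.
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    root = "/mnt/nfs_datastore"  # hypothetical mount point, replace with your own
    if os.path.isdir(root):
        for name, mtime, is_empty in find_ping_folders(root):
            print(f"{mtime:%Y-%m-%d %H:%M:%S} UTC  empty={is_empty}  {name}")
```

Running the listing from the site where the storage resides avoids adding further metadata traffic over the long-distance links.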

Environment

vSphere Replication 9.x

Cause

In Enhanced Replication, the system uses "Enhanced Replication Mappings" to manage data movement. These mappings are created automatically when you set up new replications or upgrade your existing legacy ones.

To ensure your data stays protected, the appliance regularly performs "quick tests" to check the connection between your different sites. During these checks, the system creates temporary folders named PING-GID on your datastore to confirm it can successfully write and access data.

Under normal circumstances, these folders are deleted automatically as soon as the test finishes. However, if a replication mapping is in an Error State or if a test is interrupted, the cleanup process is skipped. This results in stale or leftover folders staying on your storage.

Cause Validation:

Navigate to the Site Recovery Client and inspect the Enhanced Replication Mappings. Confirm whether any mappings are currently in an Error State.

Note: Even if the enhanced replication feature is not actively being used, an error status in these mappings will trigger the failed cleanup of daily PING-GID health check folders.

Review the Recent Tasks pane in the vSphere Client following the daily health check.

In an affected environment, no "Delete file" tasks are initiated, even after the enhanced replication mapping tests conclude.

In a healthy environment, the vSphere Replication Appliance should automatically trigger "Delete file" tasks to remove the PING-GID folders immediately after the connectivity tests complete. The absence of these deletion tasks confirms that the cleanup process is failing.

The time at which the health check runs also matches the creation time of the PING-GID folders on the NFS volume.
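The timestamp comparison described above could be scripted roughly as follows. This is a sketch under stated assumptions: the folder timestamps and the health-check time are collected separately (for example from a directory listing and from the Recent Tasks pane), both are expressed in the same time zone, and a few minutes of drift is acceptable. The function name `matches_health_check` and the five-minute tolerance are hypothetical.

```python
from datetime import datetime, time, timedelta


def matches_health_check(folder_time, check_time_of_day, tolerance=timedelta(minutes=5)):
    """Return True if a folder timestamp falls within the tolerance of the daily check time.

    Limitation: a check scheduled within the tolerance of midnight is not handled,
    because the comparison uses the folder's own calendar date.
    """
    check_dt = datetime.combine(folder_time.date(), check_time_of_day)
    return abs(folder_time - check_dt) <= tolerance


# Illustrative values only, not taken from a real environment.
print(matches_health_check(datetime(2025, 1, 1, 2, 1), time(2, 0)))  # True
print(matches_health_check(datetime(2025, 1, 1, 3, 0), time(2, 0)))  # False
```

If most PING-GID folders match the health-check time, that supports the conclusion that the daily test creates them.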

Resolution

To resolve this issue, identify any Enhanced Replication Mappings that are in an error state. If the enhanced replication feature is not used for active replications, remove those mappings.

Safely delete the existing stale PING-GID folders from the datastore to reduce metadata overhead.
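One cautious way to script the cleanup is sketched below. It is an illustration, not a VMware-provided procedure. It assumes the export is mounted on a Linux administration host, that the stale folders sit at the top level of the datastore, and that any PING-GID folder older than a chosen age is no longer in use. The function name `remove_stale_ping_folders`, the 24-hour threshold, and the mount point are hypothetical. The sketch defaults to a dry run and uses `os.rmdir`, which refuses to remove a directory that is not empty.

```python
import os
import time

PREFIX = "PING-GID"  # folder-name prefix described in this article


def remove_stale_ping_folders(datastore_root, min_age_hours=24, dry_run=True, prefix=PREFIX):
    """Remove empty top-level folders matching the prefix that are older than min_age_hours.

    Returns the sorted names of the folders that were removed (or would be, in a dry run).
    """
    cutoff = time.time() - min_age_hours * 3600
    selected = []
    for name in sorted(os.listdir(datastore_root)):
        path = os.path.join(datastore_root, name)
        if not name.startswith(prefix) or os.path.islink(path) or not os.path.isdir(path):
            continue
        if os.path.getmtime(path) > cutoff:
            continue  # too recent, a health check may still be using it
        if os.listdir(path):
            continue  # not empty, leave it for manual review
        if not dry_run:
            os.rmdir(path)  # fails rather than deleting content if the folder is not empty
        selected.append(name)
    return selected


if __name__ == "__main__":
    root = "/mnt/nfs_datastore"  # hypothetical mount point, replace with your own
    if os.path.isdir(root):
        print("Would remove:", remove_stale_ping_folders(root, dry_run=True))
```

Review the dry-run output first, then call the function again with `dry_run=False` once the list looks correct.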

Note: Clearing the stale folders improves performance, but some latency is inherent to geographic distance. Hosts in distant regions will always list files and folders more slowly than local hosts, because each metadata request incurs the full round-trip time (RTT) to the storage.
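The effect of RTT on listing time can be approximated with simple arithmetic. The sketch below assumes a deliberately simplified model in which metadata requests are issued one after another and each costs one round trip. Real NFS clients batch and cache requests, so treat the result as a rough upper-bound intuition rather than a prediction. The function name and the sample figures are hypothetical.

```python
def estimate_listing_seconds(entry_count, rtt_ms, requests_per_entry=1):
    """Estimate listing time when every metadata request waits for one full round trip."""
    return entry_count * requests_per_entry * rtt_ms / 1000.0


# Illustrative figures only: 500 stale folders, one metadata request each.
print(estimate_listing_seconds(500, 1))    # local site, 1 ms RTT   -> 0.5
print(estimate_listing_seconds(500, 150))  # remote site, 150 ms RTT -> 75.0
```

Under this model the listing time scales linearly with both the number of entries and the RTT, which is why removing hundreds of stale folders helps remote sites the most.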