High application latency in Guest OS but no storage latency shown in environment telemetry

Article ID: 441320

Updated On:

Products

VMware vSAN, VMware Cloud Foundation, VMware Telco Cloud Platform

Issue/Introduction

  • An application performance degradation event impacts critical Guest OS services (including Aerospike, ckey-mariaDB, and core application engines) and lowers success rates across communication paths.
  • Application-layer monitoring dashboards (Grafana) recorded high write latency spikes on multiple nodes.
  • Infrastructure-level telemetry does not reflect any corresponding storage latency spikes or backend congestion during the reported event timeframe.

Environment

ESXi: 7.0.3 EP13

VCF: 4.5.2

Telco Cloud Platform (TCP): 2.7

Cause

  • The two monitoring layers measure different things. Application-layer tracking reports cumulative processing time, which includes internal application processing threads, Guest OS kernel queuing, and scheduling delays. Infrastructure telemetry reports only the physical storage subsystem I/O round-trip time, so an application-side spike does not necessarily appear as storage latency.
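The discrepancy can be illustrated with a minimal sketch. It assumes a simple additive model in which the application-observed write latency is the sum of the three components named above. The class name, field names, and all numbers are hypothetical and chosen only for illustration; they are not measurements from the reported event.

```python
from dataclasses import dataclass


@dataclass
class WriteLatencySample:
    """One write, split into the components described in the Cause section.

    All fields are hypothetical illustration values, not real telemetry.
    """
    app_processing_ms: float       # time spent in internal application threads
    guest_queue_ms: float          # Guest OS kernel queuing and scheduling delay
    storage_round_trip_ms: float   # physical storage subsystem I/O round trip

    def application_observed_ms(self) -> float:
        # Assumption: the application-layer metric is the plain sum of all parts.
        return (self.app_processing_ms
                + self.guest_queue_ms
                + self.storage_round_trip_ms)


sample = WriteLatencySample(app_processing_ms=40.0,
                            guest_queue_ms=25.0,
                            storage_round_trip_ms=3.0)

print("application view:", sample.application_observed_ms(), "ms")
print("storage view:    ", sample.storage_round_trip_ms, "ms")
```

In this made-up sample the application dashboard would show a 68 ms write while the storage layer reports 3 ms, which matches the symptom pattern described in this article.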

Resolution

 

  1. Validate infrastructure-layer storage health by reviewing vSphere Performance Charts or VMware Aria Operations metrics for the specific timestamp of the reported latency event.

  2. Confirm that backend vSAN write latencies remain within normal operational baselines (typically < 7 ms).

  3. Analyze Guest OS internal statistics (for example, iowait and per-device wait times as reported by iostat on Linux) to verify whether disk I/O wait inside the virtual machine nodes exceeded the expected baseline.

  4. Review the configuration and metric gathering mechanism of the application-level monitoring tool to verify how latency is computed at the user space tier versus the kernel level.

  5. If infrastructure performance data confirms that storage latency remained within nominal limits, engage the application vendor to isolate Guest OS thread scheduling anomalies.
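For the Guest OS check in step 3, the following is a minimal sketch that assumes a Linux guest. It derives the average write wait per I/O from two snapshots of /proc/diskstats, which is the same kind of calculation iostat performs for its write await column. The device name sda, the helper names, and the two embedded snapshots are hypothetical; on a real guest, take the snapshots by reading /proc/diskstats twice, a few seconds apart, during the latency event.

```python
def parse_diskstats(text: str, device: str) -> tuple[int, int]:
    """Return (writes_completed, ms_spent_writing) for one device.

    `text` is the content of /proc/diskstats. After the major number,
    minor number, and device name, the 5th and 8th statistics are
    "writes completed" and "time spent writing (ms)".
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14 and fields[2] == device:
            return int(fields[7]), int(fields[10])
    raise ValueError(f"device {device!r} not found in diskstats text")


def average_write_wait_ms(before: tuple[int, int], after: tuple[int, int]) -> float:
    """Average milliseconds per completed write between two snapshots."""
    writes = after[0] - before[0]
    if writes <= 0:
        return 0.0  # no writes completed in the interval
    return (after[1] - before[1]) / writes


# Hypothetical snapshots, shaped like /proc/diskstats lines.
BEFORE = "   8       0 sda 1200 30 96000 800 5000 120 400000 15000 0 9000 15800\n"
AFTER = "   8       0 sda 1210 30 96800 810 5200 125 416000 16000 0 9500 16810\n"

wait_ms = average_write_wait_ms(parse_diskstats(BEFORE, "sda"),
                                parse_diskstats(AFTER, "sda"))
print(f"average write wait on sda: {wait_ms:.1f} ms")
```

If the value computed inside the guest stays close to the vSAN write latency from steps 1 and 2 while the application dashboard still shows a spike, the extra time is more likely spent above the block layer, which is the case step 5 hands to the application vendor.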