High application latency in Guest OS but no storage latency shown in environment telemetry

VMware vSAN, VMware Cloud Foundation, VMware Telco Cloud Platform

An application performance degradation event impacts critical Guest OS services (including Aerospike, ckey-mariaDB, and core application engines), causing success rates to drop across communication paths.
Application-layer monitoring dashboards (Grafana) recorded high write latency spikes on multiple nodes.
Infrastructure-level telemetry does not reflect any corresponding storage latency spikes or backend congestion during the reported event timeframe.

ESXi: 7.0.3 EP13

VCF: 4.5.2

Telco Cloud Platform (TCP): 2.7

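The discrepancy arises from how each layer aggregates its metrics. Application-layer tracking reports cumulative processing time, which includes internal application processing threads, Guest OS kernel queuing, and scheduling delays. Infrastructure telemetry reports only the I/O round-trip time of the physical storage subsystem. A write can therefore appear slow to the application while the storage backend remains healthy.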
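The relationship between the two measurements can be illustrated with a minimal sketch. The class name, field names, and all millisecond values below are hypothetical and chosen only for illustration; they are not taken from the reported event or from any monitoring tool's API.

```python
from dataclasses import dataclass


@dataclass
class WriteLatencySample:
    """Hypothetical breakdown of one write as seen from inside the guest."""
    app_processing_ms: float      # time spent in application threads
    guest_queue_ms: float         # time queued in the Guest OS kernel
    scheduling_delay_ms: float    # time waiting for CPU scheduling
    storage_round_trip_ms: float  # physical storage I/O round-trip time

    def application_reported_ms(self) -> float:
        # What an application-layer dashboard would typically show:
        # the cumulative time across every stage.
        return (self.app_processing_ms + self.guest_queue_ms
                + self.scheduling_delay_ms + self.storage_round_trip_ms)

    def non_storage_ms(self) -> float:
        # Portion of the reported latency that storage telemetry never sees.
        return self.application_reported_ms() - self.storage_round_trip_ms


# Illustrative values only: storage stays at 3 ms while the application
# observes a much larger figure.
sample = WriteLatencySample(12.0, 20.0, 15.0, 3.0)
print(sample.application_reported_ms(), sample.storage_round_trip_ms)
```

In this made-up case the dashboard would report a spike even though the storage component alone is well within a normal baseline.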

Validate infrastructure-layer storage health by reviewing vSphere Performance Charts or VMware Aria Operations metrics for the specific timestamp of the reported latency event.
Confirm that backend vSAN write latencies remain within normal operational baselines (typically under 7 ms).
Analyze Guest OS internal statistics to verify whether disk I/O wait inside the virtual machine nodes exceeded the expected baseline.
Review the configuration and metric gathering mechanism of the application-level monitoring tool to verify how latency is computed at the user space tier versus the kernel level.
If infrastructure performance data confirms that storage latency remained within nominal limits, engage the application vendor to isolate Guest OS thread scheduling anomalies.
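For the Guest OS check in the steps above, one possible approach on a Linux guest is to sample `/proc/diskstats` twice and derive the average write wait time per completed write, which is the same ratio `iostat` reports as `w_await`. This is a sketch, not a vendor-provided tool: the function names are hypothetical, the device name `sda` and the 5-second interval are assumptions, and the result reflects latency as seen by the guest block layer (guest queuing plus everything beneath it), so it should be compared against the vSAN figures rather than treated as a storage measurement.

```python
import time
from typing import Dict, Optional, Tuple


def parse_diskstats(text: str) -> Dict[str, Tuple[int, int]]:
    """Map device name -> (writes completed, milliseconds spent writing)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        # fields[2] = device name, fields[7] = writes completed,
        # fields[10] = total milliseconds spent writing
        stats[fields[2]] = (int(fields[7]), int(fields[10]))
    return stats


def write_await_ms(before, after, device: str) -> Optional[float]:
    """Average milliseconds per completed write between two samples."""
    writes_before, ms_before = before[device]
    writes_after, ms_after = after[device]
    completed = writes_after - writes_before
    if completed <= 0:
        return None  # no writes completed in the interval
    return (ms_after - ms_before) / completed


def sample_write_await(device: str = "sda", interval_s: float = 5.0):
    """Take two samples from /proc/diskstats (Linux guests only)."""
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    return write_await_ms(first, second, device)
```

Calling `sample_write_await("sda")` during a reported spike and comparing the result with the backend vSAN write latency for the same timestamp helps show whether the delay accumulates inside the guest or below it.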
