Storage latency causes NSX T Manager cluster instability

Article ID: 316654

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
- NSX Manager cluster services may be DOWN or unstable. Accessing the UI may show an error similar to "Component health: SEARCH:UP, POLICY:DOWN, MANAGER:DOWN, UI:UP, NODE_MGMT:UP."
 
- /var/log/cloudnet/nsx-ccp.log shows "Timeout while ping backend kv-store with upperBound 15s."
 
- Corfu is undergoing frequent epoch changes, which can be checked from the timestamps of the layout files:
ls -ltrh /config/corfu | grep LAYOUT
 
- /var/log/syslog shows "CorfuDB is disconnected, set Cluster Status Down"
 
- /var/log/corfu/corfu-compactor-audit.log may contain "WARN CorfuRuntime-0 CorfuRuntime - Couldn't connect to any up-to-date layout servers, retrying in PT1S, Retried 0 times, systemDownHandlerTriggerLimit = 60"
 
- /var/log/corfu/corfu.9000.log may contain "WrongEpochException" messages

- /var/log/stats/sys_io.stats shows high r_await or w_await values (in milliseconds) for the nsx-config device, for example:

Device       r/s  rkB/s  rrqm/s  %rrqm  r_await  rareq-sz   w/s  wkB/s  wrqm/s  %wrqm  w_await  wareq-sz   d/s  dkB/s  drqm/s  %drqm  d_await  dareq-sz  aqu-sz  %util
nsx-config  0.00   0.00    0.00   0.00     2.96     12.18  0.00   0.00    0.00   0.00   452.14      4.00  0.00   8.41    0.00   0.00     0.28  31699.12    0.00   0.00
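
The log signatures listed above can be checked in one pass. The following is a minimal sketch, not a supported tool: it counts the lines in each log that contain the message quoted in this article. The function name `count_signatures` and the `root` parameter are hypothetical, introduced only so the same paths can be resolved inside an extracted copy of the logs; the paths and strings come from the symptom list.

```python
from pathlib import Path

# Signature strings quoted in the symptom list, keyed by the log they appear in.
SIGNATURES = {
    "/var/log/cloudnet/nsx-ccp.log": "Timeout while ping backend kv-store",
    "/var/log/syslog": "CorfuDB is disconnected, set Cluster Status Down",
    "/var/log/corfu/corfu-compactor-audit.log": "Couldn't connect to any up-to-date layout servers",
    "/var/log/corfu/corfu.9000.log": "WrongEpochException",
}


def count_signatures(root="/", signatures=SIGNATURES):
    """Return {log_path: number of lines containing that log's signature}.

    `root` is a hypothetical convenience: point it at a directory holding a
    copy of the logs to resolve the same paths there. Missing files count as 0.
    """
    counts = {}
    for log_path, needle in signatures.items():
        path = Path(root) / log_path.lstrip("/")
        hits = 0
        if path.is_file():
            # errors="replace" keeps the scan going past undecodable bytes.
            with path.open(errors="replace") as handle:
                hits = sum(1 for line in handle if needle in line)
        counts[log_path] = hits
    return counts
```

Calling `count_signatures()` on a Manager node (with sufficient privileges to read the logs) returns one count per file. Non-zero counts across several files at the same time are consistent with the symptoms described here, but they do not by themselves prove storage latency is the cause.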

Environment

VMware NSX-T Data Center

Cause

High storage latency on the NSX Manager VMs causes cluster instability, induces frequent Corfu epoch changes, and produces the other log messages listed above.
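
To gauge how often epoch changes occur, the layout files under /config/corfu can be grouped by modification time. This is a rough sketch that mirrors the `ls ... | grep LAYOUT` check from the symptom list. It assumes each epoch change leaves behind a file whose name contains "LAYOUT", which that command implies but this article does not state outright. The function name `layout_changes_per_hour` is hypothetical.

```python
import os
from collections import Counter
from datetime import datetime


def layout_changes_per_hour(directory="/config/corfu"):
    """Count files whose name contains "LAYOUT", grouped by modification hour.

    Assumption: one such file is written per epoch change, so many files
    within the same hour suggest frequent epoch changes.
    """
    per_hour = Counter()
    with os.scandir(directory) as entries:
        for entry in entries:
            if "LAYOUT" in entry.name and entry.is_file():
                modified = datetime.fromtimestamp(entry.stat().st_mtime)
                per_hour[modified.strftime("%Y-%m-%d %H:00")] += 1
    # Sort by hour so the result reads chronologically.
    return dict(sorted(per_hour.items()))
```

The article defines no threshold for "frequent", so the result is best read comparatively: hours with many layout files can be lined up against the latency spikes in the datastore charts.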
 
The VM performance charts show read and write latency well above 10ms. To view them in the vSphere UI, select the Manager VM > Monitor > Advanced, set 'View' to 'Datastore', and adjust the time period as needed.

Resolution

Either resolve the read/write latency on the current datastore, or use Storage vMotion to move the Manager VMs to another datastore with latency under 10ms.

The NSX-T storage requirements documentation states "The maximum disk access latency is under 10ms."
https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.1/installation/GUID-AECA2EE0-90FC-48C4-8EDB-66517ACFE415.html
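
The 10ms limit can also be checked from inside the guest against the iostat-style samples in /var/log/stats/sys_io.stats. The sketch below parses such text and reports devices whose r_await or w_await (both in milliseconds) is at or above the limit. It assumes the file consists of a header row beginning with "Device" followed by one whitespace-separated row per device, as in the sample from the symptom list. The function name `slow_devices` is hypothetical. Guest-side await times are only an approximation of datastore latency, so the result is an indicator, not a replacement for the vSphere charts.

```python
MAX_LATENCY_MS = 10.0  # limit quoted from the NSX-T storage requirements

# Sample row from this article, with whitespace normalized.
SAMPLE = """\
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
nsx-config 0.00 0.00 0.00 0.00 2.96 12.18 0.00 0.00 0.00 0.00 452.14 4.00 0.00 8.41 0.00 0.00 0.28 31699.12 0.00 0.00
"""


def slow_devices(text, limit_ms=MAX_LATENCY_MS):
    """Return (device, r_await, w_await) tuples at or above limit_ms."""
    columns = None
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            columns = fields  # remember the most recent header row
            continue
        if columns is None or len(fields) != len(columns):
            continue  # not a device row (timestamps, CPU summary, ...)
        row = dict(zip(columns, fields))
        try:
            r_await = float(row["r_await"])
            w_await = float(row["w_await"])
        except (KeyError, ValueError):
            continue  # header lacks the await columns or value is not numeric
        if max(r_await, w_await) >= limit_ms:
            flagged.append((row[columns[0]], r_await, w_await))
    return flagged


print(slow_devices(SAMPLE))  # [('nsx-config', 2.96, 452.14)]
```

For the sample row, the read latency of 2.96 ms is within the limit, but the write latency of 452.14 ms is far above it, so the device is flagged.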