Storage latency causes NSX-T Manager cluster instability



Article ID: 316654




VMware NSX Networking


- NSX Manager cluster services may be DOWN or unstable. The UI may show an error similar to "Component health: SEARCH:UP, POLICY:DOWN, MANAGER:DOWN, UI:UP, NODE_MGMT:UP."
- /var/log/cloudnet/nsx-ccp.log shows "Timeout while ping backend kv-store with upperBound 15s."
- Corfu is undergoing frequent epoch changes. To check, list the layout files by modification time:
  ls -ltrh /config/corfu | grep LAYOUT
- /var/log/syslog shows "CorfuDB is disconnected, set Cluster Status Down"
- /var/log/corfu/corfu-compactor-audit.log may contain "WARN CorfuRuntime-0 CorfuRuntime - Couldn't connect to any up-to-date layout servers, retrying in PT1S, Retried 0 times, systemDownHandlerTriggerLimit = 60"
- /var/log/corfu/corfu.9000.log may contain "WrongEpochException" messages
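
The symptoms above can be checked in one pass with a small script. The following is a minimal sketch, not a supported tool: the function names (`count_signature`, `report_symptoms`, `recent_layout_files`) and the optional root-prefix argument are illustrative assumptions, while the log paths and message strings are the ones listed above. The last function assumes that epoch changes show up as freshly written LAYOUT files, which is what the `ls` listing above is used to inspect.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the "root" prefix argument are hypothetical.

# Count lines in a log file that contain a fixed string.
# Prints 0 when the file is missing or unreadable.
count_signature() {
    local pattern="$1" file="$2"
    if [ -r "$file" ]; then
        grep -cF -- "$pattern" "$file" || true   # grep exits 1 when the count is 0
    else
        echo 0
    fi
}

# Print one "count<TAB>file<TAB>signature" line per log symptom listed above.
# $1 is an optional directory prefix (for example, an extracted log bundle).
report_symptoms() {
    local root="${1:-}" file pattern
    while IFS='|' read -r file pattern; do
        printf '%s\t%s\t%s\n' \
            "$(count_signature "$pattern" "$root$file")" "$file" "$pattern"
    done <<'EOF'
/var/log/cloudnet/nsx-ccp.log|Timeout while ping backend kv-store
/var/log/syslog|CorfuDB is disconnected, set Cluster Status Down
/var/log/corfu/corfu-compactor-audit.log|Couldn't connect to any up-to-date layout servers
/var/log/corfu/corfu.9000.log|WrongEpochException
EOF
}

# Count LAYOUT files in a directory that were modified in the last N minutes.
recent_layout_files() {
    local dir="$1" minutes="${2:-60}"
    find "$dir" -maxdepth 1 -type f -name '*LAYOUT*' -mmin "-$minutes" | wc -l | tr -d ' '
}
```

Running `report_symptoms` with no argument reads the live log paths. `recent_layout_files /config/corfu 60` gives a rough count of layout files written in the last hour. Non-zero counts only indicate that the signatures are present; they do not by themselves confirm storage latency as the cause.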


VMware NSX-T Data Center


High storage latency on the Manager VMs causes cluster instability, induces frequent Corfu epoch changes, and produces the other log messages listed above.
To confirm, open the VM performance charts: in the vSphere UI, select the Manager VM > Monitor > Advanced, set 'View' to 'Datastore', and adjust the time period as needed. An affected Manager shows Read and Write latency well above 10ms.

[Figure: example vSphere performance chart of Manager VM datastore read and write latency]


Either resolve the read/write latency on the current datastore, or Storage vMotion the Manager VMs to another datastore with latency under 10ms.

The NSX-T storage requirements documentation states: "The maximum disk access latency is under 10ms."
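
To compare recorded latency against that limit, a short script can summarize a set of samples. This is a minimal sketch under stated assumptions: the function name `latency_summary` is hypothetical, and the input is assumed to be a plain text file with one latency value in milliseconds per line (for example, values copied out of the performance chart data). It does not parse any particular vSphere export format. Because the requirement is latency under 10ms, a sample of exactly 10 is counted as over the limit.

```shell
#!/usr/bin/env bash
# Sketch only: latency_summary and its input format are assumptions.

# Summarize latency samples against a limit (default 10 ms).
# $1: file with one numeric value in milliseconds per line
# $2: optional limit in milliseconds
# Prints: samples=<n> over_limit=<n> max_ms=<value>
latency_summary() {
    local file="$1" limit_ms="${2:-10}"
    awk -v limit="$limit_ms" '
        # Only lines holding a single plain number are counted;
        # headers and blank lines are skipped.
        NF == 1 && $1 ~ /^[0-9]+(\.[0-9]+)?$/ {
            n++
            if ($1 >= limit) over++
            if ($1 > max) max = $1
        }
        END { printf "samples=%d over_limit=%d max_ms=%s\n", n, over, max + 0 }
    ' "$file"
}
```

For example, `latency_summary samples.txt` reports how many samples reach or exceed 10ms, and `latency_summary samples.txt 20` applies a different threshold. A sustained, non-trivial `over_limit` count is the pattern to look for; an isolated spike is less conclusive.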