NSX Managers rebooting frequently and UI not accessible

Article ID: 414700

Updated On:

Products

VMware NSX

Issue/Introduction

NSX Managers are rebooting frequently, or the NSX UI is inaccessible.
Within the UI, the cluster status may show as "DEGRADED" or "UNAVAILABLE", or no data may be visible at all.
You may see services flapping and CCP leadership changing frequently.
Corfu is undergoing frequent epoch changes. Check the timestamps of the Corfu LAYOUT files with:
ls -ltrh /config/corfu | grep LAYOUT
Pings between the NSX Managers succeed, and no network issues are seen.
High storage latency is observed on the datastores hosting the NSX Manager VMs.
The vSphere advanced performance charts show real-time latency spikes above 10 ms.
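To gauge how often Corfu is changing epochs, a small script can list the LAYOUT files modified within a recent window. This is a minimal sketch, assuming it runs locally on an NSX Manager with read access to /config/corfu. The function name recent_layout_files, the 24-hour default window, and the premise that many recently modified LAYOUT files reflect frequent epoch changes are assumptions, not documented NSX behavior. It matches file names containing "LAYOUT", mirroring the grep in the command above, and returns them oldest first, as ls -ltr does.

```python
import time
from pathlib import Path


def recent_layout_files(corfu_dir="/config/corfu", window_hours=24.0, now=None):
    """Return names of files whose name contains 'LAYOUT' and whose
    modification time falls within the last `window_hours`, oldest first.

    Hypothetical helper: the directory comes from the KB command; the
    window size and the interpretation of the result are assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - window_hours * 3600
    entries = []
    for path in Path(corfu_dir).iterdir():
        # Mirror `grep LAYOUT`, but only consider regular files.
        if "LAYOUT" in path.name and path.is_file():
            mtime = path.stat().st_mtime
            if mtime >= cutoff:
                entries.append((mtime, path.name))
    # Oldest first, like `ls -ltr`.
    entries.sort()
    return [name for _, name in entries]
```

This article does not define what counts as "frequent"; comparing len(recent_layout_files()) against a period when the cluster was known to be healthy is one reasonable baseline.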

Environment

VMware NSX
VMware NSX-T Datacenter

Cause

High storage latency on the NSX Manager VMs causes cluster instability and induces frequent Corfu layout (epoch) changes.

Resolution

Either resolve the datastore read/write latency, or use Storage vMotion to migrate the Manager VMs to another datastore with latency under 10 ms.

Refer to the KB below for detailed steps and logs to confirm whether storage latency is affecting NSX Manager stability.
Storage latency causes NSX Manager cluster instability

The NSX storage requirements documentation states:

"NSX appliance VMs that are backed by VSAN clusters may see intermittent disk write latency spikes of 10+ms. This is expected due to the way VSAN handles data (burst of incoming IOs resulting in queuing of data and delay). As long as the average disk access latency continues to be less than 10ms, intermittent latency spike should not have an impact on NSX Appliance VMs."

NSX Manager VM and Host Transport Node System Requirements
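The quoted guidance separates intermittent spikes from the average. The sketch below applies that rule to a list of latency readings in milliseconds, for example values read from the advanced performance charts or esxtop; how the readings are collected is outside its scope. The function name assess_latency and the labels it returns are hypothetical. The 10 ms figure comes from the quoted requirement: readings of 10 ms or more count as spikes, and the average must stay below 10 ms.

```python
from statistics import mean

# Threshold taken from the quoted NSX storage requirement.
LATENCY_THRESHOLD_MS = 10.0


def assess_latency(samples_ms, threshold_ms=LATENCY_THRESHOLD_MS):
    """Classify latency readings (ms) using the average-vs-spike rule.

    Returns (average_ms, spike_count, label) where label is one of:
      "too_high"    - average is at or above the threshold
      "spikes_only" - some readings reach the threshold, average is below it
      "ok"          - no reading reaches the threshold
    Hypothetical helper for illustration only.
    """
    if not samples_ms:
        raise ValueError("no latency samples provided")
    avg = mean(samples_ms)
    # "spikes of 10+ms" -> count readings at or above the threshold.
    spikes = sum(1 for s in samples_ms if s >= threshold_ms)
    if avg >= threshold_ms:
        label = "too_high"
    elif spikes:
        label = "spikes_only"
    else:
        label = "ok"
    return avg, spikes, label
```

For example, assess_latency([2, 3, 25, 4]) returns (8.5, 1, "spikes_only"): one reading exceeds 10 ms, but the average stays below it, which the quoted guidance describes as expected on vSAN.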

If the issue persists after the storage latency has been resolved, open a case with the Broadcom Support team.

Additional Information

Helpful KBs:

Troubleshooting NSX Datastore (CorfuDB) Issues
"performance has deteriorated" messages in ESXi host logs
Using esxtop to identify storage performance issues for ESXi (multiple versions)
"state in doubt; requested fast path state update" error in ESXi