NSX Manager cluster degrades and multiple services fail randomly

VMware NSX

- The NSX Manager cluster enters a degraded state.
- Cluster services randomly fail or go down on the Manager nodes.
- The alarms typically self-resolve after a short period.
Log in to each NSX Manager node as the root user over SSH (for example, with PuTTY) and review the following log files.
The /var/log/kern.log file on one or more Manager nodes displays SCSI host driver task aborts similar to the following:
2025-11-14T06:14:19.658Z nsxmgr.cor.local kernel - - - [ 1591.362466] mptscsih: ioc0: attempting task abort! (sc=ffff88810efdd910)
2025-11-14T06:14:19.682Z nsxmgr.cor.local kernel - - - [ 1591.766465] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88810efdd910)
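To gauge how often these aborts occur on a node, you can count them instead of reading kern.log by eye. The following is a minimal sketch. The function name `summarize_task_aborts` is hypothetical, and the regular expressions assume the message format matches the two sample lines above. They have not been checked against every mptscsih driver version.

```python
import re
from collections import Counter

# Patterns modeled on the sample kern.log lines above (assumed format):
#   "mptscsih: ioc0: attempting task abort! (sc=ffff88810efdd910)"
#   "mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88810efdd910)"
ATTEMPT_RE = re.compile(
    r"mptscsih: (?P<ioc>\S+): attempting task abort! \(sc=(?P<sc>[0-9a-fA-F]+)\)"
)
RESULT_RE = re.compile(
    r"mptscsih: (?P<ioc>\S+): task abort: (?P<status>\w+) "
    r"\(rv=(?P<rv>[0-9a-fA-F]+)\) \(sc=(?P<sc>[0-9a-fA-F]+)\)"
)


def summarize_task_aborts(lines):
    """Count abort attempts and tally abort results by status (SUCCESS, ...)."""
    attempts = 0
    results = Counter()
    for line in lines:
        if ATTEMPT_RE.search(line):
            attempts += 1
            continue
        match = RESULT_RE.search(line)
        if match:
            results[match.group("status")] += 1
    return attempts, results
```

To run it on a node, pass an open file as `lines`, for example `summarize_task_aborts(open("/var/log/kern.log", errors="replace"))`. A steadily growing attempt count on one node points to that node's storage path.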
The /var/log/corfu/corfu.9000.log file on the Manager nodes displays stream log errors similar to the following:
2025-11-14T06:14:21.260Z | ERROR | LogUnit-BatchProcessor-0 | o.c.i.BatchProcessor | batchWriteProcessor: stream log error. Batch: [queue size=7]. StreamLog: [trim mark=51387158].
2025-11-14T06:14:21.260Z | ERROR | LogUnit-BatchProcessor-0 | o.c.i.BatchProcessor | batchWriteProcessor: stream log error. Batch: [queue size=6]. StreamLog: [trim mark=51387158].
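You can extract the Corfu write errors in the same way, along with their timestamps and batch queue sizes. The following is an illustrative sketch. The function name `stream_log_errors` is hypothetical, and the pattern assumes the pipe-separated layout shown in the two corfu.9000.log samples above.

```python
import re
from datetime import datetime

# Pattern modeled on the sample corfu.9000.log lines above (assumed layout):
#   "<timestamp> | ERROR | <thread> | <logger> | batchWriteProcessor: stream log error. Batch: [queue size=N]. ..."
STREAM_ERR_RE = re.compile(
    r"^(?P<ts>\S+) \| ERROR \|.*batchWriteProcessor: stream log error\. "
    r"Batch: \[queue size=(?P<qsize>\d+)\]"
)


def stream_log_errors(lines):
    """Return (timestamp, queue_size) for each Corfu stream log error line."""
    found = []
    for line in lines:
        match = STREAM_ERR_RE.match(line)
        if match:
            # Timestamps end in "Z" (UTC); rewrite it so older Pythons can parse it.
            ts = datetime.fromisoformat(match.group("ts").replace("Z", "+00:00"))
            found.append((ts, int(match.group("qsize"))))
    return found
```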
The /var/log/proton/proton_restart.log file records application restarts similar to the following:
2025-11-14T06:14:22.350Z INFO application-restartor restartor 52576 - [nsx@4413 comp="nsx-manager" level="INFO" subcomp="manager"] ===== APPLICATION IS GOING RESTART (GMLE leadership safety violation handler triggered for groupType: mp) =====
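The following sketch lists the restarts in proton_restart.log that the GMLE leadership safety violation handler triggered. The function name `gmle_restarts` is hypothetical, and the pattern assumes the restart banner format shown in the sample line above.

```python
import re

# Pattern modeled on the sample proton_restart.log line above (assumed format):
#   "<timestamp> ... ===== APPLICATION IS GOING RESTART (<reason>) ====="
RESTART_RE = re.compile(
    r"^(?P<ts>\S+) .*APPLICATION IS GOING RESTART \((?P<reason>[^)]*)\)"
)


def gmle_restarts(lines):
    """Return (timestamp, reason) for restarts triggered by the GMLE handler."""
    restarts = []
    for line in lines:
        match = RESTART_RE.match(line)
        if match and "GMLE leadership safety violation" in match.group("reason"):
            restarts.append((match.group("ts"), match.group("reason")))
    return restarts
```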

High write latencies in the Corfu database, caused by underlying host or storage performance issues, lead to overall cluster instability.
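To check whether the Corfu errors and restarts on a node closely follow its SCSI task aborts, you can run a simple time-window correlation on the log timestamps. This is an illustrative sketch only. The helper names and the 30-second default window are assumptions and are not values defined by NSX. A match in time supports the storage-latency explanation but does not prove it.

```python
from datetime import datetime, timedelta


def parse_ts(ts):
    """Parse the UTC timestamps used in these logs, e.g. 2025-11-14T06:14:19.658Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def follows_within(cause_times, effect_times, window=timedelta(seconds=30)):
    """Return the effect timestamps that fall within `window` after any cause timestamp."""
    causes = [parse_ts(t) for t in cause_times]
    matched = []
    for raw in effect_times:
        effect = parse_ts(raw)
        if any(cause <= effect <= cause + window for cause in causes):
            matched.append(raw)
    return matched
```

For example, take the kern.log abort at 2025-11-14T06:14:19.658Z as the cause. The Corfu errors at 06:14:21.260Z and the GMLE restart at 06:14:22.350Z both fall within the 30-second window.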

To resolve this issue, you must relieve the storage latency affecting the impacted NSX Manager node.

Identify the impacted NSX Manager node, that is, the node whose /var/log/kern.log shows the SCSI host driver (mptscsih) task aborts.
Migrate the impacted NSX Manager virtual machine to a different ESXi host (using vSphere vMotion) or to a different datastore (using Storage vMotion) that has healthy storage performance metrics.
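If you have copied the kern.log file from each Manager node to one workstation (for example, with scp), you can rank the nodes by task-abort count to find the one to migrate. This is a minimal sketch. The function name and the node names used in the example are hypothetical, and the pattern assumes the abort message format from the kern.log excerpt above.

```python
import re

# Abort-attempt pattern modeled on the kern.log excerpt above (assumed format).
ABORT_RE = re.compile(r"mptscsih: \S+: attempting task abort!")


def impacted_nodes(kern_logs):
    """kern_logs maps a node name to an iterable of its kern.log lines.

    Returns (node, abort_count) pairs for nodes with at least one task abort,
    most affected node first.
    """
    counts = {
        node: sum(1 for line in lines if ABORT_RE.search(line))
        for node, lines in kern_logs.items()
    }
    hits = [(node, count) for node, count in counts.items() if count > 0]
    # Highest count first; ties are ordered by node name.
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

For example, `impacted_nodes({"nsxmgr-01": open("nsxmgr-01-kern.log"), ...})` returns the most affected node first. The file names in that example are placeholders.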
