NSX Manager Cluster is Degraded and multiple services go down randomly

Article ID: 419529

Updated On:

Products

VMware NSX

Issue/Introduction

  • The NSX Manager cluster goes into a degraded state
  • Cluster services randomly go down on the manager nodes
  • /var/log/kern.log on one or more managers contains entries similar to the following:
    2025-11-14T06:14:19.658Z nsxmgr.cor.local kernel - - - [ 1591.362466] mptscsih: ioc0: attempting task abort! (sc=ffff88810efdd910)
    2025-11-14T06:14:19.682Z nsxmgr.cor.local kernel - - - [ 1591.766465] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88810efdd910)
  • /var/log/corfu.9000.log on the managers contains entries similar to the following:
    2025-11-14T06:14:21.260Z | ERROR |       LogUnit-BatchProcessor-0 |           o.c.i.BatchProcessor | batchWriteProcessor: stream log error. Batch: [queue size=7]. StreamLog: [trim mark=51387158].
    2025-11-14T06:14:21.260Z | ERROR |       LogUnit-BatchProcessor-0 |           o.c.i.BatchProcessor | batchWriteProcessor: stream log error. Batch: [queue size=6]. StreamLog: [trim mark=51387158].
  • /var/log/proton/proton_restart.log contains entries similar to the following:
    2025-11-14T06:14:22.350Z  INFO application-restartor restartor 52576 - [nsx@4413 comp="nsx-manager" level="INFO" subcomp="manager"] ===== APPLICATION IS GOING RESTART (GMLE leadership safety violation handler triggered for groupType: mp) =====
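
The three log signatures above can be checked for with a small script. The following is a minimal sketch, not a supported tool: the pattern names and the functions classify and count_signatures are made up for illustration, and the patterns are derived only from the sample lines in this article, so they may need adjusting for other NSX versions.

```python
import re

# Patterns derived from the sample log lines in this article (assumption:
# other NSX versions may word these messages differently).
SIGNATURES = {
    "scsi_task_abort": re.compile(r"mptscsih: .*task abort"),
    "corfu_stream_log_error": re.compile(r"batchWriteProcessor: stream log error"),
    "proton_restart": re.compile(r"APPLICATION IS GOING RESTART"),
}


def classify(line):
    """Return the name of the first signature that matches the line, or None."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(line):
            return name
    return None


def count_signatures(lines):
    """Count how many lines match each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        name = classify(line)
        if name is not None:
            counts[name] += 1
    return counts


# Sample lines copied from the Issue/Introduction section.
sample = [
    '2025-11-14T06:14:19.658Z nsxmgr.cor.local kernel - - - [ 1591.362466] mptscsih: ioc0: attempting task abort! (sc=ffff88810efdd910)',
    '2025-11-14T06:14:19.682Z nsxmgr.cor.local kernel - - - [ 1591.766465] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88810efdd910)',
    '2025-11-14T06:14:21.260Z | ERROR |       LogUnit-BatchProcessor-0 |           o.c.i.BatchProcessor | batchWriteProcessor: stream log error. Batch: [queue size=7]. StreamLog: [trim mark=51387158].',
    '2025-11-14T06:14:22.350Z  INFO application-restartor restartor 52576 - [nsx@4413 comp="nsx-manager" level="INFO" subcomp="manager"] ===== APPLICATION IS GOING RESTART (GMLE leadership safety violation handler triggered for groupType: mp) =====',
]

print(count_signatures(sample))
```

To scan a real node, the same function can be given the lines of each of the three log files named above. A non-zero count for all three signatures on the same node matches the symptom pattern described in this article.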

Environment

VMware NSX

Cause

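An issue in the underlying (underlay) storage increases Corfu write latency, which destabilizes the whole cluster. The mptscsih task aborts in kern.log show that disk I/O on the affected manager is stalling. Corfu, the NSX Manager datastore, then logs batch write errors, and the Proton application restarts after a leadership safety violation.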
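
The ordering of these events can be confirmed from the log timestamps. The sketch below is illustrative only: it uses the timestamps of the sample lines in this article (the Corfu and Proton lines are shortened), and the helper name parse_ts is hypothetical. In these samples the Corfu error follows the first SCSI task abort by about 1.6 seconds and the Proton restart follows by about 2.7 seconds.

```python
from datetime import datetime


def parse_ts(line):
    """Parse the leading UTC timestamp (first whitespace-delimited field)."""
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%fZ")


# One line per log file, taken from the samples in this article.
# The Corfu and Proton lines are shortened; only the timestamp is used.
events = [
    ("SCSI task abort (kern.log)",
     "2025-11-14T06:14:19.658Z nsxmgr.cor.local kernel - - - [ 1591.362466] mptscsih: ioc0: attempting task abort!"),
    ("Corfu stream log error (corfu.9000.log)",
     "2025-11-14T06:14:21.260Z | ERROR | LogUnit-BatchProcessor-0 | batchWriteProcessor: stream log error."),
    ("Proton restart (proton_restart.log)",
     "2025-11-14T06:14:22.350Z  INFO application-restartor restartor 52576 - APPLICATION IS GOING RESTART"),
]

# Seconds elapsed since the first event, in the order listed.
t0 = parse_ts(events[0][1])
offsets = [(label, (parse_ts(line) - t0).total_seconds()) for label, line in events]

for label, seconds in offsets:
    print(f"+{seconds:.3f}s  {label}")
```

A storage-side error that consistently precedes the Corfu and Proton entries by a few seconds supports the cause described here. Timestamps that do not line up this way suggest looking for a different cause.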

Resolution

The NSX Manager node that logs the SCSI host driver (mptscsih) errors is the impacted node. Move that node to a different host or storage, then monitor and resolve the underlying storage issue.
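
When kern.log has been collected from several managers, the impacted node can be picked out by counting mptscsih task abort entries per hostname. This is a rough sketch under assumptions: the hostname is taken to be the second field of each kern.log line, as in the samples in this article, and the function name aborts_per_host is hypothetical.

```python
import re
from collections import Counter

# Assumption: kern.log lines have the form "<timestamp> <hostname> kernel ...",
# as in the sample lines in this article.
ABORT = re.compile(r"^\S+\s+(?P<host>\S+)\s+kernel\b.*mptscsih: .*task abort")


def aborts_per_host(lines):
    """Count mptscsih task abort entries per reporting hostname."""
    hits = Counter()
    for line in lines:
        match = ABORT.search(line)
        if match:
            hits[match.group("host")] += 1
    return hits


# Sample kern.log lines copied from this article.
kern_lines = [
    "2025-11-14T06:14:19.658Z nsxmgr.cor.local kernel - - - [ 1591.362466] mptscsih: ioc0: attempting task abort! (sc=ffff88810efdd910)",
    "2025-11-14T06:14:19.682Z nsxmgr.cor.local kernel - - - [ 1591.766465] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88810efdd910)",
]

for host, count in aborts_per_host(kern_lines).most_common():
    print(host, count)
```

The hostname with the highest count is the first candidate to move to a different host or storage.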