NSX Edge nodes experience unpredictable dataplane failure and loss of management connectivity when the underlying storage causes the root file-system to remount as read-only.

Article ID: 434945

Updated On:

Products

VMware NSX

Issue/Introduction

  • The Edge node stops generating syslog and local log entries entirely for an extended period.
  • At some later point, all N-S traffic and tunnel connectivity with Host Transport Nodes is lost.
  • After the underlying storage fault is corrected and the Edge is forcibly rebooted, the system kernel shows a "fsck.mode=force" state. The absence of log entries between the storage failure and the reboot confirms that the disk was in a read-only condition.
  • A complete North-South traffic outage occurs as all Edge transport nodes become unresponsive or lose tunnel integrity. The outage may be delayed: the dataplane may remain partially functional for many days after the initial storage event before failing, which makes the root cause difficult to correlate.
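Because the outage can trail the storage event by days, it can help to locate the window of log silence first and then compare it with the storage fault timeline. The following is a minimal Python sketch of that idea, not an NSX tool. The function name `find_log_gaps` and the one-hour threshold are assumptions made for illustration, and the timestamps are assumed to have already been parsed from whatever log copy is available (for example, a remote syslog server).

```python
from datetime import datetime, timedelta


def find_log_gaps(timestamps, min_gap=timedelta(hours=1)):
    """Return (start, end, duration) for each silence of at least min_gap.

    timestamps: iterable of datetime objects, one per log entry.
    min_gap:    hypothetical threshold; tune it to the normal log rate.
    """
    ordered = sorted(timestamps)
    gaps = []
    # Compare each entry with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        duration = later - earlier
        if duration >= min_gap:
            gaps.append((earlier, later, duration))
    return gaps
```

A gap whose start lines up with a known storage event, and whose end lines up with the forced reboot, is consistent with the read-only condition described above.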

Environment

VMware NSX

Cause

An underlying storage issue causes the root file-system of the Edge node to remount as read-only.

Resolution

Ensure the underlying storage infrastructure is stable. If an Edge is suspected to be in a read-only state, perform a hard reboot/power-cycle of the Edge VM to trigger a file-system check (FSCK) and restore write capabilities.
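On a generic Linux system, the mount state of the root file-system can be read from /proc/mounts, where the fourth field of each entry lists the mount options. The sketch below is a hedged Python illustration of that check. The function names are hypothetical, and whether a shell and Python interpreter are reachable on an affected Edge node is an assumption, particularly once management connectivity is lost.

```python
def root_mount_options(mounts_text):
    """Return the option list of the last entry mounted at '/', or None.

    mounts_text: contents of /proc/mounts, where each line has the form
    'device mountpoint fstype options dump pass'.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/":
            # Later entries for '/' override earlier ones.
            options = fields[3].split(",")
    return options


def root_is_read_only(mounts_text):
    """Return True if the root file-system is currently mounted 'ro'."""
    options = root_mount_options(mounts_text)
    if options is None:
        raise ValueError("no mount entry for '/' found")
    # Match the exact 'ro' option; 'errors=remount-ro' only describes
    # the error policy and does not mean the mount is read-only.
    return "ro" in options
```

For example, `root_is_read_only(open("/proc/mounts").read())` returning True would support the suspicion that a reboot is needed.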

Additional Information

System behaviour under read-only root file-system conditions is fundamentally unpredictable.