NSX-T Edge file system becomes Read-Only

Article ID: 395764

Updated On:

Products

VMware NSX

Issue/Introduction

  • The NSX-T Edge node goes into read-only mode because of underlying storage issues.
  • File system errors can be observed on the Edge node console.
  • The following alarm is logged in the NSX UI once the underlying storage issue is fixed:
    2025-04-07T21:00:51.786Z
    nsx-edge NSX 75765 MONITORING [nsx@6876 alarmId="########-####-####-############" alarmState="OPEN" comp="nsx-manager" entId="########-####-####-############" errorCode="MP701099" eventFeatureName="edge_health" eventSev="CRITICAL" eventState="On" eventType="storage_error" level="FATAL" nodeId="########-####-####-############" subcomp="monitoring"] The following disk partitions on the Edge node are in read-only mode:
  • In some cases where storage connectivity is completely lost, logging may stop at the timestamp of the storage impact:
    2025-04-07T18:01:50.170Z nsx-edge NSX 1841 - [nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="1988" level="INFO"] [GetPnicBondsGenericRuntime] Got 3 pnicbond entries
    ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ [long run of ^@ (NUL) characters continues] ---> suggests the node was not able to write to storage

    Storage Error Alarm definition.
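
The run of ^@ characters in the log excerpt above is how NUL bytes are displayed. As a minimal sketch, a collected log file could be checked for such bytes as follows. The function name count_nul_bytes is hypothetical, and the log file to inspect depends on your environment; the article does not name one.

```shell
#!/bin/sh
# count_nul_bytes FILE
# Prints the number of NUL (0x00) bytes found in FILE.
# A non-zero count in a text log suggests writes were interrupted.
count_nul_bytes() {
    # tr -cd deletes every byte that is NOT a NUL, so only NULs remain;
    # wc -c counts them; the final tr strips padding some wc builds add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

For example, `count_nul_bytes <path-to-log-file>` prints 0 for a clean text log.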

Environment

  • VMware NSX-T Data Center
  • VMware NSX

Cause

Underlying storage issues, such as a disconnect from the storage array, disk drive failures, or network interruptions, can cause an Edge node to enter read-only mode to protect its data from further damage. This is expected behavior for Linux-based file systems.

Related information: Linux based file systems become read-only
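
On a Linux system where a shell is still available, the mount table shows which file systems are currently mounted read-only. The sketch below is illustrative only: the function name list_ro_mounts is hypothetical, and it assumes the affected partitions are ext2/ext3/ext4 file systems (consistent with the use of e2fsck in the Resolution section).

```shell
#!/bin/sh
# list_ro_mounts [MOUNTS_FILE]
# Prints "mountpoint (device)" for each ext2/3/4 file system whose
# mount options include "ro". Reads /proc/mounts unless a file is given.
list_ro_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Mount table fields: device mountpoint fstype options dump pass
    awk '$3 ~ /^ext[234]$/ {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            # Exact match, so options such as errors=remount-ro are ignored.
            if (opts[i] == "ro") { print $2 " (" $1 ")"; break }
        }
    }' "$mounts_file"
}
```

Calling `list_ro_mounts` with no arguments reads the live mount table; empty output means no ext file system is mounted read-only.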

Resolution

To resolve the issue, first address the underlying storage connectivity issues and confirm that the underlying storage is healthy and optimal before proceeding with any of the steps below.
NOTE: DO NOT PROCEED IF UNDERLYING STORAGE IS DEGRADED

Related Information:

Typically, when an Edge node enters read-only mode, SSH access to the Edge may not be available and the only way to log in is through the VM console. Follow the steps below to recover the node:

  1. Connect to the console of the appliance.
  2. Reboot the impacted edge node.
  3. When the GRUB boot menu appears, press e to edit the menu entry.
    NOTE: The GRUB menu doesn't appear by default. For Edge VMs, check the box under Edit Settings -> VM Options -> Boot Option -> Force BIOS setup in the vCenter UI.
  4. The GRUB password is "VMware1" for releases before 3.2 and "NSX@VM!WaR10" for release 3.2 and later.
  5. Enter the user name (root) and the GRUB password for root.
  6. Search for the line starting with linux.
  7. Remove all options after root=UUID=########-####-####-############ (everything following the end of the UUID) and add "rw init=/bin/bash" after the UUID.
  8. Press Ctrl-X to boot.
  9. When the log messages stop, press Enter.
  10. The prompt root@(none):/# will appear.
  11. Once the Edge is accessible, repair the file systems with the following commands:
    e2fsck -y /dev/sda1
    e2fsck -y /dev/sda2
    e2fsck -y /dev/mapper/nsx-config
    e2fsck -y /dev/mapper/nsx-image
    e2fsck -y /dev/mapper/nsx-var+log
    e2fsck -y /dev/mapper/nsx-config__bak
    e2fsck -y /dev/mapper/nsx-tmp
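
The e2fsck commands above can also be run in a loop that reports each result. The following is a minimal sketch, not an official procedure: the function names describe_fsck_status and run_edge_fsck are hypothetical, the device list is copied from the commands above, and the exit-code meanings follow the e2fsck man page, where the exit status is a bitwise sum of conditions.

```shell
#!/bin/sh
# describe_fsck_status CODE
# Translates an e2fsck exit status (a bitwise sum) into readable text.
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then echo "clean"; return 0; fi
    msg=""
    if [ $((code & 1)) -ne 0 ]; then msg="$msg errors-corrected"; fi
    if [ $((code & 2)) -ne 0 ]; then msg="$msg reboot-recommended"; fi
    if [ $((code & 4)) -ne 0 ]; then msg="$msg errors-left-uncorrected"; fi
    if [ $((code & 8)) -ne 0 ]; then msg="$msg operational-error"; fi
    if [ $((code & 16)) -ne 0 ]; then msg="$msg usage-error"; fi
    if [ $((code & 32)) -ne 0 ]; then msg="$msg cancelled"; fi
    if [ $((code & 128)) -ne 0 ]; then msg="$msg shared-library-error"; fi
    echo "${msg# }"   # drop the leading space
}

# run_edge_fsck
# Runs e2fsck -y on each device listed in the article and prints the result.
run_edge_fsck() {
    for dev in /dev/sda1 /dev/sda2 \
               /dev/mapper/nsx-config /dev/mapper/nsx-image \
               /dev/mapper/nsx-var+log /dev/mapper/nsx-config__bak \
               /dev/mapper/nsx-tmp; do
        e2fsck -y "$dev"
        rc=$?   # capture the status before any other command overwrites it
        echo "$dev: $(describe_fsck_status "$rc")"
    done
}
```

After defining both functions at the recovery prompt, `run_edge_fsck` performs the checks. Any device reported with errors-left-uncorrected or operational-error needs further attention before the node is returned to service.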

Alternatively, the node can be recovered with one of the following options:

  • Redeploy the Edge.
  • Reboot the Edge.

Additional Information

Once the Edge node goes into read-only mode, there is no easy way to know which processes running inside the Edge will get stuck on I/O waits caused by the loss of storage.
In some situations the Edge may continue to work; in others it will not. Datapath impact can also be observed.
If the Edge node reboot does not seem to be progressing, the reset option in vCenter may be used. Bear in mind that any time a VM is forcibly reset, there is a potential for data corruption.

-------------------------------------------

TECH DOCS 

Redeploy an NSX Edge Node