NSX-T Edge file system becomes Read-Only

Article ID: 395764

Updated On:

Products

VMware NSX

Issue/Introduction

  • The NSX-T Edge node file system goes into read-only mode because of underlying storage issues.

  • File system errors are reported on the Edge node console.

  • The following alarm is logged and shown in the NSX UI once the underlying storage issue is fixed:
    ####-##-##T##:##:##.#### nsx-edge NSX 75765 MONITORING [nsx@6876 alarmId="########-####-####-############" alarmState="OPEN" comp="nsx-manager" entId="########-####-####-############" errorCode="MP701099" eventFeatureName="edge_health" eventSev="CRITICAL" eventState="On" eventType="storage_error" level="FATAL" nodeId="########-####-####-############" subcomp="monitoring"] The following disk partitions on the Edge node are in read-only mode:
  • In some cases where storage connectivity is completely lost, logging may stop at the timestamp of the storage impact:

    ####-##-##T##:##:##.#### nsx-edge NSX 1841 - [nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="1988" level="INFO"] [GetPnicBondsGenericRuntime] Got 3 pnicbond entries

    ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ [...] (a long run of NUL bytes, displayed as ^@, suggesting that the Edge was not able to write to storage)

    Related information: Storage Error Alarm definition.
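
If a root shell on the Edge is still reachable, the read-only state described above can be confirmed from the mount options. The following is a minimal sketch, not an NSX-provided tool: the function name list_ro_mounts is hypothetical, and it assumes the standard Linux /proc/mounts layout (device, mount point, type, options). Pseudo file systems that are normally mounted read-only may also be listed.

```shell
#!/bin/sh
# list_ro_mounts (hypothetical helper): print every mount whose option list
# contains the exact option "ro". Reads /proc/mounts by default; another file
# in the same format can be passed as the first argument.
list_ro_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Fields in /proc/mounts: device mountpoint fstype options dump pass
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2 " (" $1 ", " $3 ")"; break }
    }' "$mounts_file"
}

list_ro_mounts
```

Each output line has the form "mountpoint (device, fstype)". An option such as errors=remount-ro is not counted, because only the exact option ro is matched.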

Environment

  • VMware NSX-T Data Center
  • VMware NSX

Cause

Underlying storage issues, such as a disconnect from the storage array, disk drive failures, or network interruptions, can cause an Edge to enter read-only mode to protect its data from further damage. This is expected behavior for Linux-based file systems.

Related information: Linux based file systems become read-only

Resolution

To resolve the issue, first address the underlying storage connectivity issues and confirm that the underlying storage is healthy before proceeding with any of these steps.
NOTE: DO NOT PROCEED IF THE UNDERLYING STORAGE IS DEGRADED.

When an Edge enters read-only mode, SSH access to the Edge may not be available, and the only way to log in may be through the VM console.

Reboot the NSX Edge first, then check whether the Edge becomes accessible and the disk partition error is cleared.

If the reboot does not resolve the issue, use one of the following methods from the console.

Method 1

  1. Connect to the console of the NSX Edge appliance from vCenter.
  2. Reboot the appliance.
  3. When the GRUB menu appears, quickly press the left SHIFT or ESC key. Note: If you wait too long and the boot sequence does not pause, you must reboot the system again.
  4. Keep the cursor on the Ubuntu entry and press lowercase e to edit it. A GRUB login prompt appears.
  5. Enter the user name (root) and the GRUB password for root (not the same as the password of the appliance root user). Note: The default password is 'VMware1' before release 3.2 and 'NSX@VM!WaR10' for 3.2 and later.
  6. Find the line starting with linux. At the end of this line, add fsck.mode=force fsck.repair=yes and press F10 or Ctrl-X to boot the appliance. The login prompt should then appear, where you can log in with the appliance root user credentials.
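
After logging in, it can be useful to confirm that the two options added in step 6 were actually part of the kernel command line for this boot. The following is a minimal sketch under that assumption; the function name has_forced_fsck is hypothetical, and it only reads /proc/cmdline without changing anything.

```shell
#!/bin/sh
# has_forced_fsck (hypothetical helper): succeed only if the kernel command
# line contains both fsck.mode=force and fsck.repair=yes. Reads /proc/cmdline
# by default; another file can be passed as the first argument.
has_forced_fsck() {
    cmdline_file="${1:-/proc/cmdline}"
    # -F: fixed string, -w: whole word, -q: quiet (exit status only)
    grep -qwF -- 'fsck.mode=force' "$cmdline_file" &&
        grep -qwF -- 'fsck.repair=yes' "$cmdline_file"
}

if has_forced_fsck; then
    echo "forced fsck options were present for this boot"
else
    echo "forced fsck options were NOT present for this boot"
fi
```

The GRUB edit in step 6 applies to a single boot only, so the options are expected to be absent again after the next normal reboot.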

Method 2 

  1. Connect to the console of the appliance.
  2. Reboot the impacted Edge node.
  3. When the GRUB boot menu appears, press e to edit the menu.
     NOTE: The GRUB menu does not appear by default. For Edges, select the check box under Edit Settings -> VM Options -> Boot Options -> Force BIOS setup in the vCenter UI.
  4. Enter the user name (root) and the GRUB password for root. The default password is "VMware1" before release 3.2 and "NSX@VM!WaR10" for 3.2 and later.
  5. Find the line starting with linux.
  6. Remove all options after root=UUID=########-####-####-############ (starting from the end of the UUID) and add "rw init=/bin/bash" after the UUID.
  7. Press Ctrl-X to boot.
  8. When the log messages stop, press Enter.
  9. The prompt root@(none):/# appears.
  10. At this prompt, recover the file systems with the following commands:
    e2fsck -y /dev/sda1
    e2fsck -y /dev/sda2
    e2fsck -y /dev/mapper/nsx-config
    e2fsck -y /dev/mapper/nsx-image
    e2fsck -y /dev/mapper/nsx-var+log
    e2fsck -y /dev/mapper/nsx-config__bak
    e2fsck -y /dev/mapper/nsx-tmp
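
The e2fsck commands above can also be run in a loop that reports the result for each device. The following is a minimal sketch, not a supported NSX script: the helper names describe_fsck_status and check_all and the --run argument are hypothetical, while the device list is copied from the commands above. The exit status of e2fsck is a bit mask, as documented in its man page, which the helper decodes into text.

```shell
#!/bin/sh
# Device list copied from the e2fsck commands in this article.
DEVICES="/dev/sda1 /dev/sda2 /dev/mapper/nsx-config /dev/mapper/nsx-image /dev/mapper/nsx-var+log /dev/mapper/nsx-config__bak /dev/mapper/nsx-tmp"

# describe_fsck_status (hypothetical helper): decode the e2fsck exit status
# bit mask into a space-separated list of conditions.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then echo "no errors"; return; fi
    msg=""
    if [ $((status & 1)) -ne 0 ]; then msg="$msg errors-corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then msg="$msg reboot-recommended"; fi
    if [ $((status & 4)) -ne 0 ]; then msg="$msg errors-left-uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then msg="$msg operational-error"; fi
    if [ $((status & 16)) -ne 0 ]; then msg="$msg usage-error"; fi
    if [ $((status & 32)) -ne 0 ]; then msg="$msg canceled"; fi
    if [ $((status & 128)) -ne 0 ]; then msg="$msg shared-library-error"; fi
    echo "${msg# }"
}

# check_all (hypothetical helper): run e2fsck -y on each device and print a
# one-line summary of its exit status.
check_all() {
    for dev in $DEVICES; do
        e2fsck -y "$dev"
        rc=$?
        echo "$dev: $(describe_fsck_status "$rc")"
    done
}

# Nothing is checked unless the script is started with --run (an assumption
# made here so that the repair is never triggered by accident).
if [ "${1:-}" = "--run" ]; then
    check_all
fi
```

A device reported with errors-left-uncorrected or operational-error was not repaired and needs further attention before the Edge is put back into service.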

Alternatively, the node can be recovered with either of the following:

  • Redeploy the Edge.
  • Reboot the Edge.

Additional Information

Once the Edge node goes into read-only mode, there is no easy way to know which processes running inside the Edge will get stuck on I/O waits caused by the loss of storage.
The Edge may continue to work in some situations and not in others. Datapath impact can also be observed.
If the Edge node reboot does not seem to be progressing, the reset option in vCenter may be used. Bear in mind that a forced reset of a VM always carries a risk of data corruption.
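
Where a root shell on the Edge is still usable, processes blocked on I/O can be spotted by their state: the Linux ps command reports uninterruptible sleep, which is usually an I/O wait, as state D. The following is a minimal sketch that assumes the standard ps utility is available on the Edge; the function name filter_d_state is hypothetical.

```shell
#!/bin/sh
# filter_d_state (hypothetical helper): from "PID STAT COMMAND" lines on
# stdin, keep only those whose state starts with D (uninterruptible sleep).
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# The trailing "=" on each column name suppresses the header line.
ps -eo pid=,stat=,comm= | filter_d_state
```

An empty result means that no process was in uninterruptible sleep at the moment of the check; it does not prove that the Edge is healthy.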

-------------------------------------------

TECH DOCS 

Redeploy an NSX Edge Node