When you SSH into the NSX Manager, or open the console of the Manager VM through vCenter, you see multiple I/O error log lines.
You may notice that the cluster VIP keeps getting reassigned to different managers.
get cluster status reports the cluster as STABLE.
NSX Manager uptime may be high (more than 100 days).
VMware NSX 4.1.2.3
The underlying storage may have unrecoverable read/write errors, which the Linux kernel reports as I/O errors.
These errors can corrupt the file system while the storage issue is present on the setup.
Even after the storage issue is resolved, the manager appliance OS may mount the file system in read-only mode because it has detected possible corruption.
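Both conditions described above can be checked from text the appliance already exposes. The following is a minimal sketch, assuming Python 3 is available wherever the text is analyzed (for example, on a workstation holding a copy of the kernel log and of /proc/mounts). The function names find_io_errors and read_only_mounts are hypothetical helpers and are not part of NSX.

```python
import re

# The Linux kernel uses the phrase "I/O error" in its block-layer messages.
IO_ERROR = re.compile(r"I/O error", re.IGNORECASE)


def find_io_errors(log_text):
    """Return the lines of a kernel log that mention an I/O error."""
    return [line for line in log_text.splitlines() if IO_ERROR.search(line)]


def read_only_mounts(mounts_text):
    """Return mount points whose option list contains 'ro'.

    Expects text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result
```

Pass the output of dmesg (or a saved kernel log) to find_io_errors, and the contents of /proc/mounts to read_only_mounts. Some pseudo-filesystems are legitimately mounted read-only, so only partitions such as / appearing in the result point to the problem described here.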
Reboot the NSX Manager VM: If the underlying storage issue was temporary (e.g., a brief network interruption), a simple reboot of the NSX Manager VM may allow the filesystem to correct itself and remount the partition as read/write. If the issue affects every node in a cluster, reboot the nodes one at a time.
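The one-at-a-time rule can be sketched as a small loop. This is an illustration only. The callables reboot and is_cluster_stable are hypothetical placeholders that you would supply yourself (for example, a vCenter power operation and a check of the get cluster status output). Waiting for STABLE between reboots is an assumption of this sketch, not a step stated in the procedure.

```python
import time


def rolling_reboot(nodes, reboot, is_cluster_stable,
                   poll_seconds=30, max_polls=40):
    """Reboot nodes sequentially, waiting for the cluster between reboots.

    reboot(node) and is_cluster_stable() are caller-supplied placeholders.
    """
    for node in nodes:
        reboot(node)
        # Poll until the cluster reports stable, or give up.
        for _ in range(max_polls):
            if is_cluster_stable():
                break
            time.sleep(poll_seconds)
        else:
            # Stop here so the remaining nodes are left untouched.
            raise RuntimeError(f"cluster not stable after rebooting {node}")
```

A real is_cluster_stable check should also confirm that the rebooted node has gone down and rejoined, because the cluster can still report STABLE in the first moments after a reboot is issued.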
Filesystem Check (fsck) for Read-Only Partitions
If the reboot fails and the console shows that a partition (like /) is mounted as read-only, a filesystem check is required.
Reboot the NSX Manager.
Access GRUB Menu: During the boot process, press the Shift or Esc key repeatedly to enter the GRUB menu.
Edit Boot Parameters: Select the Ubuntu/NSX Manager boot option and press 'e' to edit the command line.
Append fsck Commands: Navigate to the line starting with linux and append the following parameters at the end:
fsck.mode=force fsck.repair=yes
Boot the Appliance: Press F10 or Ctrl+X to boot with the modified parameters. This forces the OS to run an automated filesystem check and attempt to repair any corruption.
Verify Status: After booting, log in and verify that the filesystem is no longer read-only and all NSX services are running (get services).
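To make the edit in the GRUB editor concrete, the sketch below shows what appending the two parameters to the linux line amounts to. The edit itself is made by hand in the GRUB editor, so this is illustrative only. The function name add_fsck_params is hypothetical, and the boot entry in the usage note is a made-up sample, not the actual NSX Manager entry.

```python
# Kernel parameters from the procedure above.
FSCK_PARAMS = ("fsck.mode=force", "fsck.repair=yes")


def add_fsck_params(grub_entry):
    """Append the fsck parameters to the line that starts with 'linux'.

    Parameters already present are not added a second time.
    """
    out = []
    for line in grub_entry.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "linux":
            missing = [p for p in FSCK_PARAMS if p not in tokens]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out)
```

For a sample entry whose kernel line reads "linux /vmlinuz root=/dev/sda2 ro quiet", the function returns the same line with fsck.mode=force fsck.repair=yes at the end and leaves the initrd line unchanged. A change made in the GRUB editor applies to that boot only, so the forced check does not repeat on later reboots.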
If the fsck process is unsuccessful or the filesystem corruption is too severe, the affected NSX Manager node needs to be replaced.
Delete the Faulted Node: Remove the faulted appliance from the NSX cluster.
Deploy a New Node: Deploy a new NSX Manager appliance with the same FQDN and IP address to replace the failed member. The new node will synchronize its data from the remaining healthy nodes.