Linux VMs flag their file systems as read-only after vSAN cluster partitioning or datastore inaccessibility

Article ID: 400167

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:

A network switch outage caused vSAN cluster partitioning.
All hosts report the uplink used by the vSAN VMkernel adapter as down in /var/run/log/vobd.log:

YYYY-MM-DDTHH:MM:SS: [netCorrelator] 26763954042466us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic1 is down. Affected dvPort: 22/50 00 4f d2 a4 05 f6 ee-0b 46 d6 4a 7f 70 96 38. 1 uplinks up. Failed criteria: 128

YYYY-MM-DDTHH:MM:SS: [netCorrelator] 26763954058362us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic4 is down. Affected dvPort: 7/50 00 4f d2 a4 05 f6 ee-0b 46 d6 4a 7f 70 96 38. 1 uplinks up. Failed criteria: 128
After the network is restored, the file system on some Linux VMs is in a read-only state.
The namespace objects of the impacted virtual machines report heartbeat timeout errors in /var/run/log/vobd.log:

YYYY-MM-DDTHH:MM:SS: [vmfsCorrelator] 26764193007100us: [esx.problem.vmfs.heartbeat.timedout] ########-########-####-#########6ac 54#####-########-####-#########6ac

YYYY-MM-DDTHH:MM:SS: [vmfsCorrelator] 26764193007176us: [esx.problem.vmfs.heartbeat.timedout] ########-########-####-#########6ac 67#####-########-####-#########6ac

YYYY-MM-DDTHH:MM:SS: [vmfsCorrelator] 26764193007212us: [esx.problem.vmfs.heartbeat.timedout] ########-########-####-#########6ac 31#####-########-####-#########6ac
The vmware.log of an affected VM shows VMware Tools heartbeat timeouts and log write errors:

YYYY-MM-DDTHH:MM:SS No(00) vmx - >>> Error writing log, 178 bytes discarded. Disk full?
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Tools: Tools heartbeat timeout.
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Tools: Running status rpc handler: 1 => 0.
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Tools: Changing running status: 1 => 0.
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Tools: [RunningStatus] Last heartbeat value 17560572 (last received 21s ago)
YYYY-MM-DDTHH:MM:SS No(00) vcpu-7 - >>> Error writing log, 87 bytes discarded. Disk full?
YYYY-MM-DDTHH:MM:SS No(00) vmx - >>> Error writing log, 106 bytes discarded. Disk full?
YYYY-MM-DDTHH:MM:SS In(05) vmx - GuestRpc: GuestRpcResetVsockChannel: channel 1
YYYY-MM-DDTHH:MM:SS In(05) vmx - GuestRpc: Closing channel 1 connection 3
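
The host-side symptoms above can be checked in one pass over vobd.log. The following is a minimal sketch, assuming a POSIX shell and a grep that supports -F and -c. The function name count_vob_events is hypothetical, and only the two event IDs that appear in the excerpts above are used.

```shell
# Hypothetical helper: count the lines in a log file that contain an event ID.
# $1 = path to the log file, $2 = event ID to look for.
count_vob_events() {
    # -F matches the event ID as a fixed string; -c prints the match count.
    grep -F -c "$2" "$1"
}

# Example usage on a host (path and event IDs taken from the excerpts above):
# count_vob_events /var/run/log/vobd.log 'vob.net.dvport.uplink.transition.down'
# count_vob_events /var/run/log/vobd.log 'esx.problem.vmfs.heartbeat.timedout'
```

A non-zero count for both event IDs around the same time is consistent with the uplink failure and the subsequent heartbeat timeouts described in this article.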

Environment

vSphere vSAN 7.x

vSphere vSAN 8.x

Cause

Cluster partitioning makes the backend storage (i.e. the vSAN objects) unavailable. The affected Linux VMs then switch their file systems to read-only to prevent file system corruption.
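
To see which file systems a guest has switched to read-only, the mount table can be inspected from inside the Linux VM. This is a minimal sketch, assuming a guest that exposes /proc/mounts and has awk installed; the function name list_readonly_mounts is hypothetical.

```shell
# Hypothetical helper: print the mount points whose options include "ro".
# $1 = mounts table to read; defaults to /proc/mounts inside the guest.
list_readonly_mounts() {
    # Field 2 is the mount point, field 4 the comma-separated mount options.
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "ro") { print $2; break }
        }
    }' "${1:-/proc/mounts}"
}

# Example usage inside the guest:
# list_readonly_mounts
```

Some pseudo file systems may be mounted read-only by design, so only entries for data file systems such as / are relevant here. An option such as errors=remount-ro is not counted, because only an exact "ro" option matches.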

Resolution

After network connectivity is recovered and the vSAN datastore is restored to a functional state, reboot the affected Linux VMs to return their file systems from read-only to read-write.
If a reboot does not fix the issue, remount the file system manually using the command:
# mount -o remount /
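
The remount step can be guarded by a check of the current state of the root file system. This is a sketch under stated assumptions: it reads /proc/mounts inside the guest, requires awk, and uses the hypothetical function name root_mount_state. The remount command itself is the one given above and is shown only as commented usage.

```shell
# Hypothetical helper: print "ro" or "rw" for the root file system.
# $1 = mounts table to read; defaults to /proc/mounts inside the guest.
root_mount_state() {
    awk '$2 == "/" { opts = $4 }   # keep the options of the last entry for "/"
         END {
             if (opts == "") exit 1   # no entry for "/" was found
             print ((("," opts ",") ~ /,ro,/) ? "ro" : "rw")
         }' "${1:-/proc/mounts}"
}

# Example usage inside the guest, as root:
# if [ "$(root_mount_state)" = "ro" ]; then
#     mount -o remount /
#     root_mount_state    # check the state again after the remount
# fi
```

Running the check again after the remount shows whether the root file system is still read-only.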

Refer to the following article for details on why Linux VM file systems go read-only:
https://knowledge.broadcom.com/external/article/343025/linux-based-file-systems-become-readonly.html
