Linux VMs set their file systems to read-only after vSAN cluster partitioning or datastore inaccessibility

Article ID: 400167

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • A network switch outage causes the vSAN cluster to partition.

  • All hosts log uplink-down events for the uplinks used by the vSAN VMkernel adapter in /var/run/log/vobd.log:

    YYYY-MM-DDTHH:MM:SS: [netCorrelator] 26763954042466us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic1 is down. Affected dvPort: 22/50 00 4f d2 a4 05 f6 ee-0b 46 d6 4a 7f 70 96 38. 1 uplinks up. Failed criteria: 128

    YYYY-MM-DDTHH:MM:SS: [netCorrelator] 26763954058362us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic4 is down. Affected dvPort: 7/50 00 4f d2 a4 05 f6 ee-0b 46 d6 4a 7f 70 96 38. 1 uplinks up. Failed criteria: 128 

  • The file systems on some Linux VMs are in a read-only state after the network issue is resolved.

  • The namespace objects of the impacted virtual machines report heartbeat timeout errors in /var/run/log/vobd.log:

    YYYY-MM-DDTHH:MM:SS: [vmfsCorrelator] 26764193007100us: [esx.problem.vmfs.heartbeat.timedout] ########-########-####-#########6ac 54#####-########-####-#########6ac

    YYYY-MM-DDTHH:MM:SS: [vmfsCorrelator] 26764193007176us: [esx.problem.vmfs.heartbeat.timedout] ########-########-####-#########6ac 67#####-########-####-#########6ac

    YYYY-MM-DDTHH:MM:SS: [vmfsCorrelator] 26764193007212us: [esx.problem.vmfs.heartbeat.timedout] ########-########-####-#########6ac 31#####-########-####-#########6ac

  • The vmware.log of an affected VM shows VMware Tools heartbeat timeouts and log write errors:

    YYYY-MM-DDTHH:MM:SS No(00) vmx - >>> Error writing log, 178 bytes discarded. Disk full?
    YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Tools: Tools heartbeat timeout.
    YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Tools: Running status rpc handler: 1 => 0.
    YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Tools: Changing running status: 1 => 0.
    YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Tools: [RunningStatus] Last heartbeat value 17560572 (last received 21s ago)
    YYYY-MM-DDTHH:MM:SS No(00) vcpu-7 - >>> Error writing log, 87 bytes discarded. Disk full?
    YYYY-MM-DDTHH:MM:SS No(00) vmx - >>> Error writing log, 106 bytes discarded. Disk full?
    YYYY-MM-DDTHH:MM:SS In(05) vmx - GuestRpc: GuestRpcResetVsockChannel: channel 1
    YYYY-MM-DDTHH:MM:SS In(05) vmx - GuestRpc: Closing channel 1 connection 3 
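
The vobd.log symptoms above can be checked quickly with a small script. The following is a minimal Python sketch, not a VMware-provided tool: the function name `summarize_vobd` is hypothetical, and the patterns are modelled only on the log excerpts shown in this article, so other ESXi builds may format these lines differently.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the vobd.log excerpts in this article.
UPLINK_DOWN = re.compile(
    r"\[vob\.net\.dvport\.uplink\.transition\.down\] Uplink: (\S+) is down"
)
HEARTBEAT_TIMEOUT = "[esx.problem.vmfs.heartbeat.timedout]"


def summarize_vobd(lines):
    """Count uplink-down events per NIC and heartbeat timeout events.

    `lines` is any iterable of log lines, for example an open file object.
    Returns a (dict, int) pair: {nic_name: count}, timeout_count.
    """
    uplinks = Counter()
    timeouts = 0
    for line in lines:
        match = UPLINK_DOWN.search(line)
        if match:
            uplinks[match.group(1)] += 1
        elif HEARTBEAT_TIMEOUT in line:
            timeouts += 1
    return dict(uplinks), timeouts
```

To use it, copy vobd.log from a host and pass an open file object, for example `summarize_vobd(open("vobd.log", errors="replace"))`. Uplink-down events followed shortly by heartbeat timeouts are consistent with the sequence described in the symptoms.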

Environment

vSphere vSAN 7.x

vSphere vSAN 8.x

Cause

Because the backend storage (that is, the vSAN objects) becomes unavailable during the cluster partition, the affected Linux VMs switch their file systems to read-only to prevent file system corruption.
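
To confirm which file systems a guest has switched to read-only, the mount table can be inspected from inside the Linux VM. The following is a minimal Python sketch under stated assumptions: the function name `read_only_mounts` and the `DISK_FS_TYPES` set are hypothetical, and the sketch only reports mounts whose options in `/proc/mounts` include `ro`. It may not catch every failure mode, and some file systems are mounted read-only on purpose.

```python
# File system types treated as disk-backed here. This set is an assumption;
# adjust it to match the guest.
DISK_FS_TYPES = {"ext2", "ext3", "ext4", "xfs", "btrfs"}


def read_only_mounts(mounts_text, fs_types=DISK_FS_TYPES):
    """Return (mount_point, fs_type) pairs that are mounted with the 'ro' option.

    `mounts_text` is the content of /proc/mounts, where each line has the
    fields: device, mount point, type, options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in fs_types and "ro" in options.split(","):
            found.append((mount_point, fs_type))
    return found
```

Inside an affected guest this could be called as `read_only_mounts(open("/proc/mounts").read())`. An empty list means none of the listed file system types is currently mounted read-only.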

Resolution