One Host In vSAN Compute Only Cluster Unable to Use vSAN Storage

Article ID: 382048

Products

VMware vSAN

Issue/Introduction

Symptoms:
Running a compute-only cluster that mounts a vSAN datastore
VMs on one specific host show as inaccessible; they may appear to be running and then become inaccessible when powered off
vMotion of VMs from that host to other hosts fails with timeouts and an error similar to the following:

Failed waiting for data. Error 195887167. Connection closed by remote host, possibly due to timeout.
2024-11-13T23:11:09.50087Z VMotionStream [175250081:858948631672480427] failed to lookup swapfile: Failure.
Migration to host <##.###.##.###> failed with error Failure (195887105).
2024-11-13T23:11:09.527196Z vMotion migration [175250081:858948631672480427] failed to read stream keepalive: Connection closed by remote host, possibly due to timeout

There are no network partition alerts in Skyline Health
The problem host can still see the vSAN datastore
Running ls against the datastore, or attempting to change into a namespace directory on it, returns an I/O error
The vSAN datastore functions normally on other hosts; VMs that are powered off and removed from inventory on the problem host can be registered and powered on from other hosts
vmkping from the vSAN-tagged vmkernel port on the problem host cannot reach the vSAN IP addresses of the other hosts
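The vmkping check in the last symptom can be repeated against every peer with a small script. This is a hedged sketch, not a VMware-provided tool: the vmk name and peer IPs are placeholders, it assumes a Python 3.7+ interpreter on a system where vmkping is available (the ESXi shell), and it assumes vmkping prints a BSD-style summary line such as "3 packets transmitted, 0 packets received, 100% packet loss".

```python
import re
import subprocess

# Assumed vmkping summary format (BSD-style ping), for example:
# "3 packets transmitted, 0 packets received, 100% packet loss"
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) packets received, ([\d.]+)% packet loss"
)


def build_vmkping_cmd(vmk, ip, count=3):
    """Return the argv that pings `ip` out of vmkernel port `vmk` `count` times."""
    return ["vmkping", "-I", vmk, "-c", str(count), ip]


def parse_loss(output):
    """Return the packet-loss percentage from vmkping output, or None if absent."""
    match = SUMMARY_RE.search(output)
    return float(match.group(3)) if match else None


def check_peers(vmk, peer_ips, run=subprocess.run):
    """Ping each peer vSAN IP from `vmk` and return {ip: reachable}.

    `run` defaults to subprocess.run; it is a parameter only so the check can
    be exercised without an ESXi host.
    """
    results = {}
    for ip in peer_ips:
        proc = run(build_vmkping_cmd(vmk, ip), capture_output=True, text=True)
        loss = parse_loss(proc.stdout)
        # Treat a missing summary line (e.g. an error message) as unreachable.
        results[ip] = loss is not None and loss < 100.0
    return results
```

For example, check_peers("vmk1", ["192.0.2.11", "192.0.2.12"]) run on the problem host would be expected to return False for every peer while the faulty vmnic is carrying vSAN traffic, and True when run from a healthy host.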

Impact:
Virtual machines fail to run on the impacted host

Identification:
The vobd log (/var/log/vobd.log) shows vSAN namespace object heartbeats beginning to fail once a vmnic comes online, and recovering once that vmnic goes offline again.
Example messages (timestamps and details, including the actual vmnic, will vary):

2024-11-12T15:43:03.027Z: [netCorrelator] 133094369261us: [vob.net.dvport.uplink.transition.up] Uplink: vmnic0 is up. Affected dvPort: 4436/50 0c 2e 4f f4 82 1d 8f-c3 fc 56 20 3b 2f ea c4. 2 uplinks up
...
2024-11-12T15:43:13.404Z: [vmfsCorrelator] 1330954078562us: [vob.vmfs.heartbeat.timedout] 628b2dc6-########-####-############ c62f8f62-####-####-####-############
....
2024-11-12T17:02:03.034Z: [netCorrelator] 133094369261us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic0 is down. Affected dvPort: 4436/50 0c 2e 4f f4 82 1d 8f-c3 fc 56 20 3b 2f ea c4. 1 uplinks up. Failed Criteria: 128
....
2024-11-12T17:02:04.458Z: [vmfsCorrelator] 1330954078562us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume 628b2dc6-########-####-############ (c62f8f62-####-####-####-############): [Timeout]
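The correlation described above can be approximated with a short script. This is a hypothetical helper, not a VMware tool, and it assumes the vobd message wording matches the example lines above (other ESXi builds may differ). It walks the log in order and, for each vob.vmfs.heartbeat.timedout event, reports which uplinks had been logged coming up without a matching down event.

```python
import re

# Patterns based on the example vobd messages in this article (assumption:
# other ESXi builds may word these messages differently).
TS_RE = re.compile(r"^(\S+Z):")
UP_RE = re.compile(r"uplink\.transition\.up\] Uplink: (\S+) is up")
DOWN_RE = re.compile(r"uplink\.transition\.down\] Uplink: (\S+) is down")
HB_TIMEOUT = "[vob.vmfs.heartbeat.timedout]"


def correlate(lines):
    """Return (timestamp, uplinks) for each heartbeat timeout in `lines`.

    `uplinks` lists the vmnics seen coming up in the log that had not yet been
    seen going down when the timeout was logged. Uplinks that were already up
    before the log starts never appear, so treat the result as a hint only.
    """
    up = {}  # vmnic -> timestamp of its most recent "is up" event
    hits = []
    for line in lines:
        ts_match = TS_RE.match(line)
        ts = ts_match.group(1) if ts_match else None
        up_match = UP_RE.search(line)
        down_match = DOWN_RE.search(line)
        if up_match:
            up[up_match.group(1)] = ts
        elif down_match:
            up.pop(down_match.group(1), None)
        elif HB_TIMEOUT in line:
            hits.append((ts, sorted(up)))
    return hits
```

For example, with a copy of vobd.log taken from the problem host, correlate(open("vobd.log")) lists each heartbeat timeout alongside the uplinks that had recently come up; a vmnic that repeatedly appears next to timeouts is the candidate described under Cause.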


Environment

vSAN datastore mounted on a compute-only cluster

Cause

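One of the vmnics backing the vSAN vmkernel port (vmk) cannot reach the vSAN network, even though the other vmnics configured to back the same vmk can. Because that vmnic still reports its link as up, the host can place vSAN traffic on it, which is why the heartbeat failures start when the vmnic comes online and stop when it goes offline.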

Resolution

Take the problem vmnic offline, or remove it from use by the vSAN vmkernel port (for example, by moving it to Unused uplinks in the teaming and failover order) or from the host.
It is recommended that the customer work with their network team to determine why the network adapter cannot reach the correct network.