"Host Connectivity Degraded in ESXi" warning in vCenter Server

Article ID: 318957

Updated On:

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

Summary

This event indicates that the ESXi host's connectivity to the volume (for which this event was generated) degraded because the host was unable to renew its heartbeat for a period of approximately 16 seconds (the VMFS lock-breaking lease timeout).

After the periodic heartbeat renewal fails, VMFS declares that the heartbeat to the volume has timed out and suspends all I/O activity on the device until connectivity is restored or the device is declared inoperable.

There are two components to this:
  • Heartbeat Interval = 3 seconds
  • Heartbeat lease wait timeout = 16 seconds

A host indicates its liveness by periodically (every 3 seconds) performing I/O to its heartbeat on a given volume.

Therefore, if no activity is seen on the host's heartbeat slot for a period of time, then we can conclude that the host has lost connectivity to the volume.

This wait time is 16 seconds, which is slightly more than five heartbeat intervals (5 × 3 seconds = 15 seconds).
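
The timing rule described above can be sketched as a small check. This is only an illustration, not VMFS code: the constant and function names are hypothetical, and only the 3-second interval and the 16-second timeout come from this article.

```python
# Values stated in this article.
HEARTBEAT_INTERVAL_S = 3
HEARTBEAT_LEASE_TIMEOUT_S = 16


def heartbeat_timed_out(last_renewal_s: float, now_s: float) -> bool:
    """Hypothetical check: True once the gap since the last successful
    heartbeat renewal exceeds the lease timeout."""
    return (now_s - last_renewal_s) > HEARTBEAT_LEASE_TIMEOUT_S


def missed_renewals(last_renewal_s: float, now_s: float) -> int:
    """Number of whole heartbeat intervals that passed without a renewal."""
    return int((now_s - last_renewal_s) // HEARTBEAT_INTERVAL_S)
```

With these values, a host that last renewed its heartbeat 15 seconds ago is still considered live, while one that has been silent for more than 16 seconds (five full intervals plus a margin) is treated as having lost connectivity.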

 

Example

If an ESXi host has mounted a volume san-lun-100 from device naa.60060160b4111600826120bae2e3dd11:1 and loses connectivity (due to a cable pull, disk array failure, and so on) to the device for a period exceeding 16 seconds, the following error message appears:

Lost access to volume 496befed-1c79c817-6beb-001ec9b60619 (san-lun-100) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
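
When triaging many such events, it can help to pull the volume UUID and label out of the message text. The following is a minimal sketch that assumes the message has exactly the wording shown above; the regular expression and the function name `parse_lost_access` are illustrative and not part of any VMware tool.

```python
import re

# Pattern based only on the example message in this article.
LOST_ACCESS_RE = re.compile(
    r"Lost access to volume "
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{12}) "
    r"\((?P<label>[^)]+)\) due to connectivity issues"
)

SAMPLE_MESSAGE = (
    "Lost access to volume 496befed-1c79c817-6beb-001ec9b60619 (san-lun-100) "
    "due to connectivity issues. Recovery attempt is in progress and outcome "
    "will be reported shortly."
)


def parse_lost_access(message: str):
    """Return (uuid, label) if the message matches, otherwise None."""
    match = LOST_ACCESS_RE.search(message)
    if match is None:
        return None
    return match.group("uuid"), match.group("label")
```

Calling `parse_lost_access(SAMPLE_MESSAGE)` yields the UUID and the datastore label, which can then be matched against the output of the commands in the Resolution section.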
 

 

Impact

All I/O and metadata operations to the affected volume, whether from the service console (COS), the user interface (vSphere Client), or virtual machines, are queued internally and retried for a period of time.

If connectivity to the volume or storage device is not restored within that period, these I/O operations fail.

This can affect virtual machines that are already running as well as any new virtual machine power-on operations.

Environment

VMware vSphere ESXi 8.0

VMware vSphere ESXi 7.0.0

VMware vSphere ESXi 6.7
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.0

VMware vSphere ESXi 5.1

VMware vSphere ESXi 5.0

VMware ESX 4.1.x
VMware ESXi 4.1.x Embedded
VMware ESXi 4.1.x Installable

VMware ESX 4.0.x
VMware ESXi 4.0.x Embedded
VMware ESXi 4.0.x Installable

VMware ESX Server 3.5.x
VMware ESXi 3.5.x Embedded
VMware ESXi 3.5.x Installable

 

VMware vCenter Server 8.0

VMware vCenter Server 7.0.x

VMware vCenter Server 6.7.x
VMware vCenter Server 6.5.x
VMware vCenter Server 6.0.x

VMware vCenter Server 5.5.x
VMware vCenter Server 5.1.x
VMware vCenter Server 5.0.x

VMware vCenter Server 4.1.x
VMware vCenter Server 4.0.x

 

Cause

This issue can be caused by a number of underlying storage issues, such as a cable pull or a disk array failure.

Resolution

To resolve this issue, investigate the underlying storage issues using the vSphere Client and the ESXi command line.

 

vSphere Client

  1. Connect to the vCenter Server using vSphere Client.
  2. Select the Storage View tab to map the HBA (Host Bus Adapter) associated with the affected VMFS volume.
  3. Identify and resolve the path inconsistencies to the LUN. For more information, see Troubleshooting fibre channel storage connectivity and Troubleshooting ESXi connectivity to iSCSI arrays using software initiators.

    Note: If connections are restored, VMFS automatically recovers the heartbeat on the volume and continues the operation.

 

ESXi command line

  1. Connect to the ESXi host using an SSH session.
  2. Run these commands:
    1. Query the VMFS datastore properties using the vmkfstools command.
       For example:
       vmkfstools -P /vmfs/volumes/san-lun-100
       File system label (if any): san-lun-100
       Mode: public
       Capacity 80262201344 (76544 file blocks * 1048576), 36768317440 (35065 blocks) avail
       UUID: 49767b15-1f252bd1-1e57-00215aaf0626
       Partitions spanned (on "lvm"): naa.60060160b4111600826120bae2e3dd11:1
    2. Use esxcfg-mpath with the naa ID of the LUN (Logical Unit Number) from the output of the previous command to identify the state of all paths to the affected LUN.
       For example:
       esxcfg-mpath -b -d naa.60060160b4111600826120bae2e3dd11
       naa.60060160b4111600826120bae2e3dd11 : DGC Fibre Channel Disk (naa.60060160b4111600826120bae2e3dd11)
          vmhba0:C0:T0:L0 LUN:0 state:active fc Adapter: WWNN: 20:00:00:00:c9:7d:6c:e0 WWPN: 10:00:00:00:c9:7d:6c:e0 Target: WWNN: 50:06:01:60:b0:22:1f:dd WWPN: 50:06:01:60:30:22:1f:dd
          vmhba0:C0:T1:L0 LUN:0 state:standby fc Adapter: WWNN: 20:00:00:00:c9:7d:6c:e0 WWPN: 10:00:00:00:c9:7d:6c:e0 Target: WWNN: 50:06:01:60:b0:22:1f:dd WWPN: 50:06:01:68:30:22:1f:dd
  3. Follow the steps provided in Troubleshooting fibre channel storage connectivity and Troubleshooting ESXi connectivity to iSCSI arrays using software initiators to identify and resolve the path inconsistencies to the LUN.
  4. If connections are restored, VMFS automatically recovers the heartbeat on the volume and continues the operation.
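
If the output of the two commands above has been saved as text, for example from a support bundle, the backing device and the per-path states can be extracted with a short script. This is a sketch under stated assumptions: the regular expressions assume the output layout shown in the examples above, the function names are hypothetical, and treating any state other than active or standby as needing attention is an assumption rather than a documented rule.

```python
import re

# Patterns derived only from the example output in this article.
DEVICE_RE = re.compile(r'Partitions spanned \(on "lvm"\):\s*(naa\.[0-9a-f]+)')
PATH_RE = re.compile(r"(vmhba\d+:C\d+:T\d+:L\d+)\s+LUN:\d+\s+state:(\w+)")

VMKFSTOOLS_SAMPLE = """\
File system label (if any): san-lun-100
Mode: public
UUID: 49767b15-1f252bd1-1e57-00215aaf0626
Partitions spanned (on "lvm"): naa.60060160b4111600826120bae2e3dd11:1
"""

MPATH_SAMPLE = """\
naa.60060160b4111600826120bae2e3dd11 : DGC Fibre Channel Disk
   vmhba0:C0:T0:L0 LUN:0 state:active fc Adapter: WWNN: 20:00:00:00:c9:7d:6c:e0
   vmhba0:C0:T1:L0 LUN:0 state:standby fc Adapter: WWNN: 20:00:00:00:c9:7d:6c:e0
"""


def backing_device(vmkfstools_output: str):
    """Return the naa ID (without the partition suffix), or None."""
    match = DEVICE_RE.search(vmkfstools_output)
    return match.group(1) if match else None


def path_states(mpath_output: str) -> dict:
    """Map each runtime path name to its reported state."""
    return dict(PATH_RE.findall(mpath_output))


def paths_needing_attention(states: dict) -> list:
    """Assumption: anything other than active/standby deserves a look."""
    return sorted(p for p, s in states.items() if s not in ("active", "standby"))
```

For the sample text, `backing_device` returns the naa ID to pass to esxcfg-mpath, and `path_states` reports one active and one standby path, so `paths_needing_attention` returns an empty list.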

Note: For additional information, see Troubleshooting LUN connectivity issues on ESXi hosts.

Additional Information

This issue is being checked by Diagnostics for VMware Cloud Foundation.

The check is as follows:

  • Product: ESXi
  • Log File: vobd.log
  • Log Expression Check: "vmfsCorrelator" AND "esx.problem.vmfs.heartbeat.timedout"
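
The same check can be approximated manually against a copy of vobd.log. The sketch below keeps only lines that contain both strings from the log expression. The function name is hypothetical, and the sample lines are placeholders that illustrate the matching logic only; they are not real vobd.log output.

```python
# Both strings must appear on the same line, as in the log expression above.
REQUIRED_TERMS = ("vmfsCorrelator", "esx.problem.vmfs.heartbeat.timedout")


def heartbeat_timeout_lines(lines):
    """Return the lines that contain every required term."""
    return [
        line.rstrip("\n")
        for line in lines
        if all(term in line for term in REQUIRED_TERMS)
    ]


# Placeholder lines, NOT real log entries; the actual format will differ.
SAMPLE_LINES = [
    "<timestamp> [vmfsCorrelator] <details> [esx.problem.vmfs.heartbeat.timedout] <volume>\n",
    "<timestamp> [vmfsCorrelator] <details> [some.other.event] <volume>\n",
    "<timestamp> [otherCorrelator] <details>\n",
]
```

Because the function accepts any iterable of lines, an open file object for a collected vobd.log can be passed to it directly. The location of the file depends on how the logs were gathered.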