SNMP Walk Fails with Timeout Error due to underlying storage LUN issue on the ESXi host

Article ID: 435232

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • When the snmpwalk command is run from the SNMP server against the ESXi host, it fails with a timeout error, as shown below.
  • The walk typically proceeds normally until it reaches storage-related objects such as hrDeviceErrors, at which point it hangs and eventually receives no response from the host.

snmpwalk -v2c -c dem1god <ESXi-Host-IP-Address>

HOST-RESOURCES-MIB::hrDeviceErrors.220 = Counter32: 0
HOST-RESOURCES-MIB::hrDeviceErrors.221 = Counter32: 0
HOST-RESOURCES-MIB::hrDeviceErrors.222 = Counter32: 0
HOST-RESOURCES-MIB::hrDeviceErrors.223 = Counter32: 0
Timeout: No Response from <ESXi-Host-IP-Address>
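
To pinpoint where a walk stalls, the captured output can be scanned for the last OID that was answered before the timeout line. The following Python is a minimal sketch, assuming net-snmp style output like the sample above. `last_oid_before_timeout` is a hypothetical helper name used for illustration and is not part of any VMware or net-snmp tool.

```python
from typing import Optional


def last_oid_before_timeout(walk_output: str) -> Optional[str]:
    """Return the last OID answered before a 'Timeout:' line.

    Assumes net-snmp style lines of the form '<OID> = <TYPE>: <VALUE>'.
    Returns None if no timeout line is found, or if nothing was
    answered before it.
    """
    last_oid = None
    for line in walk_output.splitlines():
        line = line.strip()
        if line.startswith("Timeout:"):
            return last_oid
        if " = " in line:
            # Keep only the OID part, left of the first ' = '.
            last_oid = line.split(" = ", 1)[0]
    return None


sample = """\
HOST-RESOURCES-MIB::hrDeviceErrors.222 = Counter32: 0
HOST-RESOURCES-MIB::hrDeviceErrors.223 = Counter32: 0
Timeout: No Response from <ESXi-Host-IP-Address>
"""
print(last_oid_before_timeout(sample))
# prints: HOST-RESOURCES-MIB::hrDeviceErrors.223
```

If the reported OID falls in a storage-related table such as hrDeviceErrors, that is consistent with the behavior described in this article.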

Environment

VMware ESXi

Cause

  • This failure occurs when underlying storage LUN issues trigger a Permanent Device Loss (PDL) state on one or more devices on the ESXi host.
  • During a standard inventory of hardware resources, the SNMP agent queries all registered storage devices. When it encounters a LUN in a PDL state, the storage stack cannot provide a valid response.
  • This causes the SNMP agent to hang indefinitely while waiting for data from the dead path, ultimately resulting in an SNMP session timeout and a failure to populate the remaining MIB (Management Information Base) data.
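
One way to check whether the stall is confined to the device inventory is to walk only that subtree with an explicit timeout and retry count. The sketch below only builds the command line. It assumes the net-snmp `snmpwalk` client (its `-t` and `-r` options) and that the host MIBs are installed on the SNMP server. The function name, the default values, and the `<community>` placeholder are illustrative choices, not values from this article.

```python
import shlex
from typing import List


def build_walk_command(host: str, community: str,
                       subtree: str = "HOST-RESOURCES-MIB::hrDeviceTable",
                       timeout_s: int = 5, retries: int = 1) -> List[str]:
    """Build a net-snmp snmpwalk command limited to one subtree.

    -t sets the per-request timeout in seconds and -r the retry count,
    so the wait before a timeout is reported is explicit.
    """
    return ["snmpwalk", "-v2c", "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, subtree]


cmd = build_walk_command("<ESXi-Host-IP-Address>", "<community>")
# shlex.join quotes the placeholders so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

If this targeted walk times out while walks of unrelated subtrees complete, that points toward the storage device table rather than the SNMP configuration.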

Resolution

To resolve the SNMP timeout, the non-responsive LUNs must be addressed or removed from the host's management stack.

  1. Review the ESXi host logs to identify the specific NAA ID reporting PDL errors. For more about PDL errors, refer to the KB article "Permanent Device Loss (PDL) and All-Paths-Down (APD) on host".

  2. Contact your storage vendor or fabric switch administration team to decommission the faulty LUN or RAID controller at the hardware/fabric level.

  3. Reboot the ESXi host. This often clears the PDL state for transient issues and refreshes the device list.

  4. If an immediate reboot is not possible, or the issue persists for a specific LUN, manually detach (turn off) the problematic LUN via the ESXi CLI to prevent the SNMP agent from attempting to poll it. For more about LUN detachment, refer to the KB article "Detach a LUN device from ESXi hosts".
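
Steps 1 and 4 can be sketched together: scan log lines for PDL indications, collect the affected NAA IDs, and print the detach commands for review. This is a minimal sketch under stated assumptions. The marker strings in `PDL_MARKERS` and the sample log lines are simplified illustrations, and real log formats vary by ESXi version, so verify them against the PDL/APD KB article. The helper names are hypothetical. The sketch only prints commands and does not run them; follow the prerequisites in the detach KB article before executing anything on a host.

```python
import re
from typing import Iterable, List

# Assumed PDL indications; confirm against the PDL/APD KB for your ESXi version.
PDL_MARKERS = ("permanently inaccessible", "0x5 0x25 0x0")
NAA_PATTERN = re.compile(r"\bnaa\.[0-9a-fA-F]+\b")


def find_pdl_devices(log_lines: Iterable[str]) -> List[str]:
    """Return unique NAA IDs from lines containing a PDL marker, in first-seen order."""
    found: List[str] = []
    for line in log_lines:
        lowered = line.lower()
        if any(marker in lowered for marker in PDL_MARKERS):
            for naa in NAA_PATTERN.findall(line):
                if naa not in found:
                    found.append(naa)
    return found


def detach_commands(naa_id: str) -> List[str]:
    """Return esxcli commands to turn a device off and then show its status."""
    return [
        f"esxcli storage core device set --state=off -d {naa_id}",
        f"esxcli storage core device list -d {naa_id}",
    ]


# Hypothetical, simplified log lines for illustration only.
lines = [
    "ScsiDevice: Device naa.60000000000000000000000000000001 has been "
    "removed or is permanently inaccessible.",
    "NMP: unrelated message for naa.60000000000000000000000000000002",
]
for naa in find_pdl_devices(lines):
    for command in detach_commands(naa):
        print(command)
```

Printing the commands instead of executing them keeps a person in the loop, which matters because detaching the wrong device can take a healthy datastore offline.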

Additional Information

Refer to the articles below for more about configuring the SNMP agent on the ESXi host:

Configuring SNMPv3 inform remote users in the ESXi SNMP agent

Debugging with the esxcli system snmp test command