SNMP monitoring of VMware ESXi hosts running MSCS virtual machines fails with timeout errors

Article ID: 342546

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • SNMP monitoring of VMware ESXi hosts fails with timeout errors.
  • The ESXi host is running a virtual machine participating in an MSCS cluster that uses shared RDMs and SCSI reservations across other hosts.
  • The /var/log/vmkernel.log file on the ESXi host contains errors similar to:

    <YYYY-MM-DD>T<time>Z cpu3:11251)ScsiDeviceIO: 2331: Cmd(0x412400f25540) 0x1a, CmdSN 0x42f3b from world 0 to dev "naa.6000d31000780c000000000000000022" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

  • Running the esxcli storage core device stats get command on the ESXi host shows a large number of reservation conflicts on the LUNs used by the MSCS virtual machines. This example shows the output of the command on an impacted host:

    naa.6000d31000780c000000000000000022:
    Device: naa.6000d31000780c000000000000000022
    Successful Commands: 471
    Blocks Read: 205
    Blocks Written: 0
    Read Operations: 193
    Write Operations: 0
    Reserve Operations: 0
    Reservation Conflicts: 12404126
    Failed Commands: 12404173
    Failed Blocks Read: 0
    Failed Blocks Written: 0
    Failed Read Operations: 0
    Failed Write Operations: 0
    Failed Reserve Operations: 0
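
When a host sees many LUNs, scanning this output by eye is tedious. The following is a minimal sketch, not a VMware-provided tool, that parses saved output of esxcli storage core device stats get and lists devices whose Reservation Conflicts counter is high. The function names and the threshold of 1000 are assumptions chosen for illustration. The sample text is an abbreviated copy of the output above.

```python
# Sketch only: parse saved "esxcli storage core device stats get" output.
# Function names and the default threshold are illustrative assumptions.

SAMPLE_OUTPUT = """\
naa.6000d31000780c000000000000000022:
    Device: naa.6000d31000780c000000000000000022
    Successful Commands: 471
    Reserve Operations: 0
    Reservation Conflicts: 12404126
    Failed Commands: 12404173
"""


def parse_device_stats(text):
    """Return {device_id: {counter_name: int}} from the command output."""
    stats = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Device" and value:
            # A "Device:" line starts the counters for a new device.
            current = stats.setdefault(value, {})
        elif current is not None and value.isdigit():
            current[key.strip()] = int(value)
    return stats


def conflicted_devices(stats, threshold=1000):
    """Return sorted device IDs with at least `threshold` reservation conflicts."""
    return sorted(
        dev
        for dev, counters in stats.items()
        if counters.get("Reservation Conflicts", 0) >= threshold
    )


if __name__ == "__main__":
    for dev in conflicted_devices(parse_device_stats(SAMPLE_OUTPUT)):
        print(dev)
```

The counters are cumulative, so a suitable threshold depends on how long the host has been running. Comparing two samples taken some minutes apart is likely to be more telling than a single reading.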


Environment

VMware vSphere ESXi 5.0
VMware vSphere ESXi 5.1
VMware vSphere ESXi 5.5
VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.5

Cause

This issue occurs when the LUNs used by the MSCS virtual machines are not marked as perennially reserved, which results in reservation conflicts. Requests from the SNMP monitoring software to the ESXi SNMP agent time out while the host waits for MODE SENSE (0x1a) SCSI commands to time out because of a reservation conflict.
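
To confirm that the failing commands in vmkernel.log are MODE SENSE commands against the MSCS LUNs, the log lines can be filtered by opcode. The following is a rough sketch that assumes the line layout shown in the Symptoms section. The regular expression and function name are illustrative, and other ESXi builds may format the line differently.

```python
# Sketch only: pick out failed MODE SENSE (0x1a) commands from vmkernel.log lines.
# The pattern follows the sample line in this article and is an assumption.
import re

MODE_SENSE_6 = 0x1A  # SCSI opcode for MODE SENSE(6)

LOG_PATTERN = re.compile(
    r'Cmd\((?P<cmd>0x[0-9a-fA-F]+)\)\s+(?P<opcode>0x[0-9a-fA-F]+),.*?'
    r'to dev "(?P<device>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-fA-F]+) '
    r'D:(?P<dev_status>0x[0-9a-fA-F]+) '
    r'P:(?P<plugin>0x[0-9a-fA-F]+)'
)


def parse_failed_command(line):
    """Return opcode, device, and status codes for a failed command, else None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "opcode": int(match.group("opcode"), 16),
        "device": match.group("device"),
        "host_status": int(match.group("host"), 16),
        "device_status": int(match.group("dev_status"), 16),
        "plugin_status": int(match.group("plugin"), 16),
    }


SAMPLE_LINE = (
    "cpu3:11251)ScsiDeviceIO: 2331: Cmd(0x412400f25540) 0x1a, CmdSN 0x42f3b "
    'from world 0 to dev "naa.6000d31000780c000000000000000022" failed '
    "H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0."
)

if __name__ == "__main__":
    parsed = parse_failed_command(SAMPLE_LINE)
    if parsed and parsed["opcode"] == MODE_SENSE_6:
        print("MODE SENSE failed on", parsed["device"])
```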

Resolution

To resolve this issue, mark the LUNs used by the MSCS virtual machines as perennially reserved. For more information, see "ESXi/ESX hosts with visibility to RDM LUNs being used by MSCS nodes with RDMs may take a long time to start or during LUN rescan" (1016106).
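
The following sketch builds the esxcli commands for a list of device IDs so they can be reviewed before they are run on a host. It prints the commands and does not execute them. It assumes the setconfig syntax with the --perennially-reserved flag described in article 1016106, so verify the exact syntax for your ESXi release there. The function name and the device ID check are illustrative.

```python
# Sketch only: generate (not execute) esxcli commands for review.
# Assumes the setconfig / --perennially-reserved syntax from article 1016106.
import re


def perennial_reservation_commands(device_ids):
    """Return the set and verify commands for each naa.* device ID."""
    commands = []
    for dev in device_ids:
        if not re.fullmatch(r"naa\.[0-9a-fA-F]+", dev):
            raise ValueError(f"unexpected device ID: {dev!r}")
        # Mark the LUN as perennially reserved.
        commands.append(
            f"esxcli storage core device setconfig -d {dev} --perennially-reserved=true"
        )
        # List the device so the setting can be checked in the output.
        commands.append(f"esxcli storage core device list -d {dev}")
    return commands


if __name__ == "__main__":
    for cmd in perennial_reservation_commands(
        ["naa.6000d31000780c000000000000000022"]
    ):
        print(cmd)
```

As the title of the linked article suggests, the setting likely needs to be applied on each ESXi host that has visibility to the RDM LUNs, not only on the host where the SNMP timeouts were observed.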


Additional Information

ESXi/ESX hosts with visibility to RDM LUNs being used by MSCS nodes with RDMs may take a long time to start or during LUN rescan
Japanese version: SNMP monitoring of VMware ESXi hosts running virtual machines that form an MSCS cluster fails with timeout errors