Microsoft Windows Failover Cluster validation reports a warning "Persistent Reservation command took longer than 3 seconds" on virtual machines with shared RDMs

Article ID: 430353

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • Configuring Microsoft Windows clustering services for virtual machines with shared RDMs reports a warning during the validation process.
  • Running the storage validation check from the "Failover Cluster Manager" wizard inside the guest operating system reports the warning:
    "The test has detected that the Persistent Reservation command took longer than 3 seconds to complete. This may impact cluster stability".

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x
VMware vSphere ESXi 9.x

Cause

The latency observed in the Persistent Reservation command is caused by SCSI frame drops reported at the adapter firmware level.

Cause validation:

  • The /var/run/log/vmkernel.log file on the affected ESXi hosts shows a mismatch between the data sent by the storage target and the data received by the adapter firmware:
    YYYY-MM-DDTHH:MM.SSSZ In(182) vmkernel: cpu32:2097266)ScsiDeviceIO: 4633: Cmd(0x45dc4d168d40) 0x5f, CmdSN 0xffff9c822619b410 from world 56712042 to dev "naa.################################" failed H:0x2 D:0x0 P:0x0
    YYYY-MM-DDTHH:MM.SSSZ In(182) vmkernel: cpu27:56629346)qedf:vmhba64:qedfc_scsi_completion:1804:Error: dropped Frame xid[0x530] lba=0x0 lbc=0x0 cmd ##:#:#:#:# data returned 24 required data 0 fw_resid 0
    YYYY-MM-DDTHH:MM.SSSZ In(182) vmkernel: cpu27:56629346)qedf:vmhba64:qedfc_scsi_completion:1804:Error: dropped Frame xid[0x42a] lba=0x0 lbc=0x0 cmd ##:#:#:#:# data returned 24 required data 0 fw_resid 0

    The above output confirms the mismatch: the storage target returned 24 bytes in response to the SCSI command, but the adapter firmware reported receiving 0 bytes.

    SCSI codes:
    0x5f    : Persistent Reserve Out
    H:0x2   : This status is returned when the HBA driver is unable to issue a command to the device. This status can occur due to dropped FCP frames in the environment.
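
    The log check described above can be scripted when many hosts or large log files need to be reviewed. The following Python sketch is illustrative only and is not a VMware or Broadcom tool. The regular expressions are assumptions derived from the three sample lines in this article, and the names scan, SCSI_FAIL_RE, DROPPED_RE and SAMPLE are hypothetical. Other driver versions may log these messages in a different format.

```python
import re

# Assumed pattern, derived from the ScsiDeviceIO sample line in this article.
SCSI_FAIL_RE = re.compile(
    r'ScsiDeviceIO: \d+: Cmd\(0x[0-9a-fA-F]+\) (?P<opcode>0x[0-9a-fA-F]+),'
    r'.*? to dev "(?P<device>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-fA-F]+) D:(?P<dev>0x[0-9a-fA-F]+) '
    r'P:(?P<plugin>0x[0-9a-fA-F]+)'
)

# Assumed pattern, derived from the qedf "dropped Frame" sample lines.
DROPPED_RE = re.compile(
    r'qedf:(?P<hba>vmhba\d+):.*?dropped Frame xid\[(?P<xid>0x[0-9a-fA-F]+)\]'
    r'.*?data returned (?P<returned>\d+) required data (?P<required>\d+)'
)

PERSISTENT_RESERVE_OUT = 0x5F  # SCSI opcode listed under "SCSI codes"
HOST_STATUS = 0x2              # H:0x2 host status listed under "SCSI codes"


def scan(lines):
    """Return (devices with failed PR Out commands, dropped-frame records)."""
    pr_failures = []
    dropped = []
    for line in lines:
        m = SCSI_FAIL_RE.search(line)
        if m:
            opcode = int(m["opcode"], 16)
            host = int(m["host"], 16)
            if opcode == PERSISTENT_RESERVE_OUT and host == HOST_STATUS:
                pr_failures.append(m["device"])
            continue
        m = DROPPED_RE.search(line)
        if m:
            dropped.append(
                (m["hba"], m["xid"], int(m["returned"]), int(m["required"]))
            )
    return pr_failures, dropped


# Sample input modelled on the log excerpt in this article (values masked).
SAMPLE = [
    'YYYY-MM-DDTHH:MM.SSSZ In(182) vmkernel: cpu32:2097266)ScsiDeviceIO: 4633: '
    'Cmd(0x45dc4d168d40) 0x5f, CmdSN 0xffff9c822619b410 from world 56712042 '
    'to dev "naa.####" failed H:0x2 D:0x0 P:0x0',
    'YYYY-MM-DDTHH:MM.SSSZ In(182) vmkernel: cpu27:56629346)qedf:vmhba64:'
    'qedfc_scsi_completion:1804:Error: dropped Frame xid[0x530] lba=0x0 '
    'lbc=0x0 cmd ##:#:#:#:# data returned 24 required data 0 fw_resid 0',
]

if __name__ == "__main__":
    failures, frames = scan(SAMPLE)
    print("Failed Persistent Reserve Out commands:", failures)
    print("Dropped frames (hba, xid, returned, required):", frames)
```

    To use the sketch against a real log, copy vmkernel.log from the host and pass the open file object to scan() in place of SAMPLE. Failed 0x5f commands with H:0x2 that appear together with dropped-frame records on the same adapter match the pattern described in this article.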

Resolution

  • Ensure that the HBA driver and firmware versions are compatible, as listed in the Broadcom Compatibility Guide.
  • Engage the hardware vendor to investigate why the HBA firmware is reporting incorrect frame lengths.