Enabling and disabling vSAN over RDMA with unsupported driver/firmware versions for Intel E810* family NIC may lead to vSAN core dumps or ESXi Host PSOD

Article ID: 313796

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
Enabling and then disabling vSAN over RDMA with an unsupported driver/firmware combination for the Intel E810* NIC family may lead to vSAN core dumps, or the ESXi host(s) may encounter a PSOD (Purple Screen of Death).

Note:
Unsupported means that the driver/firmware version is not listed in the vSAN VMware Compatibility Guide (VCG).

The issue occurs when the icen driver is upgraded to a VCG-listed version but the irdman driver is not updated. As a result, the inbox irdman driver remains in use.
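
The mismatch described above can be spotted from the driver version strings alone. The following is a minimal Python sketch, not a VMware tool. It assumes that inbox drivers carry a `vmw` tag and async drivers an `OEM` tag in the release part of the version, as the version strings quoted in this article do. The function names are hypothetical.

```python
def driver_origin(version: str) -> str:
    """Classify a driver version string as 'inbox', 'async', or 'unknown'.

    Assumption: the part after the first '-' contains 'vmw' for inbox
    builds and 'OEM' for async builds (e.g. 1.3.1.19-1vmw, 1.5.5.0-1OEM).
    """
    release = version.partition("-")[2].lower()
    if "vmw" in release:
        return "inbox"
    if "oem" in release:
        return "async"
    return "unknown"


def is_mixed_pair(icen_version: str, irdman_version: str) -> bool:
    """Return True when one driver is async and the other is inbox."""
    origins = {driver_origin(icen_version), driver_origin(irdman_version)}
    return origins == {"inbox", "async"}
```

Under these assumptions, `is_mixed_pair("1.5.5.0-1OEM", "1.3.1.19-1vmw")` flags the combination described in this article. A matching pair of async drivers still needs to be checked against the VCG, because the tag alone does not prove that a version is listed.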

The following steps are not recommended by VMware. They illustrate one way a host can end up in the unsupported configuration.

  1. Upgrade only the icen driver to version 1.5.5.0-1OEM for the Intel(R) Ethernet Network Adapter E810* NIC family, per the VCG recommendation.
  2. Leave the inbox irdman driver at version 1.3.1.19-1vmw (installed by default with vSphere 7.0U2).
Notes:
- This configuration is not recommended and not supported by VMware.
- The recommended route is to upgrade the inbox irdman driver to the async version 1.3.3.7-1OEM listed on the VCG for vSAN over RDMA.
  3. Enable RDMA from the vSphere UI through the steps below:
     1. Select the cluster object: Configure > Services > Network > RDMA support (edit)
     2. Once RDMA is enabled, no health warnings are raised and the vSAN cluster appears to run as usual. However, this is an unsupported and untested configuration that could lead to vSAN cluster failures.
VMware recommends disabling any such unsupported vSAN over RDMA configuration and installing the VCG-listed async icen and async irdman drivers before proceeding.

  4. Disable RDMA from the vSphere UI through the steps below:
     1. Select the cluster object: Configure > Services > Network > RDMA support (edit)
     2. Once the disable operation is invoked from the UI, the vSphere UI becomes slow or unresponsive and the following behavior can be seen:
  • The “Reconfigure vSAN cluster” and “Remediate vSAN cluster” tasks get stuck for a long time and eventually time out.
  • Intel E810* NICs running firmware version 2.4.0 incur vSAN core dumps. With any newer firmware version, the ESXi host may encounter a PSOD.
  • In the vSphere UI, vSAN cluster related pages fail to load. For example, cluster object: Configure > vSAN > Services.
 
 
 
 



Resolution

To avoid this issue, make sure that the VCG-supported async icen and async irdman drivers are installed on every host before enabling vSAN over RDMA in the vSAN cluster.
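
As an illustration of such a check before RDMA is enabled, the sketch below compares a per-host driver inventory against the two versions this article names as VCG-listed. It is a hedged example only: the `VCG_LISTED` table is a snapshot taken from this article and should be confirmed against the current VCG entry, the host names are hypothetical, and the inventory is assumed to have been collected separately (for instance from the output of `esxcli software vib list` on each host).

```python
from typing import Dict, List

# Versions named in this article as VCG-listed for vSAN over RDMA on
# Intel E810* NICs. Assumption: confirm against the current VCG before use.
VCG_LISTED = {"icen": "1.5.5.0-1OEM", "irdman": "1.3.3.7-1OEM"}


def hosts_blocking_rdma(inventory: Dict[str, Dict[str, str]]) -> List[str]:
    """Return one message per host/driver that does not match VCG_LISTED.

    `inventory` maps a host name to {driver name: installed version}.
    A prefix match is used because installed version strings may carry
    an additional build suffix after the listed version.
    """
    blocking = []
    for host, drivers in sorted(inventory.items()):
        for name, wanted in VCG_LISTED.items():
            installed = drivers.get(name, "")
            if not installed.startswith(wanted):
                found = installed or "missing"
                blocking.append(f"{host}: {name} is {found}, expected {wanted}")
    return blocking


if __name__ == "__main__":
    # Hypothetical two-host cluster; esx02 still runs the inbox irdman driver.
    cluster = {
        "esx01": {"icen": "1.5.5.0-1OEM", "irdman": "1.3.3.7-1OEM"},
        "esx02": {"icen": "1.5.5.0-1OEM", "irdman": "1.3.1.19-1vmw"},
    }
    for message in hosts_blocking_rdma(cluster):
        print(message)
```

An empty result means that every host matches the listed versions. Any message returned identifies a host that should be updated before RDMA support is switched on for the cluster.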

For the correct async driver configuration for vSAN over RDMA, you may need to contact Intel or refer to the Intel vSAN Select Solution Reference Architecture documentation.