Symptoms:
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx)WARNING: rdmaDriver: RDMAIsTeamUplinkChanged:3505: oldUplink = vmnicX newUplink = vmnicY
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx)RDT: RDTRDMAServerCMEventCB:2558: VMK_RDMA_CM_EVENT_ADDR_CHANGE event occured, cmID xxxxxx, eventType 14 cluster Protocol 2
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx)RDT: RDTRDMAServerCMEventCB:2561: RDMA properties changed
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx)WARNING: rdmaDriver: RDMAIsTeamUplinkChanged:3505: oldUplink = vmnicX newUplink = vmnicY
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx)WARNING: rdmaDriver: RDMAIsTeamUplinkChanged:3505: oldUplink = vmnicX newUplink = vmnicY
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx)WARNING: rdmaDriver: RDMAIsTeamUplinkChanged:3505: oldUplink = vmnicX newUplink = vmnicY
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx)WARNING: rdmaDriver: RDMAIsTeamUplinkChanged:3505: oldUplink = vmnicX newUplink = vmnicY
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx)WARNING: rdmaDriver: RDMAIsTeamUplinkChanged:3505: oldUplink = vmnicX newUplink = vmnicY
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx)RDT: RDTRdmaClientCMEventCB:3380: Dropped client connect event 14, new event 14 rdmaConn xxxxxx
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx)RDT: RDTRdmaClientCMEventCB:3380: Dropped client connect event 14, new event 14 rdmaConn xxxxxx
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx)RDT: RDTRdmaClientCMEventCB:3380: Dropped client connect event 14, new event 14 rdmaConn xxxxxx

YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx) opID=xxxxxx)RDT:RDTRDMAStopConnectionsForServer:995: waiting for 63 active connections to end
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx) opID=xxxxxx)RDT:RDTRDMAStopConnectionsForServer:998: Waiting for the connections to get terminated 63
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx) opID=xxxxxx)RDT: RDTRDMAStopConnectionsForServer:995: waiting for 7 active connections to end
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx) opID=xxxxxx)RDT: RDTRDMAStopConnectionsForServer:998: Waiting for the connections to get terminated 7
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx) opID=xxxxxx)RDT: RDTRDMAStopConnectionsForServer:995: waiting for 0 active connections to end
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx) opID=xxxxxx)RDT: RDTDestroyRDMAServer:2892: Calling server cmid destroy
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx) opID=xxxxxx)RDT: RDTCreateRDMAServer:2779: RDTCreateRDMAServer() exiting

YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx cpu89:xxxxxx)Backtrace for current CPU #89, worldID=xxxxxx, fp=xxxxxx
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx cpu89:xxxxxx)0x453af0e1bea0:[0x42002bce596e][email protected]#0.0.0.1+0x142 stack: 0x100000000000750, 0x0, 0x0, 0x420054000000, 0x430384801630
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx cpu89:xxxxxx)0x453af0e1bf70:[0x42002bcd422f][email protected]#0.0.0.1+0x58 stack: 0x72, 0x4336bee11ba0, 0x0, 0x420029b9f7f9, 0x72
YYYY-MM-DDTHH:MM:SS cpu80:xxxxxx cpu89:xxxxxx)0x453af0e1bfa0:[0x420029b9f7f8]vmkWorldFunc@vmkernel#nover+0x31 stack: 0x420029b9f7f4, 0x0, 0x453af0e1f000, 0x453aeea9f100, 0x453af0e1f100
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx cpu89:xxxxxx)0x453af0e1bfe0:[0x42002a0d67b2]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0, 0x420029b44cf0, 0x0, 0x0, 0x0
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx cpu89:xxxxxx)0x453af0e1c000:[0x420029b44cef]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
YYYY-MM-DDTHH:MM:SS cpu65:xxxxxx cpu89:xxxxxx)ESC[45mESC[33;1mVMware ESXi 8.0.3 [Releasebuild-24280767 x86_64]ESC[0m
#PF Exception 14 in world xxxxxx:rdtNetworkWo IP xxxxxx addr 0x8
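
The log excerpts above can be checked for mechanically. The following is a minimal Python sketch, not a supported tool: it counts lines that contain the three message patterns shown in the excerpts. The signature names, the grouping into three patterns, and the helper functions `count_symptoms` and `matches_all` are assumptions made for illustration; only the matched substrings come from the logs above.

```python
import re
from typing import Dict, Iterable

# Patterns built from substrings in the log excerpts above. The three
# signature names are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "uplink_failover": re.compile(
        r"rdmaDriver: RDMAIsTeamUplinkChanged:\d+: oldUplink = \S+ newUplink = \S+"
    ),
    "rdma_server_teardown": re.compile(
        r"RDT:\s*RDTRDMAStopConnectionsForServer:\d+:"
    ),
    "psod_page_fault": re.compile(
        r"#PF Exception 14 in world \S+:rdtNetworkWo"
    ),
}


def count_symptoms(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many lines match each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def matches_all(counts: Dict[str, int]) -> bool:
    """Return True only if every signature was seen at least once."""
    return all(value > 0 for value in counts.values())


if __name__ == "__main__":
    import sys

    # Usage: python check_symptoms.py <path-to-log-file>
    with open(sys.argv[1], errors="replace") as handle:
        result = count_symptoms(handle)
    print(result, "all signatures present:", matches_all(result))
```

A match on all three patterns only indicates that the log resembles the excerpts in this article; it does not by itself confirm the root cause.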

Environment:
VMware vSphere ESXi 8

Cause:
Due to a rare race condition, an ESXi host might fail with a purple diagnostic screen after a vmnic failover when vSAN is used over RDMA.

Resolution:
This issue is resolved in the ESXi 8.0u3i release.