ESXi PSOD with SpinCount locks

Article ID: 392817

Products

VMware vSphere ESXi

Issue/Introduction

An ESXi host can PSOD due to "LockCheckSpinCountInt" events when vMotion uses the default TCP/IP stack instead of a dedicated, separate vMotion stack.

The PSOD backtrace contains the following frames:

#0 LockCheckSpinCountInt
#1 Lock_CheckSpinCount
#2 SP_WaitReadLock
#3 SPAcqWriteLockWork
#4 SP_AcqWriteLockWithRA
#5 tcp_usr_shutdown
#6 soshutdown
#7 vmk_shutdown
#8 Net_ShutdownSocket
#9 UserSocketInetShutdown
#10 UserSocket_Shutdown
#11 LinuxSocket_Shutdown64
#12 User_LinuxSyscallHandler
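
When triaging a PSOD, it can help to check mechanically whether a captured backtrace contains the frames listed above in the same order. The following is a minimal Python sketch under stated assumptions: the frame names are copied from this article, while the helper names (`extract_symbols`, `matches_signature`) are hypothetical and not part of any VMware tool. It only looks for the symbol names in sequence, so it tolerates extra text on each line, such as addresses or offsets.

```python
import re

# Frames quoted in this article, innermost (#0) first.
SIGNATURE_FRAMES = [
    "LockCheckSpinCountInt",
    "Lock_CheckSpinCount",
    "SP_WaitReadLock",
    "SPAcqWriteLockWork",
    "SP_AcqWriteLockWithRA",
    "tcp_usr_shutdown",
    "soshutdown",
    "vmk_shutdown",
    "Net_ShutdownSocket",
    "UserSocketInetShutdown",
    "UserSocket_Shutdown",
    "LinuxSocket_Shutdown64",
    "User_LinuxSyscallHandler",
]


def extract_symbols(backtrace_text):
    """Return identifier-like tokens in the order they appear in the text."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", backtrace_text)


def matches_signature(backtrace_text, frames=SIGNATURE_FRAMES):
    """Return True if every frame appears in the text, in the given order."""
    symbols = iter(extract_symbols(backtrace_text))
    # 'frame in symbols' advances the iterator, so this is an ordered
    # subsequence check rather than a plain membership test.
    return all(frame in symbols for frame in frames)


if __name__ == "__main__":
    sample = "\n".join(f"#{i} {name}" for i, name in enumerate(SIGNATURE_FRAMES))
    print(matches_signature(sample))
```

A match only indicates that the backtrace has the same shape as the one in this article; it does not by itself confirm the root cause.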

Environment

ESXi 7.0

Cause

This is caused by the lack of a dedicated TCP/IP stack for vMotion.

When vMotion uses the default TCP/IP stack, a large number of concurrent migrations can cause tcbinfo lock contention.

For example:

4 vmks with a default TCP/IP stack would result in the following:

8 parallel vMotion sessions for each network stream helper
This totals 64 vMotion processes
For TCP/IP, this results in 16 network receive processes (4 NICs * 4 processes per NIC)

This volume of concurrent processes, combined with the large amount of data in transit, causes lock contention on tcbinfo because too many vMotion threads are reading from and writing to the network socket buffer at the same time.

The contention results in a PSOD on the host.
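
The receive-process figure in the example above is a simple product, which the short Python sketch below reproduces. The function name `receive_processes` is hypothetical, and the default of 4 processes per NIC is taken only from the example in this article; it is an assumption, not a documented constant.

```python
def receive_processes(nic_count, processes_per_nic=4):
    """Network receive processes for a given number of NICs.

    processes_per_nic defaults to the value used in this article's example.
    """
    return nic_count * processes_per_nic


if __name__ == "__main__":
    # Show how the receive-process count grows with the number of NICs.
    for nics in range(1, 5):
        print(nics, receive_processes(nics))
```

With 4 NICs this gives the 16 receive processes quoted in the example.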

Resolution

Reduce the volume of concurrent migrations using the settings and steps in the KB below:

Resource Manager Settings for Managing Migrations

This should reduce the number of vMotion threads hitting the network socket buffer at the same time and avoid the tcbinfo lock contention.

You can also configure a dedicated TCP/IP stack for vMotion, as described in the following:

Dedicated vMotion Stack

Networking Best Practices
