A long snapshot stun time may occur when a virtual machine runs on a datastore backed by NVMe-TCP

Article ID: 374503

Products

VMware vSphere ESXi

Issue/Introduction

A VM may experience performance issues while taking a snapshot, because a long snapshot stun time can occur when the virtual machine runs on a datastore backed by NVMe-TCP.


Here is a sample vmware.log excerpt for a VM with 2 VMDKs:

XXX In(05) vcpu-0 - Checkpoint_Unstun: vm stopped for 2687873 us
XXX In(05) vcpu-0 - CPT: vm was stunned for 2806449 us
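
The stun times in these log lines are reported in microseconds. As a minimal sketch for checking a vmware.log for the same symptom, the Python snippet below extracts the two message types shown above and converts them to milliseconds. The helper name `stun_times_ms` and the regular expression are illustrative assumptions based only on the sample lines in this article, not a VMware tool or a documented log format.

```python
import re

# Pattern derived from the two sample lines above (an assumption, not a
# documented log format): capture the message and the duration in microseconds.
STUN_RE = re.compile(
    r"(Checkpoint_Unstun: vm stopped|CPT: vm was stunned) for (\d+) us"
)


def stun_times_ms(log_text):
    """Return a list of (message, duration in milliseconds) for each stun line."""
    results = []
    for line in log_text.splitlines():
        match = STUN_RE.search(line)
        if match:
            results.append((match.group(1), int(match.group(2)) / 1000.0))
    return results


sample = """XXX In(05) vcpu-0 - Checkpoint_Unstun: vm stopped for 2687873 us
XXX In(05) vcpu-0 - CPT: vm was stunned for 2806449 us"""

for message, ms in stun_times_ms(sample):
    print(f"{message}: {ms:.1f} ms")
```

For the sample above, both values come out at roughly 2.7 to 2.8 seconds.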

 

Environment

Storage is connected via the NVMe-TCP protocol.

Cause

The NVMe-TCP driver sends the 2 PDUs of the FUSED compare command and the FUSED write command (as 2 packets) to the TCP layer back to back. Due to the way NVMe-oF targets process FUSED commands, the TCP layer does not send the PDU of the FUSED write command until it receives the ACK of the FUSED compare command, which can take up to 40 milliseconds. This delay of up to 40 milliseconds causes the performance issue when taking snapshots.
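
To give a sense of scale, the following rough sketch relates the sample stun time to the 40 millisecond upper bound. It assumes, purely for illustration, that the entire stun time is made up of such ACK waits occurring one after another. In practice other work also contributes to stun time, so this is an estimate of magnitude and not a diagnosis. The helper name `min_delayed_pairs` is hypothetical.

```python
import math

# Upper bound on the ACK wait per FUSED compare/write pair, as stated above.
ACK_DELAY_MS = 40


def min_delayed_pairs(stun_us, delay_ms=ACK_DELAY_MS):
    """Smallest number of worst-case ACK waits that could add up to stun_us.

    Assumption (illustrative only): the whole stun time consists of
    serialized ACK waits of at most delay_ms each.
    """
    return math.ceil(stun_us / (delay_ms * 1000))


# Stun time taken from the sample vmware.log line "vm was stunned for 2806449 us".
print(min_delayed_pairs(2806449))
```

Under that assumption, a stun of about 2.8 seconds would correspond to several dozen delayed FUSED command pairs.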

Resolution

  • For version 7.x, the issue will be fixed in 7.0.3 P10.
  • For version 8.x, the issue has been fixed in ESXi 8.0 Update 3b.

Note: The latency issue is fixed in the versions listed above by always sending the 2 PDUs of the FUSED compare command and the FUSED write command in one TCP packet.

Additional Information

ESXi 9.0 is not affected.