The vmware.log of the target VM contains messages similar to the following during the "Stop and Copy" phase of the migration task:
Er(02) vcpu-x - vmiop_log: (0x0): Copy sysmem tracking failed, 0x7
Er(02) vcpu-x - vmiop_log: (0x0): CPU RPC async recv response failed: 0x7
Er(02) vcpu-x - vmiop_log: (0x0): Recv MIGRATION Stop and Copy RPC response failed, 0x7
Er(02) vcpu-x - vmiop_log: (0x0): stop and copy failed
Note: In addition to the above messages, the following may also be observed:
Er(02) vthread-xxxxxxx - vmiop_log: (0x0): GSP plugin task crashed. VM shutdown is required.
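To check a vmware.log for these symptoms, the messages listed above can be searched for programmatically. The following is a minimal sketch, not an official VMware or NVIDIA tool: the function name find_vmiop_failures and the SIGNATURES tuple are hypothetical names introduced for illustration, and the pattern assumes log lines follow the "vmiop_log: (0x0): message" layout shown in the excerpts above.

```python
import re

# Message fragments copied from the log excerpts in this article.
SIGNATURES = (
    "Copy sysmem tracking failed",
    "CPU RPC async recv response failed",
    "Recv MIGRATION Stop and Copy RPC response failed",
    "stop and copy failed",
    "GSP plugin task crashed",
)

# Assumed line layout: "... vmiop_log: (0x0): <message>".
VMIOP_LINE = re.compile(r"vmiop_log:\s*\(0x[0-9a-fA-F]+\):\s*(?P<msg>.*)")


def find_vmiop_failures(lines):
    """Return (line_number, signature) pairs for lines matching a known signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = VMIOP_LINE.search(line)
        if not match:
            continue  # not a vmiop_log entry
        for signature in SIGNATURES:
            if signature in match.group("msg"):
                hits.append((number, signature))
                break  # report at most one signature per line
    return hits
```

The function accepts any iterable of lines, so an open file handle for a copy of the VM's vmware.log can be passed to it directly. An empty result only means that none of the listed fragments were found; it does not rule out other causes of a failed migration.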
VMware vSphere ESXi
Because the NVIDIA GSP plugin is not functioning normally during the migration phase of vMotion or Storage vMotion, the vmiop module cannot continuously synchronize the frame buffer and memory state. As a result, the Stop and Copy phase times out and the vMotion or Storage vMotion task fails.
Note: Cold migration (migration of a powered-off VM) does not require memory tracking by vmiop, so this issue does not occur in that case.
Because this issue is caused by the behavior of the vGPU module provided by NVIDIA, there is no permanent solution through configuration changes on the vSphere side.
If this issue continues to occur, please contact NVIDIA support.