Ports are in a blocked state after vMotion, causing application downtime.
Pods restart due to node network-related timeouts, with vmxnet3 reporting a tx hang.
TKGI Kubernetes cluster
ESXi Version : 7.0.3.01900
ESXi Version : 8.0.3.24784735
This can happen because of a race condition between the checkpoint save and checkpoint restore operations. If the vmkernel processes and delivers a packet on the source ESXi host after the VM has been quiesced, the next2write index of the Rx completion ring changes. Because the checkpoint saved the earlier next2write index, the restore finds the gen bit of the Rx descriptor to be invalid, and the vmxnet3 activation fails.
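The race can be illustrated with a toy model of a completion ring that uses a generation bit. This is only a sketch under stated assumptions, not the ESXi implementation: `RING_SIZE`, `CompletionRing`, `validate_on_restore`, and `simulate` are hypothetical names, and the validation rule (the slot at the saved next2write index must not already carry the producer's current generation) is an assumed simplification of the check that reports the invalid gen bit.

```python
from dataclasses import dataclass, field

RING_SIZE = 4  # hypothetical ring size, chosen only for illustration


@dataclass
class CompletionRing:
    """Toy model of an Rx completion ring with a generation bit per slot."""
    gen_bits: list = field(default_factory=lambda: [0] * RING_SIZE)
    next2write: int = 0  # producer index: next slot to be written
    gen: int = 1         # generation the producer stamps on new entries

    def deliver_packet(self):
        """Producer writes one completion entry and advances next2write."""
        self.gen_bits[self.next2write] = self.gen
        self.next2write += 1
        if self.next2write == RING_SIZE:
            self.next2write = 0
            self.gen ^= 1  # generation flips each time the ring wraps


def validate_on_restore(gen_bits, saved_next2write, saved_gen):
    """Assumed check: the slot at the saved index must still be unwritten,
    i.e. it must not carry the producer's current generation."""
    return gen_bits[saved_next2write] != saved_gen


def simulate(packet_after_save):
    """Return True if activation would succeed in this toy model."""
    ring = CompletionRing()
    ring.deliver_packet()
    ring.deliver_packet()

    # Checkpoint save: the producer index and generation are recorded.
    saved_next2write, saved_gen = ring.next2write, ring.gen

    if packet_after_save:
        # Race: one more packet is delivered after the index was saved,
        # so the ring contents move ahead of the saved index.
        ring.deliver_packet()

    # Checkpoint restore: ring contents are current, the index is stale.
    return validate_on_restore(ring.gen_bits, saved_next2write, saved_gen)
```

In this model `simulate(False)` passes the check, while `simulate(True)` fails it because the slot at the stale index has already been written with the current generation.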
Logs to validate
=============
vmware.log
2025-08-19T19:27:13.823Z In(05) vmx - MigrateSetState: Transitioning from state MIGRATE_FROM_VMX_WAITING (9) to MIGRATE_FROM_VMX_PRECOPY (10).
2025-08-19T19:27:28.448Z In(05) vmx - MigrateWaitForData: Waited for 19.18 seconds.
2025-08-19T19:27:28.449Z In(05) vmx - MigrateRPC_DrainPendingWork: Draining pending remote user messages before restore...
2025-08-19T19:27:28.449Z In(05) vmx - MigrateRPC_DrainPendingWork: All pending work completed.
2025-08-19T19:27:28.449Z In(05) vmx - MigrateSetState: Transitioning from state MIGRATE_FROM_VMX_PRECOPY (10) to MIGRATE_FROM_VMX_CHECKPT (11).
2025-08-19T19:27:28.449Z In(05) vmx - SVMotionFixParentPaths: No snapshot paths need to be validated
2025-08-19T19:27:28.449Z In(05) vmx - Migrate_Open: Restoring from <10.196.xx.xx> with migration id 5020850807499805208
2025-08-19T19:27:28.449Z In(05) vmx - DUMPER: Restoring checkpoint version 8.
2025-08-19T19:27:28.449Z In(05) vmx - Checkpointed in VMware ESX, 7.0.3, build-24585291, Linux Host
2025-08-19T19:27:28.449Z No(00) vmx - ConfigDB: Setting sched.swap.derivedName = "/vmfs/volumes/vsan:<UUID>/lt-cmg12u-MG-VM-1-PlgI-3ee5axxxx.vswp"
2025-08-19T19:27:28.449Z In(05) vmx - ConfigDB: Ignoring request to write config file
2025-08-19T19:27:28.449Z No(00) vmx - PowerOnTiming: Module Migrate took 19181614 us
2025-08-19T19:27:28.413Z In(05) vcpu-0 - Migrate: VM successfully stunned.
2025-08-19T19:27:28.449Z In(05) worker-6953703 - Migrate: Remote Log: Destination waited for 19.18 seconds.
2025-08-19T19:27:28.449Z In(05) worker-6953703 - Migrate: Remote Log: Beginning checkpoint restore.
2025-08-19T19:27:28.449Z In(05) worker-6953703 - Migrate: Remote Log: Switching to checkpoint state.
2025-08-19T19:27:28.538Z In(05) vcpu-0 - VMXNET3 user: failed to activate 'Ethernetx', status: 0xbad0001
vmkernel.log
2025-08-19T19:27:28.504Z In(182) vmkernel: cpu18:2401914)Net: 2238: connected lt-cmg12u-MG-VM-1-xxxx.ethx ethx to vDS, portID 0x400001b
2025-08-19T19:27:28.538Z In(182) vmkernel: cpu44:2401914)Vmxnet3: 12036: Invalid gen bit for rq: 0, World_Handle: 0x45390489f000
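To check collected logs for the two signatures above, a small script along these lines may help. It is a minimal sketch: the function name `find_signatures` and the regular expressions are assumptions derived only from the sample lines in this article, so other builds may format these messages differently.

```python
import re

# Patterns derived from the sample log lines in this article (assumed format).
SIGNATURES = {
    "activation_failure": re.compile(
        r"VMXNET3 user: failed to activate '([^']+)', status: (0x[0-9a-fA-F]+)"
    ),
    "invalid_gen_bit": re.compile(
        r"Vmxnet3: \d+: Invalid gen bit for rq: (\d+)"
    ),
}


def find_signatures(lines):
    """Return (signature name, timestamp, captured fields) for each match."""
    hits = []
    for line in lines:
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                timestamp = line.split()[0]  # first field of the log line
                hits.append((name, timestamp, match.groups()))
    return hits
```

A typical use would be to pass the lines of vmware.log and vmkernel.log to `find_signatures` and compare the timestamps of the two kinds of hits; in the excerpt above both occur at 19:27:28.538Z.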
There is no workaround.
Fixed in the following versions: