Time based policy in failure state due to spacing in time attribute. Ports are in blocked state after vMotion

Article ID: 441835

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

After a vMotion, one or more of the VM's network ports become blocked. Traffic on those ports stops, which can cause widespread network connectivity failures.


vmware.log:
2025-08-19T19:27:13.823Z In(05) vmx - MigrateSetState: Transitioning from state MIGRATE_FROM_VMX_WAITING (9) to MIGRATE_FROM_VMX_PRECOPY (10).
2025-08-19T19:27:28.448Z In(05) vmx - MigrateWaitForData: Waited for 19.18 seconds.
2025-08-19T19:27:28.449Z In(05) vmx - MigrateRPC_DrainPendingWork: Draining pending remote user messages before restore...
2025-08-19T19:27:28.449Z In(05) vmx - MigrateRPC_DrainPendingWork: All pending work completed.
2025-08-19T19:27:28.449Z In(05) vmx - MigrateSetState: Transitioning from state MIGRATE_FROM_VMX_PRECOPY (10) to MIGRATE_FROM_VMX_CHECKPT (11).
2025-08-19T19:27:28.449Z In(05) vmx - SVMotionFixParentPaths: No snapshot paths need to be validated
2025-08-19T19:27:28.449Z In(05) vmx - Migrate_Open: Restoring from <##.##.##.##> with migration id 5020850807499805208
2025-08-19T19:27:28.449Z In(05) vmx - DUMPER: Restoring checkpoint version 8.
2025-08-19T19:27:28.449Z In(05) vmx - Checkpointed in VMware ESX, 7.0.3, build-24585291, Linux Host
2025-08-19T19:27:28.449Z No(00) vmx - ConfigDB: Setting sched.swap.derivedName = "/vmfs/volumes/vsan:521####7f2e58a07-62e675####329712/10####68-####-####-####-a088c21f05aa/lt-####-MG-VM-1-PlgI-3ee5a316.vswp"
2025-08-19T19:27:28.449Z In(05) vmx - ConfigDB: Ignoring request to write config file
2025-08-19T19:27:28.449Z No(00) vmx - PowerOnTiming: Module Migrate took 19181614 us


2025-08-19T19:27:28.413Z In(05) vcpu-0 - Migrate: VM successfully stunned.
2025-08-19T19:27:28.449Z In(05) worker-6953703 - Migrate: Remote Log: Destination waited for 19.18 seconds.
2025-08-19T19:27:28.449Z In(05) worker-6953703 - Migrate: Remote Log: Beginning checkpoint restore.
2025-08-19T19:27:28.449Z In(05) worker-6953703 - Migrate: Remote Log: Switching to checkpoint state.


2025-08-19T19:27:28.538Z In(05) vcpu-0 - VMXNET3 user: failed to activate 'Ethernet2', status: 0xbad0001

vmkernel.log:
2025-08-19T19:27:28.504Z In(182) vmkernel: cpu18:2401914)Net: 2238: connected lt-####-MG-VM-1-PlgI.eth2 eth2 to vDS, portID 0x400001b
2025-08-19T19:27:28.538Z In(182) vmkernel: cpu44:2401914)Vmxnet3: 12036: Invalid gen bit for rq: 0, World_Handle: 0x45390489f000
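
To check whether a host or VM log bundle shows the same symptom, the two signature lines above can be searched for with a short script. The following is a minimal sketch, not a supported tool: the function name `find_signatures` and the regular expressions are illustrative assumptions derived only from the excerpts in this article, and the exact message wording or the numeric source-line field (`12036`) may differ between builds.

```python
import re

# Patterns derived from the log excerpts in this article (assumed, not official).
VMX_ACTIVATE_FAIL = re.compile(
    r"VMXNET3 user: failed to activate '(?P<nic>[^']+)', status: (?P<status>0x[0-9a-fA-F]+)"
)
VMK_INVALID_GEN = re.compile(r"Vmxnet3: \d+: Invalid gen bit for rq: (?P<rq>\d+)")


def find_signatures(lines):
    """Return (kind, detail, line) tuples for lines matching either signature."""
    hits = []
    for line in lines:
        m = VMX_ACTIVATE_FAIL.search(line)
        if m:
            hits.append(("activate_failed", f"{m['nic']} {m['status']}", line.strip()))
            continue
        m = VMK_INVALID_GEN.search(line)
        if m:
            hits.append(("invalid_gen_bit", f"rq {m['rq']}", line.strip()))
    return hits


# Sample input: two matching lines and one unrelated line from the excerpts above.
sample = [
    "2025-08-19T19:27:28.538Z In(05) vcpu-0 - VMXNET3 user: failed to activate 'Ethernet2', status: 0xbad0001",
    "2025-08-19T19:27:28.538Z In(182) vmkernel: cpu44:2401914)Vmxnet3: 12036: Invalid gen bit for rq: 0, World_Handle: 0x45390489f000",
    "2025-08-19T19:27:28.413Z In(05) vcpu-0 - Migrate: VM successfully stunned.",
]
hits = find_signatures(sample)
for kind, detail, _ in hits:
    print(kind, detail)
```

In practice the `lines` argument would be an open file handle for vmware.log or vmkernel.log. Seeing both signatures at nearly the same timestamp right after the checkpoint restore is consistent with the issue described here.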


Impact:
After the vMotion, vCenter shows all NICs for the VM in the CONNECTED state, but some of them cannot pass any traffic. The DVS also reports the VM port link status as down, which disrupts traffic for application VMs.

Environment

VMware vSphere ESXi

Cause

This can happen because of a race condition between the checkpoint save operation and the checkpoint restore. If the vmkernel on the source ESX host processes and delivers a packet after quiesce, the next2write index of the Rx completion ring changes. The checkpoint, however, contains the earlier next2write index. On restore, the gen bit of the Rx descriptor at that index is therefore found to be invalid, and activation of the vNIC fails.
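
The race can be illustrated with a toy model of a completion ring that uses a generation (gen) bit. This is a simplified sketch for intuition only, not the actual ESXi or vmxnet3 code: the class `RxCompRing`, the function `restore_is_consistent`, the ring size, and the form of the consistency check are all assumptions made for the example.

```python
class RxCompRing:
    """Toy completion ring: each slot stores only the gen bit last written to it."""

    def __init__(self, size=4):
        self.size = size
        self.next2write = 0
        self.gen = 1                          # gen value stamped during this pass
        self.slots = [self.gen ^ 1] * size    # opposite gen means "not yet written"

    def deliver_packet(self):
        """Writer side: stamp the slot, advance, flip gen on wrap-around."""
        self.slots[self.next2write] = self.gen
        self.next2write += 1
        if self.next2write == self.size:
            self.next2write = 0
            self.gen ^= 1


def restore_is_consistent(saved_index, saved_gen, slots):
    """Assumed check: the slot at the saved write index must not carry the saved gen yet."""
    return slots[saved_index] != saved_gen


ring = RxCompRing()
ring.deliver_packet()
ring.deliver_packet()

# Checkpoint save records the ring state at this moment.
saved_index, saved_gen = ring.next2write, ring.gen
ok_without_race = restore_is_consistent(saved_index, saved_gen, list(ring.slots))

# Race: one more packet is delivered after the save but before the restore.
ring.deliver_packet()
ok_with_race = restore_is_consistent(saved_index, saved_gen, list(ring.slots))

print(ok_without_race, ok_with_race)  # True False
```

In the model, the late packet stamps the slot that the saved next2write index still points to, so the saved index and the ring contents no longer agree. This corresponds to the "Invalid gen bit" message in vmkernel.log.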

Resolution

There is no workaround.

The issue is fixed in the following versions:
7.0.3 P10
8.0.3.0 P06
9.0.0.0