vMotion fails with Status - Failed to wait for data. Error 195887124. Out of memory
Error stack:
vmware.log
2020-09-28T07:12:54.669Z| vmx| I125: MigrateSetInfo: state=8 srcIp=<x.x.x.x> dstIp=<x.x.x.x> mid=8242692619412767915 uuid=########-####-####-####-########5632 priority=high
...
2020-09-28T07:13:47.682Z| vmx| I125: [msg.migrate.waitdata.platform] Failed waiting for data. Error bad0014. Out of memory.
2020-09-28T07:13:47.682Z| vmx| I125: [vob.vmotion.send.async.restore.failed] vMotion migration [a70fa09:8242692619412767915] failed to asynchronously receive and apply state from the remote host: Out of memory.
2020-09-28T07:13:47.682Z| vmx| I125: [vob.vmotion.send.get.dvfilterstate.failed] vMotion migration [a70fa09:8242692619412767915] failed to get DVFilter state from the source host <x.x.x.x>
2020-09-28T07:13:47.682Z| vmx| I125: [vob.heap.grow.size.not.allowed] Heap dvfilter may only grow by 49881088 bytes (117896024/167777112), which is not enough for allocation of 117587968 bytes
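The numbers in the heap message above can be checked with a little arithmetic. The following Python sketch is only an illustration: `heap_shortfall` is a hypothetical helper, the regular expression assumes the exact message layout shown in this excerpt, and the reading of the pair in parentheses as current size / maximum size is an assumption (it is consistent with the reported growth limit, since 167777112 - 117896024 = 49881088).

```python
import re

# Message text copied from the vmware.log excerpt above.
LINE = ("Heap dvfilter may only grow by 49881088 bytes (117896024/167777112), "
        "which is not enough for allocation of 117587968 bytes")

# Assumed layout of the message; adjust if your build logs it differently.
PATTERN = re.compile(
    r"Heap (?P<name>\S+) may only grow by (?P<grow>\d+) bytes "
    r"\((?P<current>\d+)/(?P<maximum>\d+)\), which is not enough for "
    r"allocation of (?P<request>\d+) bytes"
)


def heap_shortfall(line):
    """Hypothetical helper: derive headroom and shortfall from the heap message."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("line does not match the expected heap message")
    current = int(m.group("current"))   # assumed: bytes currently in the heap
    maximum = int(m.group("maximum"))   # assumed: configured heap maximum
    request = int(m.group("request"))   # size of the failed allocation
    headroom = maximum - current        # how far the heap may still grow
    return {
        "heap": m.group("name"),
        "headroom": headroom,
        "shortfall": request - headroom,   # bytes missing for this allocation
        "needed_max": current + request,   # smallest maximum that would fit it
    }


if __name__ == "__main__":
    for key, value in heap_shortfall(LINE).items():
        print(key, value)
```

For this excerpt the smallest maximum that would have satisfied the request is 235483992 bytes. Treat that as a rough lower bound only, since allocator overhead and other concurrent migrations are not accounted for.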
On the destination host, in vmkernel.log:
2020-09-28T07:13:47.675Z cpu13:2793737)WARNING: VMotionSend: 5923: 8242692619412767915 D: Failed handling message reply GET_DVFILTER_STATE: Out of memory
2020-09-28T07:13:47.678Z cpu13:2793737)WARNING: VMotionSend: 4913: 8242692619412767915 D: failed to asynchronously receive and apply state from the remote host: Out of memory.
2020-09-28T07:13:47.678Z cpu13:2793737)WARNING: Migrate: 282: 8242692619412767915 D: Failed: Out of memory (0xbad0014) @0x41800c779bb2
2020-09-28T07:13:47.678Z cpu13:2793737)WARNING: VMotionUtil: 7659: 8242692619412767915 D: timed out waiting 0 ms to transmit data.
2020-09-28T07:13:47.682Z cpu8:2793643)WARNING: Migrate: 6145: 8242692619412767915 D: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.
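The decimal error number in the vMotion task message (195887124) and the hexadecimal status in the logs (bad0014, 0xbad0014) are the same value. A minimal Python sketch of the conversion follows; `vmk_status_hex` is a hypothetical helper name used only for illustration.

```python
def vmk_status_hex(decimal_code):
    """Return the hexadecimal form of a decimal error number, without the 0x prefix."""
    return format(decimal_code, "x")


if __name__ == "__main__":
    # Decimal code from the task error; compare with the value in the logs.
    print(vmk_status_hex(195887124))  # prints "bad0014"
```

This makes it easier to search vmware.log and vmkernel.log for the matching status when only the decimal number from the task error is available.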
DVFilter may need a large heap for the temporary allocations used to move the state of the virtual machine. In some cases the filter state can be very large, which exhausts the default heap during vMotion.
This issue is fixed in vSphere 7.0.3, 6.5 P06 and 6.7 P05.
As a workaround, increase the maximum DVFilter heap size by running the following command on both the source and destination hosts:
esxcfg-module -s DVFILTER_HEAP_MAX_SIZE=276834000 dvfilter
Note: The ESXi host must be rebooted for this change to take effect.
To check the current size of the DVFilter heap:
vsish
cd /system/heaps/dvfilterVMotion-XXXXXXXX
get stats
Example output:
/system/heaps/dvfilterVMotion-0x43144da00000/> get stats
Heap stats {
Name:dvfilterVMotion
owning module id:0
...
current heap size:111016
...
maximum heap size:536875432
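If the stats output is saved to a file or captured as text, the two sizes can be extracted programmatically. The Python sketch below assumes the `key:value` layout shown in the example output; `parse_heap_stats` is a hypothetical helper, not part of any VMware tool.

```python
# Sample text mirroring the example output above (elided lines kept as "...").
SAMPLE = """Heap stats {
Name:dvfilterVMotion
owning module id:0
...
current heap size:111016
...
maximum heap size:536875432
"""


def parse_heap_stats(text):
    """Hypothetical helper: collect key:value pairs from heap stats text."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as "Heap stats {" and "..."
            stats[key.strip()] = value.strip()
    return stats


if __name__ == "__main__":
    stats = parse_heap_stats(SAMPLE)
    current = int(stats["current heap size"])
    maximum = int(stats["maximum heap size"])
    print("current:", current, "maximum:", maximum, "free to grow:", maximum - current)
```

Comparing the remaining growth room with the allocation size reported in vmware.log gives a quick indication of whether the configured maximum is large enough.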
vMotion fails when DRS decides to place VMs on a host.