Symptoms:
- vMotion of certain VMs fails with an "Out of memory" error in an NSX environment.
- In /var/log/vmkernel.log on the source host, entries similar to the following appear:
  2017-06-14T10:17:09.897Z cpu0:59176047)vsip VSIPDVFGetSavedStateLen:1772: Sending length of 38579316
  2017-06-14T10:17:09.897Z cpu0:59176047)vsip VSIPDVFGetSavedStateLen:1772: Sending length of 38832436
  2017-06-14T10:17:09.897Z cpu0:59176047)vsip VSIPDVFGetSavedStateLen:1772: Sending length of 12788
  2017-06-14T10:17:10.183Z cpu0:59176047)WARNING: Migrate: 270: 1497435428078442 S: Failed: Failed to resume virtual machine (0xbad0044) @0x4180116e69ef
  Note: The relevant detail here is the very high Sending length values (38579316 and 38832436).
- In /var/log/vmkernel.log on the destination host, entries similar to the following appear:
  2017-06-14T10:17:10.182Z cpu18:679220)WARNING: Heap: 3728: Heap dvfilter (77538136/138416984): Maximum allowed growth (60878848) too small for size (77426688)
  2017-06-14T10:17:10.182Z cpu18:679220)WARNING: Heap: 4225: Heap_Align(dvfilter, 77424732/77424732 bytes, 8 align) failed. caller: 0x41801ee8cdc5
  2017-06-14T10:17:10.182Z cpu18:679220)WARNING: VMotionSend: 4978: 1497435428078442 D: Failed handling message reply 1: Out of memory
  2017-06-14T10:17:10.183Z cpu18:679220)WARNING: VMotionSend: 3979: 1497435428078442 D: failed to asynchronously receive and apply state from the remote host: Out of memory.
  2017-06-14T10:17:10.183Z cpu18:679220)WARNING: Migrate: 270: 1497435428078442 D: Failed: Out of memory (0xbad0014) @0x41801f945786
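To check whether a failed vMotion matches this pattern, the two log signatures above can be extracted with standard grep, sed and awk. The following is a minimal sketch, assuming a POSIX shell (the ESXi shell, or a copy of the logs on a workstation). The function names and the 10000000 threshold are hypothetical, and treating the "(A/B)" pair in the Heap warning as current size/maximum size is our interpretation of the log, not documented behavior.

```sh
# Hypothetical helpers for triaging this failure from vmkernel.log.
# They only read the log file they are given; they change nothing on the host.

# Source host: print every dvfilter "Sending length" value above a threshold.
# The default threshold (10000000) is an arbitrary cut-off for this sketch,
# not a documented limit.
dvfilter_large_state_lengths() {
    log="$1"
    threshold="${2:-10000000}"
    grep -o 'VSIPDVFGetSavedStateLen:[0-9]*: Sending length of [0-9]*' "$log" |
        awk -v t="$threshold" '$NF + 0 > t + 0 { print $NF }'
}

# Destination host: break down each failed dvfilter heap growth.
# Assumption: "(A/B)" is (current heap size/maximum heap size); in the sample
# log, B - A equals the reported "Maximum allowed growth".
dvfilter_heap_shortfall() {
    grep -o 'Heap dvfilter ([0-9]*/[0-9]*): Maximum allowed growth ([0-9]*) too small for size ([0-9]*)' "$1" |
        sed 's/[^0-9]/ /g' |
        awk '{ printf "size=%d max=%d headroom=%d requested=%d shortfall=%d\n", $1, $2, $2 - $1, $4, $4 - $3 }'
}
```

For example, run dvfilter_large_state_lengths /var/log/vmkernel.log on the source host and dvfilter_heap_shortfall /var/log/vmkernel.log on the destination host. For the sample destination log, the second helper reports a headroom of 60878848 bytes (138416984 - 77538136, which matches the logged maximum allowed growth) against a request of 77426688 bytes, a shortfall of 16547840 bytes. The DVFILTER_HEAP_MAX_SIZE value used in the workaround, 276834000, is roughly twice the 138416984-byte maximum seen in this log.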
The structure used to receive dvfilter state tables was optimized in ESXi 6.5 p06 and later and in ESXi 6.7 p05 and later, which makes this issue very unlikely to recur on those releases.
Workaround:
To work around this issue, increase the maximum size of the dvfilter heap, which is used to receive the firewall state table.
Run the following command on the ESXi host to raise this limit:
# esxcfg-module -s DVFILTER_HEAP_MAX_SIZE=276834000 dvfilter
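Before and after running the command, it can help to confirm which value the module is configured with. This is a minimal sketch, assuming that esxcfg-module -g dvfilter prints the module's option string (for example dvfilter enabled = 1 options = 'DVFILTER_HEAP_MAX_SIZE=276834000'); the helper name is hypothetical and it only parses the text it is given.

```sh
# Hypothetical helper: print the DVFILTER_HEAP_MAX_SIZE value contained in an
# option string, such as the output of `esxcfg-module -g dvfilter`.
# Prints nothing when the parameter is absent (the module default applies).
dvfilter_heap_max_from_opts() {
    printf '%s\n' "$1" | grep -o 'DVFILTER_HEAP_MAX_SIZE=[0-9]*' | cut -d= -f2
}
```

Usage on the host: dvfilter_heap_max_from_opts "$(esxcfg-module -g dvfilter)". Because -s sets the module's whole option string, checking the existing options with -g first also shows whether any other dvfilter options need to be carried over. Module options are normally read when the module loads, so expect the new value to take effect only after the host is rebooted.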
Alternatively, add the VM to the distributed firewall (DFW) exclusion list. This clears the VM's firewall state information entirely, so the VM is no longer affected by this issue.