ESXi experiences PSOD when FT secondary fails to initialize

Article ID: 317931

Products

VMware vSphere ESXi

Issue/Introduction

This article provides resolution information for an ESXi host that fails with a purple diagnostic screen (PSOD) while Fault Tolerance (FT) is being enabled for a virtual machine.

Symptoms:
  • An ESXi host experiences a PSOD when you enable Fault Tolerance for a virtual machine.
  • The PSOD occurs on the host that runs the FT secondary virtual machine.
  • In the /var/run/log/vmkernel.* files, you see entries similar to:

    2016-10-22T17:30:36.145Z cpu11:2829505)FTCpt: 4458: (884930615770 snd) Secondary init: nonce 76231358
    2016-10-22T17:30:36.145Z cpu11:2829505)FTCpt: 4541: (884930615770 snd) vmx 2829505 vmm 2829545
    _[31;1m2016-10-22T17:30:36.228Z cpu11:2829505)ALERT: MemSched: 12594: memory low: 102988K free_[0m
    2016-10-22T17:30:36.228Z cpu11:2829505)Backtrace for current CPU #11, worldID=2829505, rbp=0x4
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609b470:[0x418010e17726]MemSchedUpdateFreeStateInt@vmkernel#nover+0x126 stack: 0x1, 0x6493,
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609b4a0:[0x418010e1d05b]MemSched_UpdateFreeState@vmkernel#nover+0x4b stack: 0x0, 0x418010c68
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609b4c0:[0x418010c68cdb]MemMapDecFreePages@vmkernel#nover+0xcf stack: 0x1, 0x1, 0x2, 0x41801
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609b4f0:[0x418010c68dd4]MemMapAllocateAndAccount@vmkernel#nover+0xa4 stack: 0x2, 0x2, 0x4303
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609b520:[0x418010c68e87]MemMapPageCacheAlloc@vmkernel#nover+0x7b stack: 0x43043ca17b01, 0x14
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609b960:[0x418010c195b8]PageCache_Alloc@vmkernel#nover+0xc4 stack: 0x1, 0x43922609baf0, 0x0,
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609b9a0:[0x418010c6c5ac]MemMap_AllocateFromNode@vmkernel#nover+0x154 stack: 0x0, 0x418011f07
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609ba00:[0x418010f4721a]MemDistributeNUMAPolicy@vmkernel#nover+0x582 stack: 0x439dca9c1600,
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609bb40:[0x418010f47a5d]MemDistribute_Alloc@vmkernel#nover+0x299 stack: 0x779982b1, 0x7f536f
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609bca0:[0x418010cf65d8]VmMem_AllocNonGuestPage@vmkernel#nover+0xec stack: 0x4392274a7000, 0
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609bd20:[0x418010cf46e3]VmAnon_AllocKernDataPage@vmkernel#nover+0x177 stack: 0x4305d13d26f0,
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609bd60:[0x418011f07bf3]FTCptAllocPage@<None>#<None>+0x1b stack: 0x770b8, 0x418011f082da, 0x
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609bd80:[0x418011f082da]FTCptAllocPageMap@<None>#<None>+0xee stack: 0x2000011f2cf5a, 0x4305d
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609be00:[0x418011f17bc5]FTCpt_SecondaryInit@<None>#<None>+0x481 stack: 0x4392260a7000, 0x0,
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609beb0:[0x41801123cee7]UW64VMKPrivateSyscallUnpackFTCptSecondaryInit@<None>#<None>+0x17 sta
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609bec0:[0x4180111c1bf7]User_UWVMK64SyscallHandler@<None>#<None>+0x26b stack: 0x3ffec0449b0,
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609bf30:[0x418010cc4bf9]SyscallUWVMK64@vmkernel#nover+0x90 stack: 0x0, 0x0, 0x5e, 0x0, 0x22b
    2016-10-22T17:30:36.228Z cpu11:2829505)0x43922609bf38:[0x418010cc7044]gate_entry_@vmkernel#nover+0x0 stack: 0x0, 0x5e, 0x0, 0x22b4eb48, 0x
    2016-10-22T17:30:36.245Z cpu11:2829505)ALERT: MemSched: 12594: memory low: 50612K free
    ---------------------------
    2016-10-22T17:30:36.284Z cpu11:2829505)WARNING: VmAnon: vm 2829545: 1480: kern anon mpn allocation failed: usePreallocPool = 0
    2016-10-22T17:30:36.284Z cpu11:2829505)WARNING: FTCpt: 9792: (884930615770 snd) Error allocating a page (currently 103714 pages allocated, 405 MB)
    2016-10-22T17:30:36.284Z cpu11:2829505)WARNING: FTCpt: 2885: (884930615770 snd) Error allocating page 95394/131072: Out of memory
    2016-10-22T17:30:36.322Z cpu11:2829505)WARNING: FTCpt: 4660: (884930615770 snd) Error starting FT secondary: Out of memory
    2016-10-22T17:32:18.143Z cpu1:2829505)World: 9742: TR 0x4020 GDT 0x4392260a1000 (0x402f) IDT 0x418010cc9000 (0xfff)
    2016-10-22T17:32:18.143Z cpu1:2829505)World: 9743: CR0 0x80010031 CR3 0x87003b000 CR4 0x42768
    2016-10-22T17:32:18.181Z cpu1:2829505)Backtrace for current CPU #1, worldID=2829505, rbp=0x4305d13d26f0
    2016-10-22T17:32:18.181Z cpu1:2829505)0x43922609bd80:[0x418011f28f96]FTCptRLECleanupDestInfo@<None>#<None>+0x2a stack: 0x4305d13d2778, 0x
    2016-10-22T17:32:18.181Z cpu1:2829505)0x43922609bda0:[0x418011f11d4a]FTCptDestInfoCleanup@<None>#<None>+0x162 stack: 0x4305d13d2778, 0x41
    2016-10-22T17:32:18.181Z cpu1:2829505)0x43922609bdc0:[0x418011f0a2ba]FTCptSessionInfoCleanup@<None>#<None>+0x7b6 stack: 0x0, 0x4305d13d26
    2016-10-22T17:32:18.181Z cpu1:2829505)0x43922609be10:[0x418011f0a3e7]FTCpt_Stop@<None>#<None>+0x87 stack: 0x4392260a8814, 0x43922609be9c,
 
Note: The preceding log excerpts are examples only. Dates, times, and other environment-specific values will differ in your environment.
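
To check a saved copy of a vmkernel log for the entries shown above, a short script can help. The following is a minimal sketch, not a VMware tool: the function names are hypothetical, the two signature strings are copied from the excerpt above, and the 4 KiB page size used to cross-check the "pages allocated" figure is an assumption.

```python
import re

# Terminal colour residue as it can appear in copied log text: either a real
# ESC byte or a literal underscore, followed by "[<digits;...>m".
ANSI_RESIDUE = re.compile(r"(?:\x1b|_)\[[0-9;]*m")

# Strings taken verbatim from the log excerpt in this article.
SIGNATURES = (
    "Error starting FT secondary: Out of memory",
    "FTCptRLECleanupDestInfo",
)

PAGES_RE = re.compile(r"currently (\d+) pages allocated, (\d+) MB")
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def clean(line):
    """Remove colour residue and trailing whitespace from one log line."""
    return ANSI_RESIDUE.sub("", line).rstrip()


def find_signatures(lines):
    """Return the cleaned lines that contain any of the known signatures."""
    hits = []
    for line in lines:
        text = clean(line)
        if any(sig in text for sig in SIGNATURES):
            hits.append(text)
    return hits


def allocated_mib(line):
    """Convert the page count in an FTCpt allocation warning to MiB.

    Returns None when the line does not carry a page count.
    """
    match = PAGES_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)) * PAGE_SIZE / (1024 * 1024)
```

For example, `find_signatures(open("vmkernel.log", errors="replace"))` lists the matching lines of a local copy named vmkernel.log (a hypothetical file name). A match only indicates that the log resembles the excerpt; it does not by itself confirm this issue.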


Environment

VMware vSphere ESXi 6.0

Cause

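The PSOD is caused by a null-pointer dereference in the Fault Tolerance cleanup code, which runs after the FT secondary virtual machine fails to power on. In the log excerpt above, secondary initialization fails with an Out of memory error, and the later backtrace passes through FTCpt_Stop and the FTCpt cleanup functions.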

Resolution

 
 
This issue is resolved in VMware ESXi 6.0 Update 3, available from VMware Downloads.
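
To confirm whether a host already runs a fixed release, you can compare its build number with the build number of ESXi 6.0 Update 3. The sketch below parses the one-line string printed by the `vmware -v` command on an ESXi host. The function names are hypothetical, and the minimum build number is left as a parameter on purpose: take it from the ESXi 6.0 Update 3 release notes rather than from this example.

```python
import re

# Matches output such as "VMware ESXi 6.0.0 build-1234567".
VERSION_RE = re.compile(r"ESXi (\d+)\.(\d+)\.(\d+) build-(\d+)")


def parse_esxi_version(text):
    """Return ((major, minor, patch), build) parsed from `vmware -v` output."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    major, minor, patch, build = (int(group) for group in match.groups())
    return (major, minor, patch), build


def is_build_at_least(text, min_build):
    """True when the parsed build number is greater than or equal to min_build."""
    _, build = parse_esxi_version(text)
    return build >= min_build
```

Build numbers are only comparable within the same release line, so this check is meaningful for ESXi 6.0 hosts only.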