ESXi host can PSOD if memory reservation for a VM is changed from 100% to less
Article ID: 317851
Products
VMware vSAN
Issue/Introduction
This article explains how to fix the objects identified by the vSAN Health alarm so that the affected hosts do not PSOD. The alarm displayed on the vSAN cluster is called "Potential PSOD issue is detected due to improper object flag leak for some of vSAN objects".
Symptoms:
A host running 7.0.3 U3f or earlier will PSOD if it tries to resync a component that has the (1<<27) flag. When there is potential for this issue to occur, an alarm with the message "Potential PSOD issue is detected due to improper object flag leak for some of vSAN objects" is displayed on the vSAN cluster.
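The notation (1<<27) denotes a single bit in an integer flags field: bit 27, which equals 134217728 (0x8000000). As a minimal sketch of what "has the (1<<27) flag" means, the following Python tests that bit in a flags value. The names and the example flag values are hypothetical; this article does not describe how vSAN stores component flags.

```python
# Bit 27 as written in the article: (1 << 27) == 134217728 == 0x8000000.
FLAG_BIT_27 = 1 << 27


def has_flag(flags: int, mask: int = FLAG_BIT_27) -> bool:
    """Return True if every bit in mask is set in flags."""
    return (flags & mask) == mask


# Hypothetical flag values, for illustration only.
clean_flags = 0x5
leaked_flags = 0x5 | FLAG_BIT_27

print(has_flag(clean_flags))   # False
print(has_flag(leaked_flags))  # True
```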
Conditions For 0 Byte Object with (1<<27) Flag
1. The VM must be provisioned with a full memory reservation.
2. When the VM is powered on, the swap object is created with a 0 byte address space, but its components do not have the (1<<27) flag.
3. The swap object goes through a configuration change due to one of the following conditions:
   a. During VM power on, enough hosts are unavailable that the swap object gets force provisioned, and later enough hosts are added so that it is reconfigured to the target policy. This may be common in a stretched cluster if connectivity to the witness is lost or one of the sites is down, and the connectivity is later restored.
   b. The swap object goes through a policy change due to scale-up or scale-down.
   c. A new policy is applied to the virtual machine.
4. After one of the reconfigurations above, the 0 byte swap object now has the (1<<27) flag.
5. The memory reservation is removed or reduced for the virtual machine while it is powered on, and a vMotion or similar event on the virtual machine causes the swap object to be resized from 0 bytes to a non-zero size.
6. If the host running the virtual machine is overcommitted on memory, the virtual machine may swap out some of its memory to the swap object.
7. The component needs to be resynchronized due to a state resync, another policy change, or a similar operation.
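The sequence of conditions above can be traced with a small toy model. This is only an illustrative sketch: the SwapObject class, its fields, and the 4 GiB size are hypothetical and do not correspond to any real vSAN data structure or API.

```python
from dataclasses import dataclass

FLAG_BIT_27 = 1 << 27  # the (1<<27) flag named in this article


@dataclass
class SwapObject:
    """Toy stand-in for a VM swap object; not a real vSAN data structure."""
    size_bytes: int = 0
    flags: int = 0

    @property
    def flag_leaked(self) -> bool:
        return bool(self.flags & FLAG_BIT_27)

    def reconfigure(self) -> None:
        # Per the conditions above: a reconfigured 0 byte object ends up flagged.
        # This toy model only covers the 0 byte case the article describes.
        if self.size_bytes == 0:
            self.flags |= FLAG_BIT_27

    def resize(self, new_size_bytes: int) -> None:
        # Reservation removed or reduced, followed by a vMotion or similar event.
        self.size_bytes = new_size_bytes


swap = SwapObject()        # power on with full reservation: 0 bytes, no flag
print(swap.flag_leaked)    # False
swap.reconfigure()         # for example, a new policy is applied
print(swap.flag_leaked)    # True
swap.resize(4 * 1024**3)   # hypothetical 4 GiB once the reservation is reduced
print(swap.size_bytes > 0 and swap.flag_leaked)  # True
```

The final state, a non-zero swap object whose components still carry the flag, is the one in which a later resync is described as triggering the PSOD.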
Environment
VMware vSAN 7.0.x
Resolution
This has been addressed in ESXi 7.0 U3g. However, if the alarm is present in the environment, apply the workaround before updating ESXi hosts or performing any maintenance that could start a resync and cause hosts to fail.
Workaround:
1. Download the attached script fixESACompFlag.py
2. Upload the script to /tmp on one of the hosts in the cluster
3. Run the script on one of the hosts with the following command:
./fixESACompFlag.py
Sample output:
[root@hostname:~] ./fixESACompFlag.py
Fixed object 32e12c63-f302-07c7-3863-0200cc722300
All objects fixed

[root@hostname:~] ./fixESACompFlag.py
No object fixes necessary.
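When the script is run across several clusters, it can help to summarize its output automatically. The following is a minimal sketch that parses only the two message formats shown in the sample output above. The function name summarize is hypothetical, and any other output the script might produce is not documented here and is ignored.

```python
import re

# Matches the "Fixed object <uuid>" line from the sample output.
FIXED_RE = re.compile(r"^Fixed object ([0-9a-f-]+)$")


def summarize(output: str) -> dict:
    """Collect fixed object UUIDs and note whether nothing needed fixing."""
    fixed = []
    nothing_to_fix = False
    for line in output.splitlines():
        line = line.strip()
        match = FIXED_RE.match(line)
        if match:
            fixed.append(match.group(1))
        elif line == "No object fixes necessary.":
            nothing_to_fix = True
    return {"fixed": fixed, "nothing_to_fix": nothing_to_fix}


sample = """Fixed object 32e12c63-f302-07c7-3863-0200cc722300
All objects fixed"""
print(summarize(sample))
```

An empty "fixed" list together with "nothing_to_fix" set to True corresponds to the second sample run, where no objects required a fix.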
Note: A health alarm has been developed to warn users whenever a problematic object is detected.