This article explains the cause of, and provides a workaround for, an HA error that reports a VM as unprotected when the VM is actually protected.
Symptoms:
# grep -i "actual protection state for VM" vpxd.log
2018-12-14T10:06:36.288-08:00 verbose vpxd[29965] [Originator@6876 sub=dasVm opID=jnm1ctvk-871453-auto-iof2-h5:700912##-##-##-##-##-02-7edf64bd] actual protection state for VM vm-4357 ‘n/a’ -> ‘unprotected’
2018-12-14T10:06:46.790-08:00 verbose vpxd[04231] [Originator@6876 sub=dasVm opID=jnm1ctvk-871526-auto-ioh4-h5:70091308-89-50629fcf] actual protection state for VM vm-4357 ‘unprotected’ -> ‘n/a’
2018-12-14T10:07:01.141-08:00 verbose vpxd[04245] [Originator@6876 sub=dasVm opID=jnm1ctvk-871607-auto-iojc-h5:70091358-a-##-##-##-##-##ba2558] actual protection state for VM vm-4524 ‘n/a’ -> ‘unprotected’
2018-12-14T10:09:00.234-08:00 verbose vpxd[04251] [Originator@6876 sub=dasVm opID=jnm1ctvk-872323-auto-ip39-h5:70091846-9-2c575892] actual protection state for VM vm-4524 ‘unprotected’ -> ‘n/a’
2018-12-14T10:09:48.567-08:00 verbose vpxd[29954] [Originator@6876 sub=dasVm opID=jnm1ctvk-872563-auto-ip9w-h5:700920##-##-##-##-##-01-7fe98fc7] actual protection state for VM vm-4524 ‘n/a’ -> ‘unprotected’
2018-12-17T08:34:57.517-08:00 verbose vpxd[05830] [Originator@6876 sub=dasVm opID=jnm1ctvk-886926-auto-j0cv-h5:700930##-##-##-##-##-02-86210fc] actual protection state for VM vm-4790 ‘n/a’ -> ‘unprotected’.
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
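When many VMs are involved, reading the grep output by eye becomes tedious. The following is a minimal Python sketch, not a VMware-provided tool, that reduces the matching lines to the VMs whose most recent logged transition ends in 'unprotected'. The function names `latest_states` and `still_unprotected` are hypothetical, and the pattern assumes the line layout shown in the excerpt above. A VM listed here is only a candidate for this issue; the log alone does not prove that the alarm is false.

```python
import re
from typing import Dict, Iterable, List

# Matches the vpxd "dasVm" transition lines shown in the excerpt above.
# Accepts straight or typographic quotes around the state names, because
# copied excerpts often carry typographic quotes.
TRANSITION = re.compile(
    r"actual protection state for VM (?P<vm>\S+) "
    r"['‘’](?P<old>[^'‘’]*)['‘’] -> ['‘’](?P<new>[^'‘’]*)['‘’]"
)


def latest_states(lines: Iterable[str]) -> Dict[str, str]:
    """Return the most recently logged protection state for each VM ID.

    Assumes the lines are in chronological order, as they are in vpxd.log.
    """
    states: Dict[str, str] = {}
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last transition wins.
            states[match.group("vm")] = match.group("new")
    return states


def still_unprotected(lines: Iterable[str]) -> List[str]:
    """Return sorted VM IDs whose last logged state is 'unprotected'."""
    states = latest_states(lines)
    return sorted(vm for vm, state in states.items() if state == "unprotected")
```

For example, passing an open file handle for vpxd.log to `still_unprotected` returns the VM IDs (such as vm-4524) to check against the alarms shown in vCenter.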
Cause:
When a new VM is created and powered on, the FDM primary node does not propagate the VM's protection state update to vCenter. As a result, vCenter raises an unprotected alarm against the VM.
The HA error remains visible in the UI even after the VM restarts. However, VMware has verified that the underlying HA functionality is unaffected: when the host a VM resides on fails, HA restarts the VM on another host as expected.
Resolution:
The issue is resolved in ESXi 6.5 U3 and ESXi 6.7 U3, available for download from the Broadcom Downloads page.
Workaround:
Disable and then re-enable HA on the affected cluster.
Disabling HA does not affect running virtual machines, but it leaves them unprotected by HA for the short time until HA is re-enabled.
Note: VMware HA can be disabled only if there are no virtual machines with VMware Fault Tolerance (FT) enabled. If there are virtual machines with VMware FT enabled in the cluster you are disabling, turn off VMware FT before disabling VMware HA.
The risk is low because the actual protection state of the VM is correct; only the state relayed to the vCenter UI is wrong. In addition, this problem has been seen only with newly added VMs, not with existing, already protected VMs.