Starting with the ESX 9.0 release, the operating system supports updating certain user world entities without rebooting the host. Such an update is called a Live Update (live patch): it is applied to a running system, which makes the update process faster and more efficient. Live patching is currently supported for user world daemons, application binaries, and security policy files.
The remediation process has two stages: scan and apply.
Remediation of a live patch, which updates the user world entities listed above on a running ESX host, can fail for multiple reasons. This article outlines the failure scenarios and the corresponding mitigations for restoring the stability of the ESX host.
Patching User World Daemons:
Daemons are applications that run in the background and provide services in the ESX operating system. A daemon is patched by overlaying the existing executable file with a newer one, and a patched daemon binary requires a restart of the daemon (service). The remediation process updates the daemon binary and then restarts the running daemon: it stops the daemon and starts the patched version, and if the patched version fails to start, it rolls back by starting the previous version. Each of these steps can fail, as described below.
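The restart sequence implied by the failure messages in this section can be sketched as follows. This is only an illustration of the control flow, not the actual logic of the patching scripts: the function name, the `PatchResult` values, and the three callables are hypothetical stand-ins for whatever mechanism stops and starts the daemon.

```python
from enum import Enum
from typing import Callable


class PatchResult(Enum):
    """Hypothetical outcome labels, one per scenario described in this article."""
    PATCHED = "patched version running"
    STOP_FAILED = "stop failed"
    ROLLED_BACK = "patched start failed, previous version restarted"
    ROLLBACK_FAILED = "patched start failed, rollback failed"


def restart_with_rollback(
    stop: Callable[[], bool],
    start_patched: Callable[[], bool],
    start_previous: Callable[[], bool],
) -> PatchResult:
    """Stop the daemon, start the patched binary, roll back if that fails.

    Each callable is an assumed hook that returns True on success.
    """
    if not stop():
        return PatchResult.STOP_FAILED
    if start_patched():
        return PatchResult.PATCHED
    # The patched version did not start: try the previous (unpatched) version.
    if start_previous():
        return PatchResult.ROLLED_BACK
    return PatchResult.ROLLBACK_FAILED
```

Of the three failure outcomes, only `ROLLED_BACK` leaves the previous version of the daemon running again.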
The following message appears in /var/run/log/syslog.log when the running daemon cannot be stopped:
syslog.log error message
YYYY-MM-DDTHH:MM:SS Db(15) daemon_apply_published.py[1000081850]: restartDaemon: <DAEMON_NAME> Stop failed
The following message appears in the vCenter UI when the remediation process encounters this error:
Stop failure
Live Patch - Daemon patching failed. Failure occurred during the stop call for <DAEMON_NAME>. Manual remediation recommended. Please refer to KB article: KB 375947.
The following messages appear in /var/run/log/syslog.log when the patched daemon fails to start and the previous version is started instead:
Failure to start the patched daemon
YYYY-MM-DDTHH:MM:SS Db(15) daemon_apply_published.py[1000081775]: restartDaemon: <DAEMON_NAME> Start failed.
YYYY-MM-DDTHH:MM:SS Db(15) daemon_apply_published.py[1000081775]: rollbackDaemon: Previous version of <DAEMON_NAME> started.
The following message appears in the vCenter UI when this error is encountered:
Patched daemon start failure
Live Patch - Daemon patching failed. Failure occurred during the start of patched version of <DAEMON_NAME>. Restarted unpatched version. Please refer to KB article: KB 375947.
Failure to launch the unpatched daemon binary:
The following message appears in /var/run/log/syslog.log when this error occurs:
Rollback failure
YYYY-MM-DDTHH:MM:SS Db(15) daemon_apply_published.py[1000081356]: rollbackDaemon: <DAEMON_NAME> rollback failed.
The following message appears in the vCenter UI when this error is encountered:
Failure to start unpatched daemon
Live Patch - Daemon patching failed. Failure occurred during restarting unpatched version of <DAEMON_NAME>. Manual remediation recommended. Please refer to KB article: KB 375947.
Recommended action: Refer to the "Failure to start the unpatched daemon binary" section in the resolution below.
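To tell the three daemon failure cases apart when reviewing syslog.log, the message formats quoted above can be matched with a few regular expressions. The following is a minimal sketch, not a supported tool: the labels and the function name are hypothetical, and it assumes that daemon names contain no whitespace.

```python
import re
from typing import Optional, Tuple

# Patterns derived from the syslog.log messages quoted in this article.
# The labels on the left are hypothetical names chosen for this sketch.
_PATTERNS = [
    ("stop_failed", re.compile(r"restartDaemon: (?P<name>\S+) Stop failed")),
    ("start_failed", re.compile(r"restartDaemon: (?P<name>\S+) Start failed")),
    ("rolled_back", re.compile(r"rollbackDaemon: Previous version of (?P<name>\S+) started")),
    ("rollback_failed", re.compile(r"rollbackDaemon: (?P<name>\S+) rollback failed")),
]


def classify_syslog_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (label, daemon_name) for a daemon patching message, else None."""
    for label, pattern in _PATTERNS:
        match = pattern.search(line)
        if match:
            return label, match.group("name")
    return None
```

A `start_failed` line followed by a `rolled_back` line for the same daemon corresponds to the case in which the unpatched version was restarted; a `rollback_failed` line indicates that neither version of the daemon is running.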
Patching Applications:
Besides daemons, any application can be patched. In contrast to daemons, applications are usually short-running and are launched by a daemon or by a user. While the patch is applied, the functionality of the patched application is verified, and a failure is raised if the new version cannot be launched.
A running instance of the patched binary is not restarted. For example, if there is an open SSH session, that session must be closed to bring the host into a compliant state. The following errors can be encountered while patching an application:
The following message appears in /var/run/log/syslog.log when verification of the patched application binary fails:
Application binary patch failure - syslog
YYYY-MM-DDTHH:MM:SS Db(15) daemon_helper_apply_published.py[1000343537]: verifyDaemonHelper: <DAEMON_HELPER_APPLICATION_NAME> verification failed
The following message appears in the vCenter UI when this error is encountered:
Helper Daemon verify failure
Live Patch - Daemon helper patching failed. Verification of the helper binary <HELPER_DAEMON_NAME> failed (command '<VERIFY_CMD>'). Manual remediation recommended. Please refer to KB article: KB 375947.
The following message appears in /var/run/log/syslog.log when the unpatched version of a daemon that depends on the helper application fails to restart:
Helper application related Daemon restart failure
YYYY-MM-DDTHH:MM:SS Db(15) daemon_helper_apply_published.py[1000343586]: rollbackDaemon: <DAEMON_NAME> rollback failed.
The following message appears in the vCenter UI when this error is encountered:
Dependent daemon - unpatched version launch failure
Live Patch - Daemon patching failed. Failure occurred during restarting unpatched version of <DAEMON_NAME>. Manual remediation recommended. Please refer to KB article: KB 375947.
The host is reported as "Non-compliant" after the patch was successfully applied. The following message is reported in this case:
Host reporting Non-Compliant
Following daemon helpers are not compliant, as unpatched instance(s) are still running: <APPLICATION_LIST>. Please refer to KB article: KB 375947.
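The applications that still have unpatched instances running can be pulled out of the non-compliance message with a short helper. This is a sketch under a stated assumption: the article does not specify the format of `<APPLICATION_LIST>`, so the code assumes a comma-separated list. The function name is hypothetical.

```python
import re
from typing import List

# Matches the non-compliance message quoted in this article.
_NON_COMPLIANT = re.compile(
    r"unpatched instance\(s\) are still running: (?P<apps>.+?)\. Please refer"
)


def running_unpatched_apps(message: str) -> List[str]:
    """Return the application names listed in a non-compliance message.

    Assumption: the list is comma-separated (not confirmed by the article).
    Returns an empty list if the message does not match.
    """
    match = _NON_COMPLIANT.search(message)
    if not match:
        return []
    return [app.strip() for app in match.group("apps").split(",") if app.strip()]
```

Each returned name identifies an application whose running instances must be closed before the host can report as compliant.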
Patching Security Policy files:
Access domains specify the rules that extend or restrict access permissions to certain system components and services of the user world. These rules are provided in an access domain file, which can be live patched. Live patching an access domain file triggers a reload of all system-wide access permissions, which applies the newly patched security policies.
When live patching of the VMK access domain fails, the following message appears in the vCenter UI:
VMK access Live patch failure message
Loading default security policies has failed.
This failure can leave the host in a degraded state and impact certain operations (including VM stop and migration). The resolution is to restart the host, because the earlier security policies (from the unpatched system) cannot be reloaded.
Recommended action: Refer to the "Failure to load patched security policies" section of the resolution below.
Other general errors/exceptions encountered during remediation process:
If an exception is encountered during the scan or apply stage of the remediation process, the following message appears in the vCenter UI:
Encountering exception during patching process
Live Patch - Daemon patching failed. Failure occurred due to an unexpected exception:<EXCEPTION_TYPE>. Manual remediation recommended. Please refer to KB article: KB 375947.
Recommended action:
If the exception was encountered during the scan stage, refer to the "Failure due to exception during scan" section below.
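When collecting details for troubleshooting, the exception type can be extracted from the vCenter message quoted above. The following is a minimal sketch; the function name is hypothetical, and the pattern relies only on the message text shown in this article.

```python
import re
from typing import Optional

# Matches the "unexpected exception" message quoted in this article.
_EXCEPTION = re.compile(
    r"unexpected exception:\s*(?P<exc>.+?)\. Manual remediation recommended"
)


def extract_exception_type(message: str) -> Optional[str]:
    """Return the exception type named in the vCenter message, else None."""
    match = _EXCEPTION.search(message)
    return match.group("exc") if match else None
```

The message itself does not state whether the exception occurred during the scan stage or the apply stage, so that has to be determined separately.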
Cluster upgrade stops.
Compliance check will report the host as Non-compliant.
VMware vSphere ESX 9.x
This section provides details on how to recover the ESX host from incomplete upgrades.
The following section details the recovery steps for errors and exceptions encountered during the apply stage of the remediation process.
Failure to start the patched daemon binary: This indicates that the daemon was not patched (remediation failure) and that the remediation process rolled the host back to its earlier state by successfully launching the unpatched version of the daemon. In this case, the recommended action is:
Migrate the existing workloads using vMotion or stop all VMs on the host.
Perform a full system upgrade through the standard upgrade process.