Container-related errors occur after restarting one or more clustered vRA nodes

VMware

Symptoms:

In a vRealize Automation 7.2 clustered environment, the Container service misbehaves after one or more nodes are restarted. Symptoms include:

Inconsistent data collection, where some containers on a Docker host are not discovered. For example, a Docker host may run 10 containers while Admiral discovers and displays only some of them.
Randomly failing requests with a message similar to:

javax.net.ssl.SSLHandshakeException: General SSLEngine problem.

Inconsistent data displayed in the UI, depending on which node the UI (internally) requests the data from.
The Containers tab does not load properly.

This issue occurs due to a problem in the clustering of the VMware Admiral instance on the vRealize Automation hosts.

This issue is resolved in vRealize Automation 7.3, available at VMware Downloads.

To work around this issue if you do not want to upgrade:

Caution: Do not execute the script in parallel on all nodes.
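
One way to respect this caution is to drive the nodes from a loop that handles a single node at a time and stops at the first failure. The following is a minimal sketch, not part of the patch: the node names and the per-node command (`dry_run`) are placeholders, and the real patch invocation must be substituted by the reader.

```shell
#!/bin/sh
# Sketch only: run one command per node, strictly one node at a time.
# The node names and the per-node command are placeholders (assumptions);
# substitute whatever you actually use to run the patch on a node.

run_serially() {
    # $1 = command or function to call for each node; the rest = node names
    per_node_cmd=$1
    shift
    for node in "$@"; do
        echo "Starting patch on $node"
        # Stop at the first failure so no further node is touched.
        if ! "$per_node_cmd" "$node"; then
            echo "Patch failed on $node; stopping" >&2
            return 1
        fi
        echo "Finished patch on $node"
    done
}

# Stand-in for the real patch step; it only prints what would happen.
dry_run() {
    echo "would patch $1"
}

run_serially dry_run vra-node-1 vra-node-2
```

Because each node is finished before the next one starts, at least one other node stays available while a given node is being patched.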

Steps that are executed by this patch script:

A backup archive of all container-related data is created in the /tmp directory.
The necessary files are extracted to a temporary folder, and an installer script is then invoked.
The xenon service instance is stopped, the necessary files are copied, and the service is started again. Finally, the temporary folder is deleted from the system.
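
The order of operations described above (stage files, stop the service, copy, start the service, clean up) can be illustrated roughly as follows. This is a hedged sketch and not the actual patch script: the function name, the directory arguments, and the use of `service xenon-service stop` are assumptions made for illustration.

```shell
#!/bin/sh
# Illustration of the order of operations only; this is NOT the patch script.
# new_files_dir, install_dir and the service command are assumptions.

apply_files() {
    new_files_dir=$1
    install_dir=$2
    svc_cmd=${3:-"service xenon-service"}

    # Stage the files in a temporary folder.
    work_dir=$(mktemp -d) || return 1
    if ! cp -R "$new_files_dir"/. "$work_dir"/; then
        rm -rf "$work_dir"
        return 1
    fi

    # Stop the service before touching its files.
    if ! $svc_cmd stop; then
        rm -rf "$work_dir"
        return 1
    fi

    copy_status=0
    cp -R "$work_dir"/. "$install_dir"/ || copy_status=$?

    # Start the service again even if the copy failed, then clean up.
    $svc_cmd start
    rm -rf "$work_dir"
    return "$copy_status"
}
```

Passing a harmless command such as `echo svc` as the third argument lets the sequence be rehearsed against scratch directories without touching a real service.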

Patch Output:

If the patch execution produces the following output:

Node will not start. Available node detected but it is not responsive yet. Try again later.

Execute the patch on the other node(s). Once the patch has succeeded on those nodes, start the xenon service manually on this node.

The command to start the service is:

service xenon-service start
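
Since the message suggests trying again later, the start command can be wrapped in a small retry loop. The sketch below is generic: the `retry` helper, the attempt count, and the delay are illustrative, and it assumes the start command returns a non-zero exit status when the node refuses to start, which this article does not confirm.

```shell
#!/bin/sh
# Sketch only: retry a command a fixed number of times with a pause between
# attempts. Assumes the command exits non-zero on failure (an assumption for
# "service xenon-service start").

retry() {
    # $1 = maximum attempts, $2 = seconds between attempts, rest = command
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $n of $attempts failed" >&2
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Example (values are illustrative):
# retry 5 60 service xenon-service start
```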

Backup steps:

A backup of all container-related data is created automatically by the script. No manual action is required.

Rollback steps:

Restore the /etc/xenon directory from the backup archive created automatically by the script.
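
As a rough illustration of such a restore, the sketch below extracts only the xenon directory from an archive. It assumes the backup is a gzip-compressed tar archive that stores the directory under the relative path `etc/xenon`; neither the archive format nor its file name is stated in this article, so inspect the archive in /tmp before relying on this. The function name and arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumes the backup is a gzip-compressed tar archive that
# stores the directory as etc/xenon (relative path); check yours first.

restore_xenon_config() {
    # $1 = path to the backup archive, $2 = root to extract into (default /)
    archive=$1
    target_root=${2:-/}

    # Refuse to continue if the archive has no etc/xenon entries.
    if ! tar -tzf "$archive" | grep '^etc/xenon' >/dev/null; then
        echo "etc/xenon not found in $archive" >&2
        return 1
    fi

    # Extract only etc/xenon, relative to the target root.
    tar -xzf "$archive" -C "$target_root" etc/xenon
}
```

Extracting into a scratch directory first (the second argument) makes it possible to inspect the restored files before overwriting the live /etc/xenon directory.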



KB2148212_patch.zip
