Errors with Xenon/Container Service in vRealize Automation 7.3 HA environment


Article ID: 342465

Updated On:

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • In a clustered vRA environment, the container service might not respond when Docker hosts are removed after the Xenon service has been running for several days.
  • The /var/log/vmware/xenon log file consumes all disk space and contains log entries similar to:

    [validateStageTransitionAndState][Moving from STARTED(REQUEST_FAILED) to STARTED(REQUEST_FAILED).]
    [lambda$synchronizeChildrenInQueryPage$5][Synchronization failed for service {service-resource} with status code 404, message Service https://{address}:8494/{service-resource} returned error 404 for {method}. id {opId} message Service not found: http://127.0.0.1/{service-resource}]
    [checkAndCompleteOperation][(Original id: {opId}) Replication request to https://{address}:8494/{service-resource}-{method} failed with 500, Service https://{address}:8494/{service-resource} returned error 500 for {method}. id {opId} message queue limit exceeded]
    [lambda$handleServiceNotFoundOnReplica$5][Service {service-resource} not found on replica. Retrying replication request ..

     
  • When one or more nodes are restarted, you see some inconsistencies similar to:
     
    • Inconsistent data can be collected, where some of the Docker host containers might not be discovered.
    • Inconsistent data can be displayed, depending on which node the UI (internally) requests the data from.
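
To check whether a node shows the log symptoms above, a small helper along the following lines may be useful. This is a minimal sketch, not part of the patch: the function name xenon_log_report is hypothetical, and it assumes the log entries are stored as plain text under /var/log/vmware/xenon (a different path can be passed as the first argument).

```shell
#!/bin/bash
# Sketch only: report the size of the Xenon logs and count the symptom messages.
# xenon_log_report is a hypothetical helper name, not a VMware tool.

xenon_log_report() {
    # Default path is the one named in this article; override via $1.
    local log_path="${1:-/var/log/vmware/xenon}"
    if [ ! -e "$log_path" ]; then
        echo "log path not found: $log_path" >&2
        return 1
    fi

    # Total size of the logs, then free space on the filesystem holding them.
    du -sh "$log_path"
    df -h "$log_path"

    # Count lines containing each symptom string (fixed strings, recursive).
    local pattern count
    for pattern in \
        'Moving from STARTED(REQUEST_FAILED) to STARTED(REQUEST_FAILED)' \
        'Synchronization failed for service' \
        'queue limit exceeded' \
        'not found on replica'; do
        count=$( { grep -rhF -- "$pattern" "$log_path" 2>/dev/null || true; } | wc -l | tr -d ' ')
        printf '%s  %s\n' "$count" "$pattern"
    done
}
```

Calling xenon_log_report with no arguments inspects the default path. High counts combined with a nearly full filesystem would be consistent with the symptoms described here, although the counts alone do not confirm the cause.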
 
 


Environment

VMware vRealize Automation 7.x
VMware vRealize Automation 7.0.x
VMware vRealize Automation 7.3.x
VMware vRealize Automation 7.1.x
VMware vRealize Automation 7.2.x
VMware vRealize Automation 6.2.x
VMware vRealize Automation Desktop 6.2
VMware vRealize Automation Desktop 6.2.x
VMware vRealize Automation 6.x
VMware vRealize Automation 6.2
VMware vRealize Automation 7.4.x

Cause

This issue is caused by problems in the setup of the Xenon cluster and in the clustering implementation itself.

Resolution

This is a known issue affecting VMware vRealize Automation 7.3.0.

This issue is resolved in VMware vRealize Automation 7.3.1 and VMware vRealize Automation 7.4, available at VMware Downloads.


Workaround:
To resolve this issue in VMware vRealize Automation 7.3.0, apply the patch 2150912_patch.zip attached to this KB article. The patch script automatically creates a backup of all container-related data, so no manual backup is required.
 
To apply the patch:
  1. Download the 2150912_patch.zip file and copy it to each active vRA appliance.

    Note: This does not include virtual appliances used for Code Stream or vRO.
     
  2. Extract the 2150912_patch.zip file to get the patch.sh script.
     
  3. Copy the patch.sh script to a working directory on each vRealize Automation node.
     
  4. Add execute permission to the script:
     
    1. Change the owner of the file to root by running this command:

      chown -R root <patch file with directory path>
       
    2. Change the file permissions to 744 by running this command:

      chmod 744 <patch file with directory path>

      NOTE: Replace <patch file with directory path> with the full path to the patch.sh script.
       
  5. Run bash patch.sh on each node, one node at a time.

    NOTE: Do not execute the script in parallel on all nodes.
     
  6. If the output of the patch execution reports:

    Node will not start. Available node detected but it is not responsive yet. Try again later.

    Execute the patch on the other node(s), and start the Xenon service manually after the patch has succeeded on those nodes:


    service xenon-service start
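
Steps 2 through 5 can be sketched for a single node as follows. This is an illustration under stated assumptions rather than an official procedure: the helper names run and apply_xenon_patch and the working directory are hypothetical, and the sketch only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/bash
# Sketch only: steps 2-5 for ONE node. Prints commands unless DRY_RUN=0.
# run and apply_xenon_patch are hypothetical helper names.

run() {
    # Show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# The body is a subshell so that the cd below does not affect the caller.
apply_xenon_patch() (
    zip_file="$1"   # path to 2150912_patch.zip on this node
    work_dir="$2"   # working directory of your choice (hypothetical)

    run mkdir -p "$work_dir" &&
    run unzip -o "$zip_file" -d "$work_dir" &&      # step 2: extract patch.sh
    run chown -R root "$work_dir/patch.sh" &&       # step 4.1: owner root
    run chmod 744 "$work_dir/patch.sh" &&           # step 4.2: mode 744
    run cd "$work_dir" &&
    run bash patch.sh                               # step 5: run the patch
)
```

For example, apply_xenon_patch /root/2150912_patch.zip /root/xenon-patch prints the six commands for review, and DRY_RUN=0 executes them. Run it on one node, wait for it to finish, and only then move on to the next node.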

Rollback steps:

To roll back, restore the /etc/xenon directory from the backup archive that the script created automatically in the /tmp directory.
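
Before overwriting /etc/xenon, it may help to extract the backup into a staging directory and inspect it first. The sketch below rests on assumptions: this article does not give the archive's file name or format, so the path is passed in as an argument and the archive is assumed to be readable by tar. The function name stage_xenon_backup is hypothetical.

```shell
#!/bin/bash
# Sketch only: extract the backup archive into a staging directory for review.
# Assumes a tar-readable archive; the real name and format are not stated here.

stage_xenon_backup() {
    local archive="$1"   # backup archive created by patch.sh (name not given in this article)
    local staging="$2"   # empty directory to extract into

    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$staging" || return 1

    # GNU tar and bsdtar detect the compression type automatically with -xf.
    tar -xf "$archive" -C "$staging" || return 1

    # Print where a xenon directory ended up inside the staging tree.
    find "$staging" -type d -name xenon
}
```

After reviewing the staged copy, the printed directory can be copied back to /etc/xenon with the tool of your choice.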
 


Additional Information

These fixes are scheduled for inclusion in a future release, so you do not need to reapply the patch after upgrading to a later version.
 
Steps that are executed by this patch script:
  1. A backup archive of all container-related data is created in the /tmp directory.
  2. The necessary files are extracted to a temporary folder, and then an installer script is invoked.
  3. The Xenon service instance is stopped, the necessary files are copied, and Xenon is started again.
  4. The temporary folder is deleted from the system.


Attachments

2150912_patch.zip