Kubernetes nodes stop responding in vRA / vRO 8.2.x environments after running for an extended period of time

Article ID: 326118

Products

VMware Aria Suite

Issue/Introduction

This article applies to the following products:
  • vRA 8.2 GA
  • vRO 8.2 GA
  • vRA 8.2 Patch 1
  • vRO 8.2 Patch 1

Symptoms:
  • Kubernetes nodes stop responding in systems that have been running for an extended period of time
  • The size of the directory /var/vmware/prelude/service/monitor/services/ on the affected nodes is greater than 15 GB:
    du -d0 -h /var/vmware/prelude/service/monitor/services/
  • vracli service status reports that all services are Stopped.


Environment

VMware vRealize Orchestrator 8.2.x
VMware vRealize Automation 8.2.x

Cause

Results from previous service-monitoring runs are not cleaned up and accumulate in /var/vmware/prelude/service/monitor/services/, which can grow until the node stops responding.
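
A quick, read-only way to gauge the accumulation on an affected node is to count the result directories older than one hour, using the same criteria the cleanup script in the workaround below applies (nothing is deleted by this command):

find /var/vmware/prelude/service/monitor/services/ -type d -cmin +60 -not -path /var/vmware/prelude/service/monitor/services/cache | wc -l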

Resolution

This issue is resolved in vRealize Automation 8.3 or newer.

Workaround:

Prerequisites

  • Take simultaneous, non-memory snapshots of all virtual appliances in the cluster.
  • You have the root password for each virtual appliance.
  • You have SSH or console access to each virtual appliance.
Procedure
  1. Log in to one vRA virtual appliance in the cluster via SSH (for example, with PuTTY).
  2. Run the following command. It uses vracli cluster exec to run on every node in the cluster and installs a daily cron job, /etc/cron.daily/cleanup_obsolete_svcmon_cache.sh, that clears contents older than 1 hour from /var/vmware/prelude/service/monitor/services/ (excluding the cache subdirectory):
vracli cluster exec -- bash -c "echo -e 'IyEvYmluL2Jhc2gKCmZpbmQgL3Zhci92bXdhcmUvcHJlbHVkZS9zZXJ2aWNlL21vbml0b3Ivc2VydmljZXMvIC10eXBlIGQgLWNtaW4gKzYwIC1ub3QgLXBhdGggL3Zhci92bXdhcmUvcHJlbHVkZS9zZXJ2aWNlL21vbml0b3Ivc2VydmljZXMvY2FjaGUgfCB4YXJncyBybSAtcmYK' | base64 -d > /etc/cron.daily/cleanup_obsolete_svcmon_cache.sh && chmod 700 /etc/cron.daily/cleanup_obsolete_svcmon_cache.sh"
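
For reference, the base64 payload decodes to the following cleanup script:

#!/bin/bash

find /var/vmware/prelude/service/monitor/services/ -type d -cmin +60 -not -path /var/vmware/prelude/service/monitor/services/cache | xargs rm -rf

Because the script is installed in /etc/cron.daily/, each node runs it once a day. To reclaim space immediately rather than waiting for the next scheduled run, the script can also be executed once by hand (an optional step, not part of the original procedure), and the directory size re-checked with the du command from the Symptoms section:

vracli cluster exec -- bash -c "/etc/cron.daily/cleanup_obsolete_svcmon_cache.sh"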