Kubernetes nodes stop responding in vRA / vRO 8.2.x environments after running for an extended period of time

Article ID: 326118

Products

VMware Aria Suite

Issue/Introduction

This article applies to the following products:
  • vRA 8.2 GA
  • vRO 8.2 GA
  • vRA 8.2 Patch 1
  • vRO 8.2 Patch 1

Symptoms:
  • Kubernetes nodes stop responding in systems that have been running for an extended period of time
  • The size of the directory /var/vmware/prelude/service/monitor/services/ on the affected nodes is greater than 15 GB:
    du -d0 -h /var/vmware/prelude/service/monitor/services/
  • vracli service status reports that all services are Stopped.


Environment

VMware vRealize Orchestrator 8.2.x
VMware vRealize Automation 8.2.x

Cause

Results from previous service-monitoring runs are not cleaned up and accumulate in /var/vmware/prelude/service/monitor/services/, which can grow until the node stops responding.
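
A quick, read-only way to gauge the accumulation on an affected node is to count the result directories older than one hour, using the same criteria the cleanup script in the workaround below applies (nothing is deleted by this command):

find /var/vmware/prelude/service/monitor/services/ -type d -cmin +60 -not -path /var/vmware/prelude/service/monitor/services/cache | wc -l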

Resolution

This issue is resolved in vRealize Automation 8.3 or newer.

Workaround:

Prerequisites

  • Take simultaneous, non-memory snapshots of all virtual appliances in the cluster.
  • You have the root password for each virtual appliance.
  • You have SSH or console access to each virtual appliance.
Procedure
  1. Log in to one vRA virtual appliance in the cluster via SSH (for example, with PuTTY).
  2. Run the following command. It uses vracli cluster exec to run on every node in the cluster and installs a daily cron job, /etc/cron.daily/cleanup_obsolete_svcmon_cache.sh, that clears contents older than 1 hour from /var/vmware/prelude/service/monitor/services/ (excluding the cache subdirectory):
vracli cluster exec -- bash -c "echo -e 'IyEvYmluL2Jhc2gKCmZpbmQgL3Zhci92bXdhcmUvcHJlbHVkZS9zZXJ2aWNlL21vbml0b3Ivc2VydmljZXMvIC10eXBlIGQgLWNtaW4gKzYwIC1ub3QgLXBhdGggL3Zhci92bXdhcmUvcHJlbHVkZS9zZXJ2aWNlL21vbml0b3Ivc2VydmljZXMvY2FjaGUgfCB4YXJncyBybSAtcmYK' | base64 -d > /etc/cron.daily/cleanup_obsolete_svcmon_cache.sh && chmod 700 /etc/cron.daily/cleanup_obsolete_svcmon_cache.sh"
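
For reference, the base64 payload decodes to the following cleanup script:

#!/bin/bash

find /var/vmware/prelude/service/monitor/services/ -type d -cmin +60 -not -path /var/vmware/prelude/service/monitor/services/cache | xargs rm -rf

Because the script is installed in /etc/cron.daily/, each node runs it once a day. To reclaim space immediately rather than waiting for the next scheduled run, the script can also be executed once by hand (an optional step, not part of the original procedure), and the directory size re-checked with the du command from the Symptoms section:

vracli cluster exec -- bash -c "/etc/cron.daily/cleanup_obsolete_svcmon_cache.sh"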