Slow UI performance and unresponsiveness in VMware Aria Automation 8.18.x

Article ID: 433161

Products

VCF Automation

Issue/Introduction

VMware Aria Automation shows severely degraded UI performance, slow provisioning, or general unresponsiveness. You may also observe high CPU or memory consumption, or high load averages, on one or more appliance nodes.

When reviewing the system state and logs, you observe the following:

  • The network-health-monitor logs (kube-system/network-health-monitor/console-logs/pingcheck.log) show high execution times (5+ seconds) and significant packet loss (e.g., 20%) to internal node IPs.

  • The journalctl logs show a flood of Kubernetes API timeouts:

kube-apiserver[<PID>]: Error on socket receive: read tcp 127.0.0.1:6443->127.0.0.1:<PORT>: use of closed network connection

  • The netstat-anp.log shows very large Receive Queue (Recv-Q) values and hundreds of connections stuck in the CLOSE_WAIT state, all tied to the ruby process.

  • The /var/log/loginsight-agent/liagent_*.log shows the following transport error:

CurlConnection:707 | Transport error while trying to connect to <log insight server> SSL peer certificate or SSH remote key was not OK
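
The last two symptoms can be counted rather than eyeballed. The following is a minimal sketch, not a supported tool: the function names count_ruby_close_wait and count_cert_errors are hypothetical, and the awk filter assumes the usual Linux `netstat -anp` column layout (state in column 6, PID/program in column 7).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of the appliance.

# Count CLOSE_WAIT sockets owned by a ruby process.
# Reads `netstat -anp` style output on stdin and assumes the common layout:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
count_ruby_close_wait() {
    awk '$6 == "CLOSE_WAIT" && $7 ~ /\/ruby/ { n++ } END { print n + 0 }'
}

# Count occurrences of the certificate transport error quoted above
# in the log files passed as arguments. Missing files are ignored.
count_cert_errors() {
    cat "$@" 2>/dev/null |
        grep -c 'SSL peer certificate or SSH remote key was not OK'
}
```

On an appliance, these might be used as `netstat -anp | count_ruby_close_wait` and `count_cert_errors /var/log/loginsight-agent/liagent_*.log`. A nonzero error count together with hundreds of CLOSE_WAIT sockets is consistent with the symptoms described here.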

Environment

Aria Automation 8.18.x

Cause

The SSL certificate on the destination VMware Aria Operations for Logs server has been replaced, has expired, or is no longer trusted by the VMware Aria Automation appliance.

When the certificate is rejected, the local log agent (the fluentd/ruby process) on the VMware Aria Automation appliance cannot forward logs. The agent's internal buffers fill to capacity, and the ruby process enters an endless retry loop, often hanging for 40 seconds per attempt. The loop consumes the appliance's CPU and memory, which severely congests the internal Kubernetes overlay network and causes internal microservices to time out.
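
To see what the log server currently presents, the certificate can be fetched and inspected with openssl. This is a sketch under assumptions: openssl is available where you run it, the helper names fetch_server_cert and cert_status are hypothetical, and the check only detects expiry. A certificate that was replaced or is signed by an untrusted issuer still reports "valid" here.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they assume openssl is installed.

# Print the PEM certificate presented by a TLS endpoint.
# $1 = host (FQDN or IP), $2 = port (9543 is the port used in this article)
fetch_server_cert() {
    openssl s_client -connect "$1:$2" </dev/null 2>/dev/null |
        openssl x509 -outform PEM
}

# Report whether a PEM certificate file has expired.
# Prints "valid", "expired", or "unreadable".
cert_status() {
    # Reject files that openssl cannot parse as a certificate.
    if ! openssl x509 -noout -in "$1" >/dev/null 2>&1; then
        echo unreadable
        return 1
    fi
    # -checkend 0 succeeds only if the certificate is not yet expired.
    if openssl x509 -checkend 0 -noout -in "$1" >/dev/null 2>&1; then
        echo valid
    else
        echo expired
    fi
}
```

For example, `fetch_server_cert <YOUR-LOG-INSIGHT-FQDN-OR-IP> 9543 > /tmp/vrli.pem` followed by `cert_status /tmp/vrli.pem`, and `openssl x509 -noout -subject -enddate -in /tmp/vrli.pem` to read the subject and expiry date directly.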

Resolution

Workaround (Immediate Mitigation)

To immediately stop the ruby process from hoarding resources and restore UI performance while you prepare to apply the permanent resolution, you can temporarily remove the broken integration.

  1. SSH into the affected VMware Aria Automation appliance as root.

  2. Run the following command:

vracli vrli unset

  3. Wait approximately 60 seconds. The internal network packet loss will stop, and UI performance will return to normal.
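
Rather than waiting a fixed interval, recovery can be confirmed by polling a numeric indicator until it drops. The sketch below is one way to do that. The function name wait_until_at_most is hypothetical, and the threshold, retry count, and delay in the usage example are arbitrary assumptions, not values from VMware.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Usage: wait_until_at_most MAX TRIES DELAY CMD [ARGS...]
# Runs CMD up to TRIES times, DELAY seconds apart, and succeeds as soon as
# the number CMD prints is less than or equal to MAX.
wait_until_at_most() {
    max=$1; tries=$2; delay=$3
    shift 3
    i=0
    n=""
    while [ "$i" -lt "$tries" ]; do
        n=$("$@")
        # Non-numeric output fails the comparison and counts as "not yet".
        if [ "$n" -le "$max" ] 2>/dev/null; then
            echo "ok: $n"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "still high: $n"
    return 1
}
```

For example, `wait_until_at_most 20 12 10 sh -c 'netstat -anp | grep -c CLOSE_WAIT'` checks the CLOSE_WAIT count every 10 seconds for up to two minutes. Choose the threshold from what is normal in your environment.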

To permanently resolve the issue and restore log forwarding, you must reconfigure the integration so the VMware Aria Automation appliance can accept the new SSL certificate.

  1. SSH into the affected VMware Aria Automation appliance as root.

  2. Run the following command:

vracli vrli set https://<YOUR-LOG-INSIGHT-FQDN-OR-IP>:9543

  3. The appliance connects to the server, displays the certificate that the server presents, and prompts you:

Do you trust this certificate? [y/N]

  4. Type y and press Enter.

Note: If strict SSL validation is not required or the prompt is being blocked by a load balancer, the --insecure flag can be appended to the set command to bypass certificate validation.