vRA 8 environment with configured vRLI integration goes down after a few days of uptime

Article ID: 336842

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • The UI is not accessible and the environment does not function properly
  • Integration with vRealize Log Insight is configured but logs are not received
  • Before the system goes down, you see increased load in terms of CPU and memory usage
  • Running the command "kubectl -n prelude logs -l app=provisioning-service-app" displays an error similar to: "Error from server (InternalError): Internal error occurred: Authorization error (user=kube-apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)"
  • Executing "service kubelet restart" or restarting the nodes temporarily resolves the issue
  • Examining the processes on the VA with the top utility, ordered by memory usage, reveals a ruby process that consumes a few GB of memory
  • Examining the logs of the symphony-logging pods shows many exceptions similar to: 2020-05-30 11:35:54 +0000 [warn]: #0 Net::HTTP.Post raises exception: ...


Environment

VMware vRealize Automation 8.x

Cause

This issue occurs when the integration with vRealize Log Insight is incorrectly configured.
Typically, one of the following applies:
  • A secure connection is used and the server presents a self-signed certificate, but neither the CA certificate was provided nor the --insecure option was used when the integration was configured.
  • A wrong server port is specified.
Either mistake prevents logs from reaching the vRealize Log Insight server, so the logging agent accumulates a growing queue of logs waiting to be sent. Because this queue is kept in memory, the logging agent eventually consumes all of the RAM of the VA and causes system services to malfunction.

Resolution

To resolve this issue, correct the vRealize Log Insight integration configuration by following these steps:

1. If the environment is currently not working, restart the kubelet by running the command below, then wait a few minutes:
service kubelet restart
Note: This step should be done on all nodes of the environment.
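
Because the restart has to be repeated on every node, a small loop can help on multi-node deployments. The following is only a sketch and not part of the product: the function name restart_kubelet_on_nodes and the node names are hypothetical placeholders, and root SSH access to each node is an assumption. By default the function only prints the commands so that they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Sketch: restart the kubelet on every node of the environment.
# Assumptions: root SSH access to each node; node names are placeholders.
# With DRY_RUN=1 (the default) the commands are printed, not executed.

restart_kubelet_on_nodes() {
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh root@${node} service kubelet restart"
        else
            ssh "root@${node}" service kubelet restart
        fi
    done
}

# Example (prints the commands only):
# restart_kubelet_on_nodes vra-node1.example.local vra-node2.example.local
```

Setting DRY_RUN=0 before calling the function would actually run the restart over SSH. On a single-node deployment, running "service kubelet restart" directly is simpler.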

2. On one of the nodes, use the "vracli vrli set" command to reconfigure the integration. Refer to the vRealize Automation documentation or execute vracli vrli set --help for an overview of all options. This step must be executed on a single node only.

Highlights:
  • Unless instructed otherwise, the configuration defaults to an HTTPS connection on port 443 and assumes a trusted certificate.
  • Refer to the vRealize Log Insight documentation to see which ports are used. Typically, you should use port 9000 for HTTP or port 9543 for HTTPS.
  • To trust an untrusted certificate, you need to provide the certificate of the CA that signed it via the "--ca-file" or "--ca-cert" command line options. For self-signed certificates, the CA is the self-signed certificate itself.
  • To pass certificate validation, the certificate must be valid for the hostname of your vRealize Log Insight server (e.g. the CN should match the hostname or the certificate should include a SAN)
  • Alternatively, you can disable SSL verification (while still using a secured HTTPS connection) by passing the "--insecure" flag.
Note: This fully disables the verification of the remote host and will trust any identity. This should not be done in production.

Examples:
# secure connection over HTTPS with untrusted certificate
vracli vrli set --ca-file ca.crt https://my-vrli.local:9543

# secure connection over HTTPS with SSL verification disabled
vracli vrli set --insecure https://my-vrli.local:9543

# insecure connection over HTTP
vracli vrli set http://my-vrli.local:9000
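
If it is unclear which of these variants applies, it can help to look at the certificate that the Log Insight server actually presents. The sketch below is not part of vracli: the function names vrli_host_port and show_vrli_certificate are made up for illustration, and it assumes that the openssl command-line tool is available and that the server is reachable from the machine where it runs.

```shell
#!/bin/sh
# Split a URL such as https://my-vrli.local:9543 into "host port".
# If no port is given, fall back to 80 for http and 443 otherwise.
vrli_host_port() {
    url="$1"
    scheme="${url%%://*}"
    rest="${url#*://}"
    rest="${rest%%/*}"            # drop any path component
    host="${rest%%:*}"
    case "$rest" in
        *:*) port="${rest##*:}" ;;
        *)   if [ "$scheme" = "http" ]; then port=80; else port=443; fi ;;
    esac
    echo "$host $port"
}

# Print subject, issuer, validity dates and any SAN entries of the
# certificate presented by the server behind the given https URL.
show_vrli_certificate() {
    set -- $(vrli_host_port "$1")
    cert="$(openssl s_client -connect "$1:$2" -servername "$1" \
        </dev/null 2>/dev/null | openssl x509)"
    printf '%s\n' "$cert" | openssl x509 -noout -subject -issuer -dates
    printf '%s\n' "$cert" | openssl x509 -noout -text |
        grep -A1 'Subject Alternative Name'
}

# Example:
# show_vrli_certificate https://my-vrli.local:9543
```

If the subject and issuer are identical, the certificate is most likely self-signed, so the certificate itself would be passed via "--ca-file". If neither the CN nor a SAN entry matches the hostname used in the URL, certificate validation is expected to fail even when the CA is provided.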

3. Optionally, verify that the configuration is now correct by executing the following command:
vracli vrli

4. Wait for a few minutes. It can take up to 2 minutes for the configuration changes to be applied on all nodes. Then verify that logs are received in vRealize Log Insight.