Cloud Proxies offline after upgrade - Aria Operations 8.x

Article ID: 369174

Products

VMware Aria Suite

Issue/Introduction

  • After upgrading to Aria Operations 8.16, all Cloud Proxies go offline after approximately 2-3 hours. Rebooting the Cloud Proxies brings them back online, but after another 2-3 hours they go offline again.

  • Fresh installations of 8.17.x are also affected.
  • The file /storage/log/var/log/haproxy-admin.log contains entries similar to the following:

    2024-01-01T01:01:01+00:00 localhost haproxy[0000]: Server PrxyRC_UNSECURE_BE/VROPS_N is going DOWN for maintenance (unspecified DNS error). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
    2024-01-01T01:01:01+00:00 localhost haproxy[0000]: backend PrxyRC_UNSECURE_BE has no server available!
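To check whether a Cloud Proxy is hitting this condition without reading the log by hand, the relevant entries can be filtered with a short script. The following is a minimal sketch, assuming the log lines follow the format of the two example entries above. The function name `find_dns_failures` and the regular expressions are hypothetical illustrations, not part of any VMware tooling, and the sketch is unrelated to the attached cp_dns_resolver.py.

```python
import os
import re
from typing import Iterable, List, Tuple

# Log location on the Cloud Proxy, as given in this article.
LOG_PATH = "/storage/log/var/log/haproxy-admin.log"

# Hypothetical patterns modeled on the example haproxy-admin.log entries;
# other HAProxy message formats are deliberately not matched.
DOWN_RE = re.compile(
    r"Server (?P<backend>[^/\s]+)/(?P<server>\S+) is going DOWN for maintenance "
    r"\(unspecified DNS error\)"
)
NO_SERVER_RE = re.compile(r"backend (?P<backend>\S+) has no server available!")


def find_dns_failures(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (backend, server) pairs taken DOWN by a DNS error, and backends left empty."""
    down: List[Tuple[str, str]] = []
    empty: List[str] = []
    for line in lines:
        match = DOWN_RE.search(line)
        if match:
            down.append((match.group("backend"), match.group("server")))
            continue
        match = NO_SERVER_RE.search(line)
        if match:
            empty.append(match.group("backend"))
    return down, empty


if __name__ == "__main__":
    if not os.path.exists(LOG_PATH):
        print(f"{LOG_PATH} not found")
    else:
        with open(LOG_PATH, encoding="utf-8", errors="replace") as fh:
            down, empty = find_dns_failures(fh)
        for backend, server in down:
            print(f"DOWN (DNS error): {backend}/{server}")
        for backend in empty:
            print(f"No server available: {backend}")
```

Run on an affected Cloud Proxy, any `DOWN (DNS error)` line corresponds to the symptom described here. The patterns are intentionally narrow so that unrelated HAProxy maintenance messages are not reported.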

Environment

Aria Operations 8.16.x

Aria Operations 8.17.x

Resolution

  1. Copy the file attached to this article (cp_dns_resolver.py) to the /tmp directory on each affected Cloud Proxy.
  2. SSH to the Cloud Proxy as the root user.
  3. Make the file executable with the following command:

    chmod a+x /tmp/cp_dns_resolver.py


  4. Run the script with the following command:

    python /tmp/cp_dns_resolver.py

After the script completes, the Cloud Proxies should come back online shortly.
This change persists across upgrades.

Note:
If the Cloud Proxies still fail to come online, ensure that the FQDN of each Aria Operations node can be resolved from the Cloud Proxy.
You can check this from the Cloud Proxy CLI with the following command, replacing <Node_FQDN> with the node's FQDN:

nslookup <Node_FQDN>

Cloud Proxies must be able to resolve the names of the VMware Aria Operations nodes through DNS, whether the nodes are referenced by short name or by FQDN.
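nslookup queries the configured DNS server directly. To check several node names in one pass, you can run the lookup through the operating system's resolver, which may also consult /etc/hosts depending on /etc/nsswitch.conf. The following is a minimal sketch, assuming a Python 3.6+ interpreter is available on the Cloud Proxy. The file name check_dns.py and the functions `resolve` and `check_nodes` are hypothetical and not part of the product.

```python
import socket
import sys
from typing import Dict, List, Optional


def resolve(name: str) -> Optional[List[str]]:
    """Return the sorted unique addresses for name, or None if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return None
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0]
    # is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def check_nodes(names: List[str]) -> Dict[str, Optional[List[str]]]:
    """Map each node name to its resolved addresses (None on failure)."""
    return {name: resolve(name) for name in names}


if __name__ == "__main__":
    # Pass each node's FQDN and short name as command-line arguments.
    for node, addrs in check_nodes(sys.argv[1:]).items():
        status = ", ".join(addrs) if addrs else "FAILED to resolve"
        print(f"{node}: {status}")
```

For example, `python /tmp/check_dns.py <Node_FQDN> <Node_short_name>` prints one line per name. Any name reported as `FAILED to resolve` is one the Cloud Proxy cannot resolve. If a name resolves here but not with nslookup, an /etc/hosts entry may be masking a gap in DNS.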

Attachments

cp_dns_resolver.py