Aria Operations Cloud Proxy nodes intermittently flap between Online and Offline status due to DNS resolution timeouts
search cancel

Aria Operations Cloud Proxy nodes intermittently flap between Online and Offline status due to DNS resolution timeouts

book

Article ID: 432767

calendar_today

Updated On:

Products

VCF Operations

Issue/Introduction

VMware Aria Operations Cloud Proxy (CP) appliances may exhibit "flapping" behavior, where the connection status alternates between Online and Offline in the UI. This typically occurs in remote sites where DNS services are unavailable or unreachable.

The following log entries are observed:

  • /storage/log/vcops/log/vrops-init.log

    reverse_lookup-INFO: STDOUT for the command /usr/bin/host -W 15 -R 1 -T ***.***.***.***: ;; Connection to ::1#53(::1) for ***.***.***.***.in-addr.arpa. failed: connection refused.
    ;; no servers could be reached
    reverse_lookup-WARNING: Unable to get an FQDN for the ***.***.***.*** IP address

  • /storage/log/var/log/haproxy-admin.log

    localhost haproxy[6657]: Server PrxyRC_BE/VROPS_2 is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
    localhost haproxy[6657]: backend PrxyRC_BE has no server available!

  • (Primary Node): /storage/log/vcops/log/analytics-<UUID>.log
    INFO  [DistTaskCollectorStatusChecker]  com.integrien.alive.common.communication.CollectorStatusChecker.checkRebalancingState - Did not receive heartbeat from CP_NAME collector 601579 milliseconds

Environment

Aria Operations 8.18.x

Cause

Aria Operations Cloud Proxy uses HAProxy to manage connections. By default, HAProxy queries DNS servers directly. If these servers are unreachable, HAProxy fails to resolve node FQDNs, marking backends as DOWN even if the /etc/hosts file is populated

Resolution

The official recommendation is to provide reachable DNS servers to the site. Refer to the Aria Operations Cluster Network Requirements

Workaround

If DNS cannot be implemented, follow these steps to force local name resolution on Cloud proxy:

Step 1: Implement Local Name Resolution

  1. SSH to the Cloud Proxy as root.
  2. Edit /etc/hosts: vi /etc/hosts
  3. Add IP addresses and FQDNs of all Aria Operations cluster nodes.
  4. Verify resolution: getent hosts <FQDN>

Step 2: Modify HAProxy to Prioritize Local Hosts

  1. Backup the configuration: cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig
  2. Edit /etc/haproxy/haproxy.cfg: vi /etc/haproxy/haproxy.cfg
  3. Locate server lines and change resolvers resolvernameservers init-addr last,libc,none to init-addr libc,none.
  4. Comment out the entire resolvers block.
  5. Restart HAProxy: systemctl restart haproxy.

Step 3: Ensure Persistency Across Reboots To prevent haproxy.cfg from being reverted during lifecycle operations:

chattr +i /etc/haproxy/haproxy.cfg

Note: You must run chattr -i /etc/haproxy/haproxy.cfg before making future changes to this file.