Enabling firewall hardening in Aria Operations disrupts communication within the cluster and it gets stuck in "Going Online" mode.

Article ID: 382093

Updated On:

Products

VMware Aria Suite

Issue/Introduction

  • Enabling firewall hardening in Aria Operations (activated via "Admin UI > Administrator Settings > Firewall Hardening") to mitigate vulnerabilities disrupts communication within the cluster, and the cluster remains stuck in the "Going Online" state.
  • The "gemfire_vRealize Ops AnalyticsXXXX.log" file contains messages similar to: 
    INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - received FindCoordinatorResponse(coordinator=XX.XX.XX.XX(51563:locator)<ec><v0>:20003, fromView=true, viewId=0, registrants=[XX.XX.XX.XX(vRealize Ops Analytics-XX.XX.XX.XXXXX)<ec>:10010, XX.XX.XX.XX(vRealize Ops Analytics-XX.XX.XX.XX:52474)<ec>:10007, XX.XX.XX.XX(25595:locator)<ec>:20003], senderId=XX.XX.XX.XX(51563:locator)<ec><v0>:20003, network partition detection enabled=false, locators preferred as coordinators=false, view=View[XX.XX.XX.XX(51563:locator)<ec><v0>:20003|0] members: [XX.XX.XX.XX(51563:locator)<ec><v0>:20003]) from locator /XX.XX.XX.XX:6061
    INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - Locator's address indicates it is part of a distributed system so I will not become membership coordinator on this attempt to join
    INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - Unable to contact locator /XX.XX.XX.XX:6061: java.net.SocketException: Protocol not available (connect failed)
    INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - findCoordinator chose XX.XX.XX.XX(51563:locator)<ec><v0>:20003 out of these possible coordinators: [XX.XX.XX.XX(51563:locator)<ec><v0>:20003]
    INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - Unable to contact locator /XX.XX.XX.XX:6061: java.net.ConnectException: Connection refused (Connection refused)
    INFO [Geode Failure Detection Server thread 1 tid=76] org.apache.geode.distributed.internal.membership.gms.Services.lambda$startTcpServer$3 - Started failure detection server thread on /XX.XX.XX.XX:10002.
    INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - Unable to contact locator /XX.XX.XX.XX:6061: java.net.ConnectException: Connection refused (Connection refused)
  • The data node's "analytics.log" file also contains an error similar to: 
    ERROR [Analytics Main Thread]  com.integrien.analytics.AnalyticsMain.createGemfireCache - Can not connect to gemfire: Problem starting up membership services
    org.apache.geode.SystemConnectException: Problem starting up membership services
            at org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:247) ~[gemfire-core-10.0.1.jar:?]
            at org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:279) ~[gemfire-core-10.0.1.jar:?]
            at org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:509) ~[gemfire-core-10.0.1.jar:?]
            at org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:458) ~[gemfire-core-10.0.1.jar:?]
            at org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:355) ~[gemfire-core-10.0.1.jar:?]
            at org.apache.geode.distributed.internal.InternalDistributedSystem$DefaultClusterDistributionManagerConstructor.create(InternalDistributedSystem.java:2975) ~[gemfire-core-10.0.1.jar:?]
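
To confirm that a node is hitting the symptom shown in the log excerpts above, the locator failures can be counted per locator and exception type. The following is a minimal Python sketch, not a supported tool: the function name and the regular expression are assumptions derived only from the sample "Unable to contact locator" lines in this article, and the log file path has to be supplied by the reader.

```python
import re
from collections import Counter

# Pattern derived from the sample lines in this article, for example:
#   ... findCoordinator - Unable to contact locator /10.0.0.5:6061:
#       java.net.ConnectException: Connection refused (Connection refused)
LOCATOR_FAILURE = re.compile(
    r"Unable to contact locator /(?P<locator>\S+?:\d+): (?P<exception>[\w.$]+)"
)


def summarize_locator_failures(lines):
    """Count locator connection failures per (locator, exception class)."""
    counts = Counter()
    for line in lines:
        match = LOCATOR_FAILURE.search(line)
        if match:
            counts[(match.group("locator"), match.group("exception"))] += 1
    return counts
```

A possible use is to open the gemfire log of the affected node, pass the file object to `summarize_locator_failures`, and print the resulting counter. Repeated `ConnectException` entries against port 6061 would match the behavior described in this article.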

       

 

Environment

Aria Operations 8.18.x

Cause

The firewall hardening script reads the primary node's address from the clusterMembership section of the casa.db.script file and resolves it with the dig command. The script expects this value to be an FQDN: if the file stores a short hostname instead, dig fails to resolve it, and communication between the cluster nodes is disrupted once hardening is applied. Aria Operations requires FQDNs for the primary and replica nodes.
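
The difference between a short hostname and an FQDN can be illustrated with a small check. This is a rough heuristic sketch in Python and not the logic of the hardening script itself: the function name `looks_like_fqdn` is hypothetical, and the rule used (at least two non-empty dot-separated labels, not an IP literal) is an assumption that only approximates what counts as fully qualified. As background, dig does not apply the resolver search list by default, which is consistent with a short name failing to resolve.

```python
import ipaddress


def looks_like_fqdn(value: str) -> bool:
    """Heuristic: True when value has two or more labels and is not an IP literal."""
    name = value.strip().rstrip(".")  # tolerate a trailing root dot
    if not name:
        return False
    try:
        ipaddress.ip_address(name)
        return False  # an IP address is not a hostname
    except ValueError:
        pass
    labels = name.split(".")
    return len(labels) >= 2 and all(labels)
```

Under this heuristic a value such as `vrops-primary` (a made-up name) is reported as a short name, while `vrops-primary.example.com` is accepted.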

Reference: Getting Started with VMware Aria Operations (8.18)
 

Resolution

1. Take a snapshot of all the nodes.

2. Edit /storage/db/casa/webapp/hsqldb/casa.db.script so that the primary node information in the clusterMembership section uses the FQDN instead of the short hostname.

  • Make this change on each Aria Operations node whose casa.db.script values require the update.

Incorrect:  
INSERT INTO CASA_DOCS VALUES('clusterMembership','{"onlineState":"ONLINE","cluster_name":"Name of cluster","is_ha_enabled":false,"ha_transition_state":null,"ca_state":"DISABLED","initialization_state":"NONE","remove_node_state":"NONE","document_version":20,"document_time":1731645460795,"online_state":"ONLINE","online_state_time":1731645460792,"online_state_reason":"","out_of_diskspace_slice":"","email":null,"cluster_members":[],"admin_slices":[],"installation_state":"DONE","fail_going_offline":false,"slices":{"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX":{"slice_uuid":"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX","pair_uuid":null,"is_admin_node":true,"ip_address":"Node Short name","preferred_addresses":{},"slice_name":"master","membership_state":null,"region":null}}}')
INSERT INTO VERSION VALUES(1,3)

Correct:  
INSERT INTO CASA_DOCS VALUES('clusterMembership','{"onlineState":"ONLINE","cluster_name":"Name of cluster","is_ha_enabled":false,"ha_transition_state":null,"ca_state":"DISABLED","initialization_state":"NONE","remove_node_state":"NONE","document_version":20,"document_time":1731645460795,"online_state":"ONLINE","online_state_time":1731645460792,"online_state_reason":"","out_of_diskspace_slice":"","email":null,"cluster_members":[],"admin_slices":[],"installation_state":"DONE","fail_going_offline":false,"slices":{"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX":{"slice_uuid":"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX","pair_uuid":null,"is_admin_node":true,"ip_address":"Node FQDN","preferred_addresses":{},"slice_name":"master","membership_state":null,"region":null}}}')
INSERT INTO VERSION VALUES(1,3)
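
Before editing, it can help to see which slices currently store a short name. The sketch below is a read-only Python check under stated assumptions: it expects the clusterMembership row to have exactly the shape shown in the examples above, assumes single quotes inside the SQL string are escaped by doubling, and treats any non-empty `ip_address` value without a dot as a short hostname. The function name is hypothetical, and the script does not modify the file.

```python
import json
import re

# Matches the clusterMembership row in the form shown in this article.
MEMBERSHIP_ROW = re.compile(
    r"^INSERT INTO CASA_DOCS VALUES\('clusterMembership','(?P<doc>.*)'\)\s*$"
)


def short_name_slices(script_text):
    """Return (slice_name, ip_address) pairs whose ip_address contains no dot."""
    findings = []
    for line in script_text.splitlines():
        match = MEMBERSHIP_ROW.match(line)
        if not match:
            continue
        # Assumption: embedded single quotes are doubled inside the SQL string.
        document = json.loads(match.group("doc").replace("''", "'"))
        for slice_info in document.get("slices", {}).values():
            address = slice_info.get("ip_address") or ""
            # IP addresses and FQDNs contain dots; a short hostname does not.
            if address and "." not in address:
                findings.append((slice_info.get("slice_name"), address))
    return findings
```

One way to use it is to read a copy of casa.db.script into a string and pass it to `short_name_slices`. An empty result suggests that every slice already stores an FQDN or an IP address.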

Additional Information

Primary Networking Requirements:

1. The primary and replica nodes must use a static IP address, or a fully qualified domain name (FQDN) with a static IP address.
2. Data nodes can use Dynamic Host Configuration Protocol (DHCP).
3. All nodes must reverse-DNS resolve to their FQDN, which is currently the node hostname.
4. Nodes deployed by OVF have their hostnames set to the retrieved FQDN by default.
5. All nodes must be bidirectionally routable by IP address or FQDN.
6. Do not separate analytics cluster nodes with network address translation (NAT), a load balancer, a firewall, or a proxy that inhibits bidirectional communication by IP address or FQDN.
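
The reverse-DNS requirement above can be spot-checked with a forward and reverse lookup. The following Python sketch is illustrative only: the function name and its injectable `resolve` and `reverse` parameters are assumptions made for testability, and the defaults use the operating system resolver (`socket.gethostbyname` and `socket.gethostbyaddr`), which may consult the hosts file or search domains and therefore can behave differently from dig.

```python
import socket


def reverse_dns_matches(fqdn, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return True if fqdn resolves to an IPv4 address whose reverse name equals fqdn."""
    try:
        ip_address = resolve(fqdn)
        reverse_name = reverse(ip_address)[0]  # (hostname, aliases, addresses)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    return reverse_name.rstrip(".").lower() == fqdn.rstrip(".").lower()
```

For example, `reverse_dns_matches("vrops-primary.example.com")` (a made-up name) would return False if the address resolves back to a short name or to a different host.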