INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - received FindCoordinatorResponse(coordinator=XX.XX.XX.XX(51563:locator)<ec><v0>:20003, fromView=true, viewId=0, registrants=[XX.XX.XX.XX(vRealize Ops Analytics-XX.XX.XX.XXXXX)<ec>:10010, XX.XX.XX.XX(vRealize Ops Analytics-XX.XX.XX.XX:52474)<ec>:10007, XX.XX.XX.XX(25595:locator)<ec>:20003], senderId=XX.XX.XX.XX(51563:locator)<ec><v0>:20003, network partition detection enabled=false, locators preferred as coordinators=false, view=View[XX.XX.XX.XX(51563:locator)<ec><v0>:20003|0] members: [XX.XX.XX.XX(51563:locator)<ec><v0>:20003]) from locator /XX.XX.XX.XX:6061
INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - Locator's address indicates it is part of a distributed system so I will not become membership coordinator on this attempt to join
INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - Unable to contact locator /XX.XX.XX.XX:6061: java.net.SocketException: Protocol not available (connect failed)
INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - findCoordinator chose XX.XX.XX.XX(51563:locator)<ec><v0>:20003 out of these possible coordinators: [XX.XX.XX.XX(51563:locator)<ec><v0>:20003]
INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - Unable to contact locator /XX.XX.XX.XX:6061: java.net.ConnectException: Connection refused (Connection refused)
INFO [Geode Failure Detection Server thread 1 tid=76] org.apache.geode.distributed.internal.membership.gms.Services.lambda$startTcpServer$3 - Started failure detection server thread on /XX.XX.XX.XX:10002.
INFO [Analytics Main Thread tid=18] org.apache.geode.distributed.internal.membership.gms.Services.findCoordinator - Unable to contact locator /XX.XX.XX.XX:6061: java.net.ConnectException: Connection refused (Connection refused)
ERROR [Analytics Main Thread] com.integrien.analytics.AnalyticsMain.createGemfireCache - Can not connect to gemfire: Problem starting up membership services
org.apache.geode.SystemConnectException: Problem starting up membership services
at org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:247) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:279) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:509) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:458) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:355) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.distributed.internal.InternalDistributedSystem$DefaultClusterDistributionManagerConstructor.create(InternalDistributedSystem.java:2975) ~[gemfire-core-10.0.1.jar:?]
Aria Operations 8.18.x
The firewall hardening script reads the primary node's FQDN from the clusterMembership section of the casa.db.script file. The script resolves that value with the dig command, which fails when the stored value is a short hostname instead of an FQDN. Aria Operations requires FQDNs for the primary and replica nodes, and the script depends on them to ensure proper communication between nodes during firewall hardening.
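To see which value is currently stored, the clusterMembership row can be inspected without editing anything. The following is a minimal sketch, not part of the product: it assumes the row is a single-line INSERT INTO CASA_DOCS statement whose second value is a JSON document with a "slices" map, and the function names are hypothetical. Run it against a copy of the file.

```python
import ipaddress
import json

# Assumed layout of the clusterMembership row in casa.db.script.
PREFIX = "INSERT INTO CASA_DOCS VALUES('clusterMembership','"
SUFFIX = "')"


def slice_addresses(script_line):
    """Return {slice_name: ip_address} from a clusterMembership INSERT line."""
    line = script_line.strip()
    if not (line.startswith(PREFIX) and line.endswith(SUFFIX)):
        raise ValueError("not a clusterMembership INSERT line")
    # SQL string literals escape a single quote by doubling it.
    payload = line[len(PREFIX):-len(SUFFIX)].replace("''", "'")
    doc = json.loads(payload)
    return {
        s.get("slice_name"): s.get("ip_address")
        for s in doc.get("slices", {}).values()
    }


def looks_like_short_name(address):
    """True when the value is neither an IP literal nor a dotted hostname."""
    if not address:
        return True
    try:
        ipaddress.ip_address(address)
        return False
    except ValueError:
        return "." not in address.strip(".")


def short_name_slices(path):
    """Return the slices in the file at `path` whose address is a short name."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(PREFIX):
                found = slice_addresses(line)
                return {n: a for n, a in found.items() if looks_like_short_name(a)}
    return {}
```

For example, short_name_slices("casa.db.script.copy") returns an empty dictionary when every slice address is an IP address or a dotted name, and otherwise lists the slices that still hold a short hostname. The check is purely syntactic; it does not confirm that the name resolves.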
1. Take a snapshot of all nodes.
2. Edit /storage/db/casa/webapp/hsqldb/casa.db.script so that the primary node entry in the clusterMembership section uses the FQDN instead of the short hostname.
Incorrect: INSERT INTO CASA_DOCS VALUES('clusterMembership','{"onlineState":"ONLINE","cluster_name":"Name of cluster","is_ha_enabled":false,"ha_transition_state":null,"ca_state":"DISABLED","initialization_state":"NONE","remove_node_state":"NONE","document_version":20,"document_time":1731645460795,"online_state":"ONLINE","online_state_time":1731645460792,"online_state_reason":"","out_of_diskspace_slice":"","email":null,"cluster_members":[],"admin_slices":[],"installation_state":"DONE","fail_going_offline":false,"slices":{"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX":{"slice_uuid":"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX","pair_uuid":null,"is_admin_node":true,"ip_address":"Node Short name","preferred_addresses":{},"slice_name":"master","membership_state":null,"region":null}}}')
INSERT INTO VERSION VALUES(1,3)
Correct: INSERT INTO CASA_DOCS VALUES('clusterMembership','{"onlineState":"ONLINE","cluster_name":"Name of cluster","is_ha_enabled":false,"ha_transition_state":null,"ca_state":"DISABLED","initialization_state":"NONE","remove_node_state":"NONE","document_version":20,"document_time":1731645460795,"online_state":"ONLINE","online_state_time":1731645460792,"online_state_reason":"","out_of_diskspace_slice":"","email":null,"cluster_members":[],"admin_slices":[],"installation_state":"DONE","fail_going_offline":false,"slices":{"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX":{"slice_uuid":"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX","pair_uuid":null,"is_admin_node":true,"ip_address":"Node FQDN","preferred_addresses":{},"slice_name":"master","membership_state":null,"region":null}}}')
INSERT INTO VERSION VALUES(1,3)
Primary Networking Requirements:
1. The primary and replica nodes must use a static IP address, or a fully qualified domain name (FQDN) with a static IP address.
2. Data nodes can use Dynamic Host Configuration Protocol (DHCP).
3. Reverse DNS lookups must succeed for all nodes and return their FQDN, which is currently the node hostname.
4. Nodes deployed by OVF have their hostnames set to the retrieved FQDN by default.
5. All nodes must be bidirectionally routable by IP address or FQDN.
6. Do not separate analytics cluster nodes with network address translation (NAT), a load balancer, a firewall, or a proxy that inhibits bidirectional communication by IP address or FQDN.
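The name-resolution requirements above (an FQDN rather than a short name, and a reverse lookup that returns the same FQDN) can be spot-checked from any node. This is a rough sketch using only the Python standard library; check_node is a hypothetical helper, not a product tool. It uses the system resolver rather than dig, so results can differ from what the hardening script sees, and it does not test routing, NAT, or firewall behaviour.

```python
import socket


def check_node(fqdn, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems found for one node name (empty when consistent)."""
    problems = []
    if "." not in fqdn.strip("."):
        problems.append(f"{fqdn}: short name, not an FQDN")
    try:
        ip = resolve(fqdn)  # forward lookup: name to IPv4 address
    except OSError as exc:
        problems.append(f"{fqdn}: forward lookup failed ({exc})")
        return problems
    try:
        name = reverse(ip)[0]  # reverse lookup: address to primary hostname
    except OSError as exc:
        problems.append(f"{fqdn}: reverse lookup of {ip} failed ({exc})")
        return problems
    if name.rstrip(".").lower() != fqdn.rstrip(".").lower():
        problems.append(f"{fqdn}: {ip} reverse-resolves to {name}")
    return problems
```

Calling check_node with each node's FQDN and printing the returned list gives a quick view of which nodes need DNS corrections. The resolve and reverse parameters exist only so the lookups can be replaced with stubs when testing the function offline.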