When deploying a new VCF Operations cluster or expanding an existing one with High Availability (HA) or Continuous Availability (CA), you may experience the following symptoms:
The analytics logs (/storage/vcops/log/analytics-<UUID>.log) show errors such as: com.integrien.analytics.AnalyticsMain.createGemfireCache - Can not connect to gemfire: Problem starting up membership services.
Environment: VCF Operations 9.x
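To confirm the log symptom on a node, the analytics logs can be searched for the error text quoted above. The following is a minimal sketch, not a supported tool: the function name `find_gemfire_errors` and the glob pattern `analytics-*.log` are assumptions made for illustration, based on the log path shown in this article.

```python
import glob
import os

# Error text quoted in the symptom description of this article.
GEMFIRE_ERROR = "Can not connect to gemfire"


def find_gemfire_errors(log_dir="/storage/vcops/log", pattern="analytics-*.log"):
    """Return (path, line_number, line) tuples for log lines containing the error.

    The default directory comes from this article; the glob pattern is an
    assumption that matches files named analytics-<UUID>.log.
    """
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        # errors="replace" avoids a crash on unexpected bytes in the log.
        with open(path, "r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if GEMFIRE_ERROR in line:
                    hits.append((path, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    for path, number, line in find_gemfire_errors():
        print(f"{path}:{number}: {line}")
```

An empty result means the error text was not found in the matched files; it does not by itself rule out a connectivity problem.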
This issue is typically caused by incomplete network connectivity between the cluster nodes. VCF Operations requires specific internal ports to be open for GemFire membership services and data synchronization, even if the nodes reside on different VLANs or subnets.
Ensure the network environment and appliance configuration meet the following requirements:
Work with your network team to ensure that the following ports are open between the nodes in the cluster (Note: Primary / Replica nodes are also classified as Data Nodes):
Run the internal Netcheck.py script from the appliance CLI to identify specific connection failures:
/usr/bin/python /usr/lib/vmware-vcopssuite/python/lib/Netcheck.py
If any ports return a FAILED status, the firewall or routing configuration must be adjusted.
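After a firewall or routing change, a single port can be rechecked from one node to another without rerunning the full script. The sketch below is an independent illustration using Python's standard `socket` module; it is not part of Netcheck.py and does not reproduce its logic. The host name `node2.example.com` is a hypothetical placeholder, and the port list is intentionally left empty so that it can be filled in from the port requirements for your cluster. It tests TCP only.

```python
import socket


def check_tcp_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    False covers both a blocked or filtered port (timeout) and a reachable
    host with nothing listening on that port (connection refused).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports, timeout=3.0):
    """Check several TCP ports on one host and return {port: reachable}."""
    return {port: check_tcp_port(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical placeholders: replace with a peer node and the ports
    # required between nodes in your cluster.
    PEER = "node2.example.com"
    PORTS = []
    for port, reachable in check_ports(PEER, PORTS).items():
        status = "OK" if reachable else "FAILED"
        print(f"{PEER}:{port} {status}")
```

Run the check in both directions between each pair of nodes, since a firewall rule may allow traffic one way only.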
Ensure that all nodes have forward (A) and reverse (PTR) records configured in your DNS server. Missing PTR records often leave the cluster in the "Waiting for Analytics" state because nodes fail to resolve each other during the join process.
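The forward and reverse lookups can be verified for each node with a short script. This is a minimal sketch under stated assumptions: `check_dns` is a hypothetical helper name, `node1.example.com` is a placeholder, and the lookups go through the system resolver (`socket.gethostbyname` and `socket.gethostbyaddr`), so entries in /etc/hosts may mask a missing DNS record.

```python
import socket


def check_dns(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IP address, then resolve that IP back to a name.

    Returns a dict with the resolved IP, the name returned by the reverse
    lookup, and whether that name matches the original FQDN. The forward and
    reverse parameters exist only so the lookups can be substituted in tests.
    """
    result = {"fqdn": fqdn, "ip": None, "ptr": None, "consistent": False}
    try:
        result["ip"] = forward(fqdn)
    except OSError:
        # Forward lookup failed (socket.gaierror is a subclass of OSError).
        return result
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        result["ptr"] = reverse(result["ip"])[0]
    except OSError:
        # Reverse lookup failed (socket.herror is a subclass of OSError).
        return result
    # Compare case-insensitively and ignore a trailing dot.
    result["consistent"] = (
        result["ptr"].rstrip(".").lower() == fqdn.rstrip(".").lower()
    )
    return result


if __name__ == "__main__":
    # Hypothetical node name: replace with the FQDN of each cluster node.
    for node in ["node1.example.com"]:
        print(check_dns(node))
```

A result with an IP address but no reverse name points to a missing PTR record for that node.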
If nodes are deployed across different IP networks or VLANs: