VMware Aria Operations deployment fails with Error LCMVROPSYSTEM25001

Article ID: 439085

Products

VCF Operations

Issue/Introduction

When deploying or expanding a VMware Aria Operations cluster using VMware Aria Suite Lifecycle (LCM), the request fails at the "Initializing Cluster" or "Cluster Configuration" stage.

  • Error Code: LCMVROPSYSTEM25001

  • Error Message: VMware Aria Operations initializing cluster failure. VMware Aria Operations cluster configurations failed.

  • The Aria Operations admin UI may show a status of "Failed" or remain stuck at "Waiting for Analytics."

  • In /storage/log/vcops/log/analytics/Analytics-<UUID>.log, the following errors indicate communication failures between nodes:

    Port 6061 (Locator) Error:

    INFO [ajp-nio-127.0.0.1-8010-exec-4, Info] client.internal.AutoConnectionSourceImpl - locator /<Node-IP>:6061 is not running.
    java.net.ConnectException: Connection refused (Connection refused)
    

    Port 10008 (Internal Data Communication) Error:

    WARN  [Membership Messenger Sender Non-Blocking]  com.vmware.gemfire.tcpmessenger.internal.ClientHandler.exceptionCaught - Asynchronous Messaging Client (local addy: /<Source-IP>:59268, remote addy: /<Destination-IP>:10008) got an I/O exception communicating with server: java.io.IOException: Connection reset by peer

Environment

  • VMware Aria Operations 8.18.x

  • VMware Aria Suite Lifecycle 8.x

Cause

This issue is typically caused by environmental restrictions preventing the nodes from communicating or synchronizing correctly:

  1. Network/Port Restrictions: Firewalls or network security groups are blocking internal communication ports between the nodes, resulting in "Connection refused", "Connection reset by peer", or connection timeout errors in the logs.

  2. Time Synchronization (NTP): A clock skew between nodes (typically >60 seconds) causes JWT (JSON Web Token) authentication failures. The logs will show VcopsJwtAuthenticationFilter errors indicating the token has expired or is not yet valid.
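
To tell the two causes apart quickly, the log signatures quoted in this article can be counted per cause. The following is a minimal sketch, not a supported tool: the function names and cause labels are hypothetical, and the patterns are only the strings shown above (the exact wording of VcopsJwtAuthenticationFilter messages is not given here, so the sketch matches the class name alone).

```python
import re

# Patterns are taken from the log excerpts in this article. The labels on
# the right are illustrative names, not product terminology.
PATTERNS = [
    (re.compile(r"locator .*:6061 is not running"),
     "network: locator port 6061 unreachable"),
    (re.compile(r":10008\).*Connection reset by peer"),
     "network: port 10008 connection reset"),
    (re.compile(r"Connection refused"),
     "network: connection refused"),
    (re.compile(r"VcopsJwtAuthenticationFilter"),
     "time sync: JWT authentication failure"),
]


def classify_line(line):
    """Return the label of the first pattern that matches, or None."""
    for pattern, label in PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarize(lines):
    """Count matching lines per cause label; non-matching lines are ignored."""
    counts = {}
    for line in lines:
        label = classify_line(line)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Passing an open log file to summarize() returns a dictionary of counts. Mostly "network" labels point toward Step 1 of the Resolution, and "time sync" labels point toward Step 2.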

Resolution

To resolve this issue, you must ensure that all nodes can communicate over the required ports and share a synchronized clock.

Step 1: Verify and Open Required Network Ports

Ensure that your network allows unrestricted communication between all vROps nodes (Primary, Replica, and Data nodes).

  • Test connectivity to each port from the console of every node using the curl command. A "Connection refused" response or a timeout matches the symptoms seen in the logs. For example:

    • curl -v <Target-Node-IP>:6061

    • curl -v <Target-Node-IP>:10008

  • Important: While ports 6061 and 10008 are critical for GemFire cluster membership, vROps requires a wider range of ports for full functionality. Please ensure your firewalls are configured according to the master port list found in: TCP and UDP ports required to access VMware vRealize Operations Manager.
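
When there are several nodes, the same check can be scripted rather than repeated by hand with curl. The sketch below is an illustration only: it assumes Python 3 is available on the machine where it runs, the function names are hypothetical, and it covers just the two ports named in this article, not the full port list.

```python
import socket

# The two ports named in this article. The complete set is in the port
# reference linked above.
CLUSTER_PORTS = (6061, 10008)


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_nodes(hosts, ports=CLUSTER_PORTS, timeout=3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [
        (host, port)
        for host in hosts
        for port in ports
        if not port_is_open(host, port, timeout)
    ]
```

An empty list from check_nodes() means every listed port accepted a connection from the machine running the script. Because firewall rules can differ per direction, the check is only meaningful when run from each node toward all the others.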

Step 2: Correct NTP Synchronization

  1. Verify that all deployed nodes (as well as the Aria Suite Lifecycle appliance) are configured to use the same, reachable NTP server.

  2. Check the time on all nodes via the command line to ensure there is zero or near-zero clock skew.

  3. Address any "Receive timed out" errors between the vROps appliances and the NTP server.
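
The skew comparison in step 2 can be expressed as a small calculation. This is a sketch under stated assumptions: the epoch readings are collected by hand (for example with `date +%s` on each node, as close together in time as possible), the node names are placeholders, and the 60-second limit is the typical threshold cited in the Cause section, not a guaranteed product constant.

```python
# Typical threshold cited in this article; treat it as an assumption.
SKEW_LIMIT_SECONDS = 60


def max_clock_skew(epoch_by_node):
    """Return (skew_seconds, earliest_node, latest_node) for the readings.

    epoch_by_node maps a node name to the epoch seconds read on that node.
    """
    if len(epoch_by_node) < 2:
        raise ValueError("need readings from at least two nodes")
    earliest = min(epoch_by_node, key=epoch_by_node.get)
    latest = max(epoch_by_node, key=epoch_by_node.get)
    return epoch_by_node[latest] - epoch_by_node[earliest], earliest, latest


def skew_exceeds_limit(epoch_by_node, limit=SKEW_LIMIT_SECONDS):
    """Return True if the largest difference between readings exceeds limit."""
    skew, _, _ = max_clock_skew(epoch_by_node)
    return skew > limit
```

Readings taken a few seconds apart add that delay to the result, so a value just above the limit deserves a second measurement before concluding that NTP is at fault.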

Once the network blockages are removed and the time is synchronized, retry the deployment or cluster expansion workflow from Aria Suite Lifecycle.

Additional Information

  • Alternative Deployment: If the issue persists due to intractable network policies in a specific vCenter Cluster/Port Group, attempt deployment in an alternative, verified network environment to rule out software corruption.