After a successful login with local admin credentials, the VMware Aria Operations UI becomes unresponsive.
Symptoms include:
Product UI: Hangs indefinitely, displaying the message "Redirecting to VMware Aria Operations Web UI..."
Admin UI: Hangs indefinitely, displaying the message "Retrieving cluster status..."
Users can successfully reach and authenticate at the login pages, but the interfaces fail to load further.
The production cluster displays a "loading" status.
VMware Aria Operations 8.18.5
Continuous Availability (CA) design enabled
The root cause is a race condition specific to the Peer-to-Peer (P2P) SSL handshake between cluster members, which results in a JVM deadlock.
Stuck threads from this deadlock accumulate silently over time. Once enough threads are stuck (e.g., thousands over a span of several weeks), peer nodes crash, the cluster loses quorum, and the interfaces become unresponsive.
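Because the stuck threads build up silently, counting them in a JVM thread dump is one way to see the accumulation before nodes crash. The following is a minimal sketch, not a supported diagnostic tool. It assumes the dump is in the plain-text format produced by HotSpot tools such as jstack, and that the affected threads can be identified by a name containing "handshake reader" (an assumption based on the wording of this article; verify the actual thread names in your own dump). The sample dump in the code is fabricated for illustration only.

```python
import re
from collections import Counter

# Assumption: each thread in the dump starts with a quoted name line and is
# followed by a "java.lang.Thread.State: <STATE>" line (jstack-style text).
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]*)"')
THREAD_STATE = re.compile(r'^\s*java\.lang\.Thread\.State:\s+(?P<state>[A-Z_]+)')


def count_thread_states(dump_text, name_filter=""):
    """Count thread states for threads whose name contains name_filter.

    The match is case-insensitive; an empty filter counts every thread.
    """
    counts = Counter()
    current_name = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current_name = header.group("name")
            continue
        state = THREAD_STATE.match(line)
        if state and current_name is not None:
            if name_filter.lower() in current_name.lower():
                counts[state.group("state")] += 1
            current_name = None  # count each thread only once
    return counts


# Fabricated sample; thread names and frames are hypothetical.
SAMPLE_DUMP = '''\
"P2P handshake reader #1" #101 daemon prio=5 os_prio=0 tid=0x01 nid=0x11 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at example.Hypothetical.frame(Hypothetical.java:1)

"P2P handshake reader #2" #102 daemon prio=5 os_prio=0 tid=0x02 nid=0x12 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)

"main" #1 prio=5 os_prio=0 tid=0x03 nid=0x13 runnable
   java.lang.Thread.State: RUNNABLE
'''

if __name__ == "__main__":
    stuck = count_thread_states(SAMPLE_DUMP, name_filter="handshake reader")
    print(dict(stuck))  # prints {'BLOCKED': 2}
```

Running the same count against dumps taken days apart would show whether the number of matching threads keeps growing. Deadlocked threads may appear as BLOCKED or WAITING depending on the lock type, so compare all non-RUNNABLE states rather than a single one.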
Permanent Fix: This issue is resolved in VMware Cloud Foundation (VCF) Operations 9.1, which includes an upgrade to GemFire 10.1.3 containing the permanent fix.
Workaround: If an immediate upgrade to VCF Ops 9.1 is not possible, the following workaround can be applied to stabilize the environment.
Note: This is only recommended if your environment's resource count fits within single-node sizing limits (e.g., ~3,000 resources).
Disable Continuous Availability (CA) and shrink the cluster to a single node.
Why this works: A single-node deployment has no inter-node P2P connections. Without these connections, the P2P handshake reader threads are never spawned, so the deadlock cannot occur.
Sizing considerations: A 3,000-resource workload is well within single-node sizing limits for standard hardware (32 GB RAM, 8 vCPU, 15 GB JVM heap). For an added safety margin, the remaining node can be scaled up to a LARGE configuration.
Revert: The trade-off for this workaround is the temporary loss of high availability. CA should be re-enabled once the environment is upgraded to VCF Ops 9.1.
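The sizing condition in the note above can be expressed as a simple pre-check before shrinking the cluster. This is an illustrative sketch only: the function name and the limit value in the example are hypothetical, and the real single-node limit must be taken from the official sizing guidelines for your version and node size.

```python
def fits_single_node(resource_count, single_node_limit):
    """Return (fits, headroom) for a planned single-node deployment.

    single_node_limit is supplied by the caller from the official sizing
    guidelines; this function does not know any product limits itself.
    """
    if resource_count < 0 or single_node_limit <= 0:
        raise ValueError("resource_count must be >= 0 and single_node_limit > 0")
    headroom = single_node_limit - resource_count
    return headroom >= 0, headroom


if __name__ == "__main__":
    # 3,000 resources comes from this article; 5,000 is a placeholder limit.
    fits, headroom = fits_single_node(3000, 5000)
    print(fits, headroom)  # prints True 2000
```

If the check fails, or the headroom is small, the workaround is not recommended for that environment.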