Getting the error "UNKNOWN_SYSTEM_ERROR" when deploying new worker nodes to existing VCF Operations for Logs

Article ID: 431428

Products

VCF Operations

Issue/Introduction

When attempting to expand an existing VCF Operations for Logs cluster by adding new worker nodes, the deployment process fails with the error UNKNOWN_SYSTEM_ERROR.

Environment

VCF Operations for Logs 9.0.x

Cause

A network connectivity issue between the existing primary node and the newly deployed worker nodes causes this error.
 
The underlying logs record a hard network timeout during the cluster-join operation.
 
In FIPS-enabled environments, internal node-to-node communication relies on specific secure ports. If a firewall or network appliance drops or blocks this traffic, the primary node cannot reach the worker node, which results in the connection timeout and the subsequent task failure event.
 
/var/log/vrlcm/vrlcm.log:
ERROR vrlcm[####] [pool-#-thread-##] [c.v.v.l.p.v.AddVRLINodeToMasterNodeTask]  -- Exception while adding VMware Aria Operations for Logs worker to master
java.net.ConnectException: Connection timed out (Connection timed out)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
...
INFO vrlcm[####] [pool-#-thread-##] [c.v.v.l.p.a.s.Task]  -- Injecting task failure event. Error Code : 'UNKNOWN_SYSTEM_ERROR'
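To check whether a failed expansion matches this signature, the log can be searched for the three entries shown above. The following Python sketch is one way to do that. It assumes Python 3 is available wherever you run it (on the appliance or against a copied vrlcm.log), and scan and looks_like_this_issue are hypothetical helper names, not part of any VMware tooling. The patterns are copied from the excerpt above, so adjust them if the wording in your log differs.

```python
import re
import sys
from typing import Dict, Iterable, List

# Default path taken from this article; pass another path on the command line
# (for example, a copied log bundle) to scan that file instead.
DEFAULT_LOG = "/var/log/vrlcm/vrlcm.log"

# Signatures copied from the log excerpt in this article.
PATTERNS = {
    "add_node_task": re.compile(r"AddVRLINodeToMasterNodeTask"),
    "timeout": re.compile(r"java\.net\.ConnectException: Connection timed out"),
    "unknown_error": re.compile(r"Error Code : 'UNKNOWN_SYSTEM_ERROR'"),
}


def scan(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Return the 1-based line numbers on which each signature appears."""
    hits: Dict[str, List[int]] = {key: [] for key in PATTERNS}
    for number, line in enumerate(lines, start=1):
        for key, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[key].append(number)
    return hits


def looks_like_this_issue(hits: Dict[str, List[int]]) -> bool:
    """True when both the connection timeout and the UNKNOWN_SYSTEM_ERROR event were logged."""
    return bool(hits["timeout"]) and bool(hits["unknown_error"])


def main(argv: List[str]) -> None:
    path = argv[1] if len(argv) > 1 else DEFAULT_LOG
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            hits = scan(handle)
    except OSError as exc:
        print(f"Cannot read {path}: {exc}")
        return
    for key, numbers in hits.items():
        first = numbers[0] if numbers else "-"
        print(f"{key}: {len(numbers)} match(es), first at line {first}")
    if looks_like_this_issue(hits):
        print("Timeout and UNKNOWN_SYSTEM_ERROR both present: review firewall rules between nodes.")


if __name__ == "__main__":
    main(sys.argv)
```

Matching on the timeout and the error code together, rather than on UNKNOWN_SYSTEM_ERROR alone, helps avoid attributing an unrelated failure with the same generic error code to this network issue.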

Resolution

To resolve this issue, correct the network or firewall configuration that is blocking intra-cluster communication.
 
Network and Firewall Review:
 
The connection timeout indicates that traffic between the nodes is not getting through. Work with your network or security team to adjust the internal firewall rules.
 
Ensure that TCP port 16520 and TCP port 16521 (VCF Operations for Logs Thrift service) are fully open for bidirectional communication between the primary node and all newly deployed worker nodes.
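One way to confirm the timeout before and after a firewall change is to attempt a plain TCP connection to each port. The Python sketch below assumes Python 3 is available on the node you run it from. Run it on the primary node against each worker, then on each worker against the primary, because the traffic must flow in both directions. The hostnames you pass are your own, and check_tcp and check_nodes are hypothetical helpers. A successful connect only shows that the port is reachable; it does not validate the Thrift service itself.

```python
import socket
import sys
from typing import Dict, Iterable, Tuple

# TCP ports named in this article for intra-cluster (Thrift) traffic.
CLUSTER_PORTS = (16520, 16521)


def check_tcp(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a plain TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No reply at all: consistent with a firewall silently dropping packets.
        return False, "timed out"
    except ConnectionRefusedError:
        # Packets arrived but were rejected (nothing listening, or an explicit reject rule).
        return False, "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return False, f"error: {exc}"


def check_nodes(hosts: Iterable[str], ports: Iterable[int] = CLUSTER_PORTS,
                timeout: float = 5.0) -> Dict[Tuple[str, int], Tuple[bool, str]]:
    """Check every (host, port) pair and return the results keyed by that pair."""
    ports = tuple(ports)  # materialize once so it can be reused for every host
    return {(host, port): check_tcp(host, port, timeout)
            for host in hosts for port in ports}


if __name__ == "__main__":
    targets = sys.argv[1:]
    if not targets:
        print("usage: check_ports.py HOST [HOST ...]")
    for (host, port), (ok, detail) in check_nodes(targets).items():
        print(f"{host}:{port} -> {'OK' if ok else 'FAIL'} ({detail})")
```

For example, `python3 check_ports.py worker-01.example.com worker-02.example.com` (placeholder hostnames) prints one line per host and port. A "timed out" result matches the symptom in vrlcm.log, while "refused" suggests packets are reaching the node and the cause may lie elsewhere, such as the service not listening yet.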

Additional Information

Preliminary Validations
 
Before altering network rules, ensure the VCF Operations for Logs deployment is healthy:
  • Disk Space: Run df -h on all existing and new nodes to verify that sufficient storage capacity exists.
  • Certificates: Confirm that both internal and external certificates are valid, unexpired, and functioning correctly.
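The disk-space validation can also be scripted in the spirit of df -h. This Python sketch assumes Python 3 is available on each node. MIN_FREE_PERCENT is a hypothetical placeholder, because this article does not state a required amount of free space, and the mount points you pass should be the filesystems your deployment actually uses ("/" is only a default).

```python
import shutil
import sys
from typing import Iterable, List, Tuple

# Hypothetical threshold: this article gives no minimum, so set it from your own sizing guidance.
MIN_FREE_PERCENT = 20.0


def free_percent(path: str) -> float:
    """Free space available to unprivileged users, as a percentage of the filesystem size."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def low_space(paths: Iterable[str], min_free: float = MIN_FREE_PERCENT) -> List[Tuple[str, float]]:
    """Return (path, free %) for every filesystem below the threshold."""
    flagged = []
    for path in paths:
        pct = free_percent(path)
        if pct < min_free:
            flagged.append((path, pct))
    return flagged


if __name__ == "__main__":
    # Pass the mount points your deployment uses; "/" is only a default.
    checked = sys.argv[1:] or ["/"]
    try:
        for path, pct in low_space(checked):
            print(f"LOW: {path} has {pct:.1f}% free (threshold {MIN_FREE_PERCENT}%)")
    except OSError as exc:
        print(f"Cannot check disk usage: {exc}")
```

The percentage is based on the space available to unprivileged users, so it can differ slightly from the Use% column that df reports when the filesystem reserves blocks for root.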