Aria Operations cluster fails to go to 'OFFLINE' state.

Article ID: 376399

Products

VMware Aria Suite

Issue/Introduction

Symptoms:

  • The cluster cannot be taken offline. It goes into 'FAILURE' state with the following error:
    Failed to go offline
  • Bringing the cluster down requires selecting the "Force Take Cluster Offline" option.

Environment

Aria Operations 8.12.1

Cause

The Analytics service fails to stop within the 5-minute timeout, which causes the cluster to go into 'FAILURE' state.

casa.log

2024-08-06T13:16:30,985+0000 ERROR [pool-4-thread-2518] [6Q0079A5] casa.suiteapi.SuiteApiInternalService:453 - Exception calling suite API GET casa/clusters/prepare-cluster-services-for-shutdown; Request Id null: org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://localhost/suite-api/internal/casa/clusters/prepare-cluster-services-for-shutdown": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out

2024-08-06T13:16:30,985+0000  WARN [pool-4-thread-2518] [6Q0079A5] sysadmin.online.OnlineStateService:869 - OFFLINE-WORKFLOW: Suite REST API failed to prepare cluster services for shutdown: com.vmware.vcops.casa.exception.CasaSuiteApiException: Error Calling suite-api
..
..
..
2024-08-06T13:16:30,985+0000  INFO [pool-4-thread-2518] [6Q0079A5] sysadmin.online.OnlineStateService:1621 - Updating cluster online state to FAILURE
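To check whether a failed offline attempt matches this cause, casa.log can be searched for the messages quoted above. The following is a minimal sketch: the function name find_offline_failure is hypothetical, and the default path /storage/vcops/log/casa/casa.log is an assumption about where casa.log lives on the appliance, so pass the actual path as the first argument if it differs.

```shell
#!/bin/sh
# Sketch: search casa.log for the offline-workflow failure messages.
# ASSUMPTION: the default log path below; pass the real path as $1.

find_offline_failure() {
    log_file="${1:-/storage/vcops/log/casa/casa.log}"

    # Fixed-string patterns taken from the log excerpt in this article.
    # grep exits 0 if at least one line matches, 1 if none do.
    grep -F \
        -e 'prepare-cluster-services-for-shutdown' \
        -e 'OFFLINE-WORKFLOW: Suite REST API failed to prepare cluster services for shutdown' \
        -e 'Updating cluster online state to FAILURE' \
        "$log_file"
}
```

Matching lines share the same request ID (for example 6Q0079A5 in the excerpt above), which helps tie the timeout to the FAILURE state change.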

Resolution

Configure the Analytics service to be killed forcefully after 4 minutes 50 seconds (290000 ms). This is within the 5-minute timeout, so the cluster can go offline without requiring the "Force Take Cluster Offline" option.
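The 290000 value comes from converting 4 minutes 50 seconds to milliseconds, which leaves 10 seconds of headroom before the 5-minute timeout. A minimal sketch of that arithmetic, assuming both timeouts are expressed in milliseconds (the variable names are illustrative only):

```shell
#!/bin/sh
# Sketch: derive the shutdown timeout value and its headroom.
# ASSUMPTION: both timeouts are measured in milliseconds.

shutdown_timeout_ms=$(( (4 * 60 + 50) * 1000 ))   # 290000
offline_timeout_ms=$(( 5 * 60 * 1000 ))           # 300000
headroom_ms=$(( offline_timeout_ms - shutdown_timeout_ms ))

echo "analyticsServiceShutdownTimeout = $shutdown_timeout_ms"
echo "headroom before the 5-minute timeout: ${headroom_ms} ms"
```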

Follow the instructions below:

  • Open an SSH session to the Primary node as the root user.
  • Run the following command to back up the advanced.properties file:
    cp /usr/lib/vmware-vcops/user/conf/analytics/advanced.properties /usr/lib/vmware-vcops/user/conf/analytics/advanced.properties.backup
  • Edit the advanced.properties file:
    vi /usr/lib/vmware-vcops/user/conf/analytics/advanced.properties
  • Add the following line at the end of the file:
    analyticsServiceShutdownTimeout = 290000
  • Save the file and exit vi:
    :wq!
  • Repeat the same steps on the other analytics nodes.
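The per-node steps above can also be scripted. The following POSIX shell function is a hypothetical sketch, not a VMware-provided tool: the name add_shutdown_timeout is illustrative, and it uses only the file path, property name, and value given in this article. Unlike the plain cp/vi steps, it keeps the first backup on a re-run and skips the append if a line already sets analyticsServiceShutdownTimeout.

```shell
#!/bin/sh
# Sketch (hypothetical helper): add analyticsServiceShutdownTimeout on one node.
# Uses the advanced.properties path from this article unless $1 overrides it.

add_shutdown_timeout() {
    props_file="${1:-/usr/lib/vmware-vcops/user/conf/analytics/advanced.properties}"
    key="analyticsServiceShutdownTimeout"
    value="290000"   # 4 min 50 s in milliseconds, under the 5-minute timeout

    # Stop if the file is missing rather than creating a new one.
    if [ ! -f "$props_file" ]; then
        echo "not found: $props_file" >&2
        return 1
    fi

    # Back up once; an existing backup is not overwritten on a re-run.
    if [ ! -e "$props_file.backup" ]; then
        cp "$props_file" "$props_file.backup" || return 1
    fi

    # Append the property only if no line already sets it.
    if grep -Eq "^[[:space:]]*$key[[:space:]]*=" "$props_file"; then
        echo "$key already set in $props_file; leaving it unchanged"
    else
        # Add a newline first if the file does not already end with one.
        if [ -s "$props_file" ] && [ "$(tail -c 1 "$props_file")" != "" ]; then
            echo >> "$props_file"
        fi
        printf '%s = %s\n' "$key" "$value" >> "$props_file"
        echo "added: $key = $value"
    fi
}
```

Run it as root on each analytics node, for example by sourcing the script and calling add_shutdown_timeout with no argument to use the default path. If the property is already present with a different value, the function leaves it as-is, so check that line manually.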