Log Insight service keeps crashing with error " [Fatal error:]java.lang.StackOverflowError: null"

Article ID: 420797

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • The Log Insight service keeps crashing. Restarting the service restores functionality, but the service crashes again after a day or two.
  • In /var/log/loginsight/runtime.log on the Aria Operations for Logs nodes, you see entries similar to the following, in which a query containing a large number of predicates is logged before "java.lang.StackOverflowError":

    ["LogSearchWorker-thread-1"/IP_address INFO] [com.vmware.loginsight.analytics.distributed.LogSearchWorkerService] [Received query: SELECT item0 FROM timestamp >= EpochTimestamp AND timestamp <= EpochTimestamp AND ((text:"aaa-2-*" OR text:"aam-2-*" OR text:"acllog-2-*" OR text:"aclmgr-2-*" OR text:"aclqos-2-*" OR text:"acltcam-2-*" OR text:"acl-2-*" OR text:"agni-dpp-2-*" OR text:"agnil2f-2-*" OR
    text:"ali_lc-2-*" OR text:"amm-2-*" OR text:"\"app emulator-2-*\"" OR text:"arbiter-2-*" OR text:"ascii-cfg-2-*" OR text:"\"assoc mgr-2-*\"" OR text:"atlantis_app-2-*" OR text:"azuma-2-*" OR text:"bet-2-*" OR text:"bfdc-2-*" OR text:"bfd-2-*" OR text:"\"bios daemon-2-*\"" OR text:"bloggerd-2-*" OR text:"\"bootup test-2-*\"" OR text:"bootvar-2-*" OR text:"callhome-2-*" OR text:"cardclient-2-*" OR text:"cdp-2-*" OR
    text:"cert_enroll-2-*" OR text:"cfgd-2-*" OR text:"cfs-2-*" OR text:"cimsrvprov-2-*" OR text:"clis-2-*" OR text:"clk_mgr-2-*" OR text:"cloud-2-*" OR text:"clp_fwd-2-*" OR text:"clp_l3-2-*" OR text:"clp_mac-2-*" OR text:"clp_xbar-2-*" OR text:"cluster_test_app-2-*" OR text:"cluster-2-*" OR text:"cmond-2-*" OR text:"cmpproxy-2-*" OR text:"copp-2-*" OR text:"core-dmon-2-*" OR text:"crdcfg-2-*" OR text:"creditmon-2-*" OR text:"cts-2-*" OR
    text:"dcefib-2-*" OR text:"debugproxy-2-*" OR text:"device-alias-2-*" OR text:"device_test-2-*" OR text:"dev_log_sup-2-*" OR text:"dev_log-2-*" OR text:"dftm-2-*" OR text:"dhcp_snoop-2-*" OR text:"diagclient-2-*" OR text:"diagmgr-2-*" OR text:"diag_port_lb-2-*" OR text:"dmm-2-*" OR text:"dot1x-2-*" OR text:"dpp_debug-2-*" OR text:"dpvm-2-*" OR text:"dstats-2-*" OR text:"dt_helper-2-*" OR text:"eltmc-2-*" OR text:"eltm-2-*" OR
    text:"eou-2-*" OR text:"epld_auto-2-*" OR text:"epld_upgrade-2-*" OR text:"epp-2-*" OR text:"eth-port-sec-2-*" OR text:"ethport-2-*" OR text:"eth_port_channel-2-*" OR text:"eureka_usd-2-*" OR text:"evmc-2-*" OR text:"evmed-2-*" OR text:"evms-2-*" OR text:"example_test-2-*" OR text:"regex-2-*" OR text:"rip-2-*" OR text:"rpm-2-*" OR text:"slab_lib-2-*" OR text:"smm-2-*" OR text:"syslog-2-*" OR text:"syswrap_lib-2-*" OR text:"tcp-2-*" OR text:"tsp-2-*" OR text:"tx-2-*" OR text:"u6rib-2-*" OR text:"urib-2-*" OR text:"systemhealth-2-*" OR text:"ethport-2-*" OR text:"fex-2-*" OR text:"licmgr-2-*" OR text:"pfma-2-*" OR text:"lldp-2-*" OR text:"qos-2-*") AND product:"NX-SW" AND (FIELD_EXISTS(####################################) AND (text=~"%\S+-(?<######################>\S+):" AND product:"NX-SW"))) as item0 ORDER BY item0.timestamp DESC token=###########]
    ["DistributedQueryThreads-thread-10"/IP_address WARN] [com.vmware.loginsight.commons.rpc.clientconnpool.ClientConnectionPool] [Pooled client connections to hostname: 0.0.0.0, port: #####, service: com.vmware.loginsight.analytics.LogSearchWorker$Client closed with broken flag: 1]
     ["LogSearchWorker-thread-1"/IP_address ERROR] [com.vmware.loginsight.daemon.StrataServiceFailureHandler] [Fatal error:]
    java.lang.StackOverflowError: null
            at com.vmware.loginsight.piql.PIQLQueryHelper.extractIndex(PIQLQueryHelper.java:317) ~[analytics-lib.jar:?]
            at com.vmware.loginsight.piql.PIQLQueryHelper.extractIndex(PIQLQueryHelper.java:317) ~[analytics-lib.jar:?]
            at com.vmware.loginsight.piql.PIQLQueryHelper.extractIndex(PIQLQueryHelper.java:317) ~[analytics-lib.jar:?

  • A review of the queries logged before the crash shows that they contain more than 400 predicates. These queries belong to an alert definition provided by the Cisco Nexus content pack, and the loginsight service remains stable after that alert definition is disabled.

Environment

Aria Operations for Logs 8.18.x

Cause

An alert definition from the Cisco Nexus content pack contains an excessive number of predicates (over 400). This exceeds the supported limit of Aria Operations for Logs and causes the loginsight service to crash.
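
To check whether other queries in runtime.log are similarly oversized, the predicates in each "Received query:" entry can be counted with a short script. The following is a minimal sketch, not a VMware-provided tool: the function names are hypothetical, the regular expression is a rough heuristic based only on the log excerpt in this article (it counts field:"value" and field=~"value" terms), and the default threshold of 200 is the supported maximum stated in the Resolution section.

```python
import re
from typing import Iterable, Iterator, Tuple

# Heuristic (assumption): a predicate looks like text:"..." or text=~"...".
# This is not the product's own query parser and may miscount other syntax.
PREDICATE = re.compile(r'\b\w+(?::|=~)"')

# Marker taken from the log excerpt in this article.
QUERY_MARKER = "Received query:"


def count_predicates(query: str) -> int:
    """Return the approximate number of predicates in a logged query string."""
    return len(PREDICATE.findall(query))


def find_large_queries(lines: Iterable[str], limit: int = 200) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, predicate_count) for logged queries above `limit`."""
    for number, line in enumerate(lines, start=1):
        position = line.find(QUERY_MARKER)
        if position == -1:
            continue  # not a query entry
        count = count_predicates(line[position + len(QUERY_MARKER):])
        if count > limit:
            yield number, count


def scan_file(path: str, limit: int = 200) -> None:
    """Print every oversized query found in the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, count in find_large_queries(handle, limit):
            print(f"line {number}: about {count} predicates (limit {limit})")
```

For example, calling scan_file("/var/log/loginsight/runtime.log", limit=100) on a node would list the entries that exceed the recommended best-practice value rather than the hard limit. Treat the counts as approximate.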

Resolution

The Cisco content pack alert runs queries with an unusually large number of predicates, which overwhelms the Log Insight service and causes it to crash. Aria Operations for Logs supports up to 200 predicates in a query; as a best practice, however, keep each query to no more than 100 predicates.

To resolve this issue:

  1. Edit the problematic alert definition and copy its queries.
  2. Split the large number of predicates across multiple alert definitions so that each one stays within the limit. To create a new alert, refer to Define an Alert.
  3. If you want the queries provided by the content pack to be simplified, raise a ticket with Cisco support.
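
When planning the split in step 2, it can help to work out in advance how the copied predicates divide into groups. The sketch below is illustrative only and does not interact with Aria Operations for Logs. The function name split_predicates and its parameters are hypothetical, the placeholder terms are made up, and the default of 100 follows the best-practice value above. The reserved parameter is an assumption that accounts for predicates repeated in every new alert, such as product:"NX-SW".

```python
from typing import List, Sequence


def split_predicates(
    predicates: Sequence[str],
    max_per_alert: int = 100,
    reserved: int = 0,
) -> List[List[str]]:
    """Split predicates into groups, one group per new alert definition.

    max_per_alert: upper bound on predicates per alert query.
    reserved: predicates that every alert repeats (hypothetical parameter),
              subtracted from the room available in each group.
    """
    chunk_size = max_per_alert - reserved
    if chunk_size < 1:
        raise ValueError("reserved must be smaller than max_per_alert")
    items = list(predicates)
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# Placeholder terms standing in for the predicates copied in step 1.
terms = [f'text:"proc{i}-2-*"' for i in range(410)]

# Reserve room for three predicates shared by every alert (an assumption).
groups = split_predicates(terms, max_per_alert=100, reserved=3)
for index, group in enumerate(groups, start=1):
    print(f"alert {index}: {len(group)} predicates")
```

Each resulting group would then be entered as the filter set of its own alert definition, together with any shared predicates.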