SpectroSERVER process taking 100% CPU and Tomcat running out of memory due to too many alarms



Article ID: 138174


Updated On:

Products

Spectrum Network Observability

Issue/Introduction

The SpectroSERVER may consume 100% CPU and users may not be able to log in to OneClick. Other symptoms include the landscape appearing to have switched over, OneClick windows taking a long time to populate, and searches that never complete.

 

If performance stack dumps are gathered, the stack output will show alarm filtering, as seen below. OneClick Tomcat thread dumps will show POST activity in getAlarmsByXml:

 

Linux SpectroSERVER pstack output:

#1 0x00007f17d35af1b9 in CsGenAttrsIter::get_next_attr() () from /opt/SPECTRUM/lib/libVPapi.so.1

#2 0x00007f17d774a76a in CsGlobalAlarmClient::copy_desired_attrs(CsGlobalAlarmAttrs const&, CsAttrReadReqVPList const&, CsGlobalAlarmAttrs*) () from /opt/SPECTRUM/lib/../SS/libgas.so.1

#3 0x00007f17d7747030 in CsGlobalAlarmClientHandler::get_alarms(CsCAttribute::CsCValue::_VISanon_seq_0_CsCAttribute__CsCValue const*, CsAttrReadReqVPList*, CsSecurityIf const*) () from /opt/SPECTRUM/lib/../SS/libgas.so.1

#4 0x00007f17d7742b8a in CsAlarmDomainSrvc::getAlarmListWithAttrsNoFiltering(CsCAttribute::CsCValue::_VISanon_seq_0_CsCAttribute__CsCValue const&, CsAttrReadReqVPList const*, CsSecurityIf const&)

 

Windows SS stack output gathered using Windows Process Explorer:

00007ffa`2f6c49ff : 0000007f`00000008 0000007f`85b7fa90 00007ffa`1cc26d88 0000007f`00011f4e : libgas!CsGlobalAlarmFilterParser::comparison_operation+0x29e [d:\spectrum\10.02.01\cm\windows\10.02.01.00.98\gas.a\managers\src\csgalarmfp.cc @ 507]

00007ffa`2f6c4c00 : 0000007f`85b7fa90 00000080`3453bf50 00007ffa`1cc26d88 00000080`3453bed8 : libssorbutil!CsAttrFilterParser::operation+0x9f [d:\spectrum\10.02.01\cm\windows\10.02.01.00.98\ssorb.a\util\src\csattrfltp.cc @ 962]

00007ffa`1cbfa08d : 0000007f`85b7fa90 00007ffa`1cc26d88 0000007f`a5a76101 0000007f`85b7fa90 : libssorbutil!CsAttrFilterParser::parse+0x4c [d:\spectrum\10.02.01\cm\windows\10.02.01.00.98\ssorb.a\util\src\csattrfltp.cc @ 676]

00007ffa`1cbf6ef4 : 0000007f`85b7fa90 0000007f`85b7fa00 00000000`00000000 00007ffa`00000001 : libgas!CsGlobalAlarmFilterParser::test+0x39 [d:\spectrum\10.02.01\cm\windows\10.02.01.00.98\gas.a\managers\src\csgalarmfp.cc @ 333]

00007ffa`1cbf1448 : 00000080`1526af50 00000080`010c2a7e 00000080`3453bf50 0000007f`85b7fbb0 : libgas!CsGlobalAlarmClientHandler::get_alarms+0x1d0 [d:\spectrum\10.02.01\cm\windows\10.02.01.00.98\gas.a\managers\src\csgaclienth.cc @ 562]

00007ffa`1cbec538 : 00000080`1526fc40 0000007f`85b7fbf0 00000000`00000000 00000080`022cfc48 : libgas!CsAlarmDomainSrvc::getAlarmListByAttrFilter+0xa8 [d:\spectrum\10.02.01\cm\windows\10.02.01.00.98\gas.a\corba\src\csalrmdsrvc.cc @ 541]

 

From the catalina.out thread stack:

at com.ca.spectrum.restful.servlet.AlarmServlet._POST_getAlarmsByXml(AlarmServlet.java:566)

If you review the LocalHostAccessLog files for OneClick and search for "REST", you will see <user> running many REST alarm queries:

 

POST /spectrum/restful/alarms HTTP/1.1 200 3110 67718
POST /spectrum/restful/alarms HTTP/1.1 200 13464 65167
POST /spectrum/restful/alarms HTTP/1.1 200 114 91603
POST /spectrum/restful/alarms HTTP/1.1 200 114 88756
POST /spectrum/restful/alarms HTTP/1.1 200 114 99280
POST /spectrum/restful/alarms HTTP/1.1 200 114 80193
POST /spectrum/restful/alarms HTTP/1.1 200 28369 89942
POST /spectrum/restful/alarms HTTP/1.1 200 20194 98286
POST /spectrum/restful/alarms HTTP/1.1 200 3105 44722
POST /spectrum/restful/alarms HTTP/1.1 200 6084 63906


Notice the very long response times at the end of the entries.
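
If you want to scan the access logs for these slow alarm queries programmatically, a minimal Python sketch is shown below. It assumes the last column of each access log entry is the elapsed time in milliseconds (Tomcat's %D pattern, as in the entries above) and that the logs live under $SPECROOT/tomcat/logs; adjust the path and threshold for your installation.

# Minimal sketch: flag slow REST alarm queries in the OneClick access logs.
# Assumptions: the last column is the elapsed time in milliseconds (%D), and
# the logs live under $SPECROOT/tomcat/logs as LocalHostAccessLog* files.
import glob
import os

SPECROOT = os.environ.get("SPECROOT", "/opt/SPECTRUM")   # adjust for your install
THRESHOLD_MS = 30000                                      # flag requests slower than 30 seconds

for path in sorted(glob.glob(os.path.join(SPECROOT, "tomcat", "logs", "LocalHostAccessLog*"))):
    with open(path, errors="replace") as log:
        for line in log:
            if "/spectrum/restful/alarms" not in line:
                continue
            fields = line.split()
            try:
                elapsed_ms = int(fields[-1])              # last field: elapsed time
            except ValueError:
                continue
            if elapsed_ms >= THRESHOLD_MS:
                print(os.path.basename(path), elapsed_ms, "ms:", line.strip())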

Cause

There are too many alarms, or there are too many REST queries requesting too much alarm data.

Resolution

Review the OneClick GUI for the total number of alarms. Many factors affect how many alarms each SpectroSERVER can handle; however, a typical SpectroSERVER may start to exhibit issues at or near 100,000 alarms. If the total alarm count for the SpectroSERVER is high, such as 50,000 or more, you will need to review and reduce the alarms. You can do this by clearing alarms and/or checking whether a particular set of devices is generating more alarms than the others.
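
If it is easier to check the count from a script than from the GUI, the total can also be read from the RESTful alarms resource. The following is a minimal sketch only: the host name and credentials are placeholders, it assumes basic authentication with an OneClick user, and it simply prints the counters reported on the root element of the alarm list response (the exact attribute names can vary by release, so verify against your Web Services API documentation).

# Minimal sketch: read the total alarm count via the OneClick RESTful API.
# The host, port, and credentials below are placeholders for your environment.
import xml.etree.ElementTree as ET
import requests

ONECLICK = "http://oneclick.example.com:8080"   # hypothetical OneClick host and port
AUTH = ("spectrum", "password")                  # replace with a real OneClick user

# throttlesize=1 keeps the payload small; we only want the counters, not the alarm bodies.
resp = requests.get(ONECLICK + "/spectrum/restful/alarms",
                    params={"throttlesize": 1},
                    auth=AUTH,
                    timeout=60)
resp.raise_for_status()

root = ET.fromstring(resp.content)
# The total alarm count is expected among the root element's attributes.
print(root.tag, root.attrib)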

If you do not have a high alarm count, review your API integrations. We have seen external applications that request too much alarm data and/or query too often. Review the API configuration and limit the alarm attributes being queried to only the ones that are truly needed, and consider using Alarm Subscriptions instead of constant polling queries; see the link and the sketch below.

 https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/spectrum/24-3/programming/web-services-api-reference/how-to-use-the-ca-spectrum-web-services-api/restful-resources-nouns/subscription.html#concept.dita_dc917a7753561dae07893f6a680d13130277b9ea_POSTSubscription
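
As an illustration of trimming an integration's queries, the sketch below requests alarms with only a single attribute and a throttle size, rather than pulling every attribute on every poll. It uses the attr and throttlesize query parameters of the alarms resource as described in the Web Services API reference; the attribute ID 0x1006e (commonly the Model Name attribute) and the throttle value are placeholders, so substitute the handful of attributes your integration actually needs.

# Minimal sketch: request only the alarm attributes the integration truly needs.
# The host, credentials, attribute ID, and throttle size are placeholders.
import requests

ONECLICK = "http://oneclick.example.com:8080"   # hypothetical OneClick host and port
AUTH = ("spectrum", "password")                  # replace with a real OneClick user

resp = requests.get(
    ONECLICK + "/spectrum/restful/alarms",
    params={"attr": ["0x1006e"],        # only the attributes you need (repeat attr for more)
            "throttlesize": 100},       # cap the number of alarms returned per request
    auth=AUTH,
    timeout=60,
)
resp.raise_for_status()
print(resp.text[:2000])                 # inspect the trimmed response

For integrations that poll continuously, the Subscription resource linked above allows the client to register once and receive alarm updates, rather than re-running a full alarm query on a short interval.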