Spectrum showing performance issues and missing some Alarms

Article ID: 218357

Products

CA Spectrum DX NetOps

Issue/Introduction

Spectrum shows periodic performance issues. It is often slow to respond or to complete tasks, and Alarms that should have been raised are often missing.

Symptoms seen while the issue is present may include:

  • "CONTACT LOST TO SECONDARY SPECTROSERVER" alarms that set and clear at nearly the same time
  • Model activation taking longer than normal
  • AlarmNotifier abnormally disconnecting, as seen in the $SPECROOT/SS/VNM.OUT file.
    • Example message that might be seen:
    • Jun 24 02:37:50 ERROR TRACE at CsConnect.cc(524): Abnormally disconnected client - application "ServiceNOW_Notifier" sub app. "P:A SPECTRUM Alarm Notification Manager (SANM) application:CA" version "23.3.10.08"
  • Mutex timeout messages, as seen in the $SPECROOT/SS/VNM.OUT file.
    • Example message that might be seen:
    • Jun 23 22:52:09 WARNING at CsHPSERequestSender.cc(1707): timeout mutex lock took 1195ms!
  • dmp or core files created in the $SPECROOT/SS/support directory.
  • Expected Alarms from models are missing.
  • Models show a colored border, as if an Alarm were raised, but no active Alarm is present on the Alarms tab for the model.
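
The two VNM.OUT symptoms above can be counted with a short script. The following is a minimal Python sketch, not a Spectrum tool: the function name and the regular expressions are assumptions derived only from the two example messages quoted above, and other releases may word these messages differently.

```python
import re
from collections import Counter

# Patterns based only on the two example messages quoted in this article.
MUTEX_RE = re.compile(r"timeout mutex lock took (\d+)ms")
DISCONNECT_RE = re.compile(r'Abnormally disconnected client - application "([^"]+)"')


def summarize_vnm_out(lines):
    """Count mutex timeouts and abnormal client disconnects in VNM.OUT lines."""
    mutex_waits_ms = []
    disconnects = Counter()
    for line in lines:
        match = MUTEX_RE.search(line)
        if match:
            mutex_waits_ms.append(int(match.group(1)))
            continue
        match = DISCONNECT_RE.search(line)
        if match:
            disconnects[match.group(1)] += 1
    return {
        "mutex_timeouts": len(mutex_waits_ms),
        "max_mutex_wait_ms": max(mutex_waits_ms, default=0),
        "disconnects_by_app": dict(disconnects),
    }


# Usage with the two example lines from this article.
sample = [
    'Jun 24 02:37:50 ERROR TRACE at CsConnect.cc(524): Abnormally disconnected '
    'client - application "ServiceNOW_Notifier" sub app. "P:A SPECTRUM Alarm '
    'Notification Manager (SANM) application:CA" version "23.3.10.08"',
    'Jun 23 22:52:09 WARNING at CsHPSERequestSender.cc(1707): '
    'timeout mutex lock took 1195ms!',
]
print(summarize_vnm_out(sample))
```

To run it against a live system, pass an open file handle for $SPECROOT/SS/VNM.OUT instead of the sample list. A rising count of either message over time is a hint to look at the GC configuration described under Resolution.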

Environment

All supported DX NetOps Spectrum releases

Cause

In one common case where a dmp or core file is created, analysis of the file shows the following frame in the stack trace:

#24 0x00007f4a900d58f8 in GlobalCollectionIH::run_search(CsModelHandle const&, GlobalCollectionIH::SearchInitiator_e) () from /opt/SPECTRUM/lib/../SS/libmdlsvint.so.1

This tells us that a Global Collection (GC) search is causing the SpectroSERVER performance issue.
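
If the stack trace has already been saved as text, the check for this frame can be automated. This is a minimal sketch that assumes a debugger-style backtrace in the same "#N address in function" layout as the line quoted above; the function name find_gc_search_frames is hypothetical, and producing the backtrace from the dmp or core file is not covered here.

```python
import re

# Matches a frame line such as "#24 0x... in GlobalCollectionIH::run_search(...)".
FRAME_RE = re.compile(r"^#(\d+)\s+.*\bGlobalCollectionIH::run_search\(")


def find_gc_search_frames(backtrace_text):
    """Return the frame numbers that are inside GlobalCollectionIH::run_search."""
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(int(match.group(1)))
    return frames
```

An empty result means this particular frame is absent. It does not rule out other causes for the dmp or core file.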

Resolution

Common known causes of GC-related SpectroSERVER performance issues:

  • Large number of GC's overwhelming existing resources.
  • GC's with Search schedules set to use the Real-Time Update method.
  • Large number of GC's running scheduled searches at approximately the same time.
  • GC's with large complex search criteria.
  • GC's using external attributes.

Common recommendations to resolve these problems:

  • Ensure system resources are allocated to meet current demand.
  • Lower the number of GC's configured leaving only required GC's.
  • Change the Search schedule for any GC using Real-Time Update.
  • Do not use attributes that make external SNMP calls to gather current values.
  • Use as few AND/OR entries in the rules configured as possible. The more efficient the search criteria, the better the performance.
  • Configure searches to run only as often as needed.
  • Use the "Schedule Update" option instead of the "Run search to update Global Collection membership every xx hour(s)" option.

Which GC's are using Real-Time Update? 

  • You can find all GC's set to use Real-Time Update by creating a search under Locater Search -> Create a New Search.
  • With GlobalCollection selected as the ModelType, the EnableInstantUpdate attribute (Attribute ID 0x12e26) is among the attribute selection options. Build the search criteria on that attribute.

Additional Information

NOTE: It is not well known that when you create a GC and set its search to run every xx hours (24 hours, for example), the search runs as soon as you click the OK button. The interval is counted from that moment, so the search runs again every 24 hours from the time you clicked OK.

So if you create or update multiple GC's around the same time, you can get multiple GC updates at approximately the same time.

To avoid this, you can schedule GC's to run at specific times.
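
The timing effect described in the note can be worked through with plain date arithmetic. The sketch below only models the behavior described above and does not call Spectrum. The GC names, dates, and the 30-minute gap are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def next_runs(created_at, interval_hours, count=3):
    """Run times for an interval search whose clock started at created_at (the OK click)."""
    step = timedelta(hours=interval_hours)
    return [created_at + step * i for i in range(1, count + 1)]


def staggered_start_times(names, first_start, gap_minutes):
    """Give each GC its own start time, gap_minutes after the previous one."""
    gap = timedelta(minutes=gap_minutes)
    return {name: first_start + gap * i for i, name in enumerate(names)}


# Three hypothetical GCs saved within two minutes of each other, 24-hour interval.
saved = {
    "GC_Routers": datetime(2024, 1, 8, 14, 0),
    "GC_Switches": datetime(2024, 1, 8, 14, 1),
    "GC_Firewalls": datetime(2024, 1, 8, 14, 2),
}
for name, clicked in saved.items():
    # Every next run lands within two minutes of 14:00 on the following day.
    print(name, next_runs(clicked, 24, count=1)[0])

# The same GCs spread 30 minutes apart, starting at 01:00.
plan = staggered_start_times(list(saved), datetime(2024, 1, 9, 1, 0), 30)
for name, start in plan.items():
    print(name, start.strftime("%H:%M"))
```

The gap should be chosen from how long each search actually takes in your environment. The point of the sketch is only that interval-based searches inherit the clustering of their save times, while explicitly scheduled searches do not.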