Primary SpectroSERVER hanging with high CPU

Article ID: 8197

Products

CA Spectrum

Issue/Introduction

The Primary SpectroSERVER hangs and shows high CPU usage. Tomcat disconnects and will not fail over to the Secondary SpectroSERVER.

Cause

If the Primary SpectroSERVER is busy processing trap-related events, CPU usage can rise to the point where the SpectroSERVER stops responding to client requests. This can cause OneClick to disconnect and prevent it from failing over to the Secondary SpectroSERVER.
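To verify that the SpectroSERVER process itself is consuming the CPU, check the process list on the Primary SpectroSERVER host. A minimal sketch for a Linux host (the process name is assumed to be SpectroSERVER; on Windows, use Task Manager or run the equivalent from a bash shell):

top -b -n 1 | grep -i spectroserver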

Environment

Release:
Component:

Resolution

Check the Archive Manager DDM database for models that are logging a high number of events by doing the following:

1. Log into the SpectroSERVER system as the user that owns the Spectrum installation

2. If on Windows, start a bash shell by running "bash -login"

3. cd to the $SPECROOT/mysql/bin directory and enter the following command to list the 50 model and event-type combinations logging the most events. Adjust the start and end dates to cover a 24-hour period in your environment during which the issue occurs:

./mysql --defaults-file=../my-spectrum.cnf -uroot -proot ddmdb -e "select hex( type ), hex( e.model_h ), m.model_name, count( * ) as cnt from event e, model m where e.model_h = m.model_h and utime > UNIX_TIMESTAMP('2017-09-07 00:00:00') and utime < UNIX_TIMESTAMP('2017-09-08 00:00:00') group by type, e.model_h order by cnt desc limit 50"
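If you run this check regularly, a small wrapper can compute the previous day's window rather than editing the dates by hand. This is a sketch that assumes GNU date is available (standard on Linux) and is run from the same $SPECROOT/mysql/bin directory:

START=$(date -d yesterday '+%Y-%m-%d 00:00:00')
END=$(date '+%Y-%m-%d 00:00:00')
./mysql --defaults-file=../my-spectrum.cnf -uroot -proot ddmdb -e "select hex( type ), hex( e.model_h ), m.model_name, count( * ) as cnt from event e, model m where e.model_h = m.model_h and utime > UNIX_TIMESTAMP('$START') and utime < UNIX_TIMESTAMP('$END') group by type, e.model_h order by cnt desc limit 50"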

The output will look similar to the following, where the first column is the event ID (type), the second is the model handle, the third is the model name, and the fourth is the number of events logged in the specified time period:

+-------------+------------------+--------------------------------------------------------+------+
| hex( type ) | hex( e.model_h ) | model_name                                             | cnt  |
+-------------+------------------+--------------------------------------------------------+------+
| 10F91       | 200000E          | SSPerformance                                          | 1440 |
| 4820002     | 200000E          | SSPerformance                                          | 1440 |
| 1022F       | 200000E          | SSPerformance                                          | 1440 |
| 1001D       | 2000278          | Sim30123:nslabcn501a.geico.net                         |   74 |
| 1001D       | 200006E          | Sim30123:nslabcn501a.geico.net                         |   29 |
| 10219       | 200006B          | Andy                                                   |   18 |
| 1021A       | 200006B          | Andy                                                   |   18 |
| 1001D       | 2000660          | Sim30017:vanor-nor-idf-1-01.mgmt.internal.das          |   12 |
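To see when a suspect model logged its events, you can break the count down by hour for a single model handle and event type. The model handle 0x2000278 and event type 0x1001D below are taken from the sample output above; substitute values from your own results. This query is a sketch following the same pattern as the one in step 3:

./mysql --defaults-file=../my-spectrum.cnf -uroot -proot ddmdb -e "select FROM_UNIXTIME( utime - utime % 3600 ) as hour, count( * ) as cnt from event where model_h = 0x2000278 and type = 0x1001D and utime > UNIX_TIMESTAMP('2017-09-07 00:00:00') and utime < UNIX_TIMESTAMP('2017-09-08 00:00:00') group by hour order by hour"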

4. You can then launch the Event Configuration editor, filter for the event ID, and display the "Trap Event" column in the Navigation panel to see whether the top event IDs are trap events:

(See the attached image of the Event Configuration editor showing the "Trap Event" column.)

If the high volume of events is caused by traps from a few devices, determine why those devices are sending this many traps and address the issue at the device(s).
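If it is not clear from the events alone which devices are sending the traps, you can also watch trap traffic arriving at the SpectroSERVER host directly. The following is a sketch using tcpdump, which is not part of Spectrum; it assumes a Linux host receiving traps on the default SNMP trap port 162 and must be run as root. The pipeline counts the first 1000 trap packets by source address:

tcpdump -l -nn -c 1000 -i any udp port 162 | awk '{print $3}' | sort | uniq -c | sort -rn | head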

Attachments

1558693069443000008197_sktwi1f5rjvs16h6r.png