This guide suggests best practices for managing the overall event stream in NetOps Spectrum as it relates to Archive Manager, Report Manager, and the SpectroSERVER.
Release: Any
The event stream can vary from infrastructure to infrastructure. The following are suggestions to allow for best performance and overall health of the NetOps Spectrum databases.
Archive Manager
By default, Archive Manager stores 45 days' worth of data. Unless you are contractually bound to retain more, this value should not be raised. There is no size limit on the Archive Manager database.
Each event is stored in the DDMDB (the Archive Manager database) as a row, or record. Every row added to the underlying database increases its size.
The larger this database becomes, the longer it takes to retrieve data from it, and if it becomes corrupt, the harder it is to administer and repair.
Reducing the amount of data stored in the DDMDB keeps its overall size down, which improves the performance of the OneClick events tab and lets Spectrum Report Manager gather events more quickly.
The Archive Manager runtime configuration file, .configrc, is located in the $SPECROOT/SS/DDM directory. Its MAX_EVENT_DAYS setting defaults to 45. Leave it at 45 or lower it, for example to 30. If you use Report Manager, 45 is a good value to keep. If you do not use Report Manager, lowering it is suggested.
An overly large DDMDB causes two problems. First, if the database becomes corrupt it needs a repair, and repair time depends on the size of the database and the resources available on the system: the larger the database, the longer the repair. Second, the events tab in OneClick can become slow. Some enhancements have been made to work around this, but populating a large result set still takes time.
It is also suggested to run the DDMDB maintenance and optimize scripts monthly. These are located in the $SPECROOT/SS/DDM/scripts directory. Simply execute these via the command line.
How can you determine if your database is too large? There are a couple of ways; please keep in mind that these are average values.
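As a starting point for judging size, the per-table footprint of the DDMDB can be read from MySQL's standard information_schema.tables view. The following is a minimal sketch, not a Spectrum-provided tool: ddmdb_size_query is a hypothetical helper name, and the schema name ddmdb is taken from the logon command used elsewhere in this guide. It only reports sizes; what counts as "too large" depends on your environment.

```shell
#!/bin/sh
# Sketch only: ddmdb_size_query is a hypothetical helper, not part of Spectrum.
# It prints a query that lists each DDMDB table with its approximate row
# count and its data + index size in MB, largest first.
ddmdb_size_query() {
    cat <<'SQL'
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'ddmdb'
ORDER BY (data_length + index_length) DESC;
SQL
}

# Assumed usage, piping the query into the mysql client shown in this guide:
#   ddmdb_size_query | ./mysql --defaults-file=../my-spectrum.cnf -uroot -p<passwd> ddmdb -A
```

Note that table_rows is an estimate for some storage engines, so treat the row counts as approximate.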
Locally Stored Events
When the Archive Manager is down or unavailable due to corruption of the DDMDB, the SpectroSERVER stores events locally. The default limit is 20,000 events. With a large event stream, this limit is reached in a short period of time. As events continue to arrive beyond the 20,000 limit, the oldest events are purged and lost permanently.
The value for locally stored events can be adjusted, and Technical Support suggests setting it to about 2 million. Note that this increases the local storage used by the SSevents.db file in the SS directory, so make sure that adequate disk space is available. On average, this allows for a couple of days of Archive Manager downtime if you have a large event stream. During that time, work to resolve whatever is causing events to be stored in the SSdb instead of the DDMDB.
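To make the sizing concrete, the arithmetic can be sketched as follows. The event rate of 40,000 events per hour is a purely hypothetical figure chosen for illustration (substitute the rate measured in your own environment), and buffer_hours is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: estimates how many whole hours of events the local buffer
# can hold before the oldest events start to be purged.
#   $1 = local event record limit
#   $2 = average events per hour (assumed value, measure your own)
buffer_hours() {
    max_records=$1
    events_per_hour=$2
    echo $(( $max_records / $events_per_hour ))
}

buffer_hours 20000 40000     # default limit: prints 0 (less than one hour)
buffer_hours 2000000 40000   # suggested limit: prints 50 (about two days)
```

At this assumed rate the default limit is exhausted in under an hour, while the suggested limit covers roughly two days.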
This setting is changed in the .vnmrc file located in the $SPECROOT/SS directory. Look for MAX_EVENT_RECORDS=20000 and change it to MAX_EVENT_RECORDS=2000000.
The SpectroSERVER must be restarted for the new value to take effect, so schedule the change. The value should also be adjusted on the secondary SpectroSERVERs.
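The edit can also be scripted. The following is a minimal sketch, assuming the setting appears on its own line in the KEY=value form shown above; set_max_event_records is a hypothetical helper, not a Spectrum utility. It keeps a .bak copy of the file and does not restart the SpectroSERVER.

```shell
#!/bin/sh
# Sketch only: set_max_event_records is a hypothetical helper.
#   $1 = path to the .vnmrc file
#   $2 = new MAX_EVENT_RECORDS value
set_max_event_records() {
    vnmrc=$1
    new_value=$2
    # Keep a backup copy before touching the file.
    cp -p "$vnmrc" "$vnmrc.bak" || return 1
    # Rewrite only the MAX_EVENT_RECORDS line; all other lines pass through.
    sed "s/^MAX_EVENT_RECORDS=.*/MAX_EVENT_RECORDS=$new_value/" \
        "$vnmrc.bak" > "$vnmrc" || return 1
    # Print the resulting line; returns non-zero if the key is not present.
    grep '^MAX_EVENT_RECORDS=' "$vnmrc"
}

# Assumed usage:
#   set_max_event_records "$SPECROOT/SS/.vnmrc" 2000000
```

Run it on each SpectroSERVER, primary and secondary, and restart at the scheduled time.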
Reducing event stream
The only way to reduce the overall event stream is to limit what is stored in the DDMDB. A few simple MySQL queries can give you an idea of the biggest offenders.
To log on to MySQL:
./mysql --defaults-file=../my-spectrum.cnf -uroot -p<passwd> ddmdb -A
Paste the following at the mysql> prompt. Each query starts with SELECT and ends with a semicolon (;):
# To get a count of the number of events that occurred on or after a set date:
SELECT count(*) FROM event WHERE utime >= UNIX_TIMESTAMP("2008-01-01");
# To get the Top 10 events most commonly generated:
SELECT hex(type), COUNT(*) as cnt
FROM event GROUP BY type
ORDER BY cnt DESC LIMIT 10;
# To get the Top 10 models with the most events:
SELECT hex(e.model_h), m.model_name, COUNT(*) as cnt
FROM event e JOIN model m ON e.model_h=m.model_h
GROUP BY e.model_h, m.model_name
ORDER BY cnt DESC LIMIT 10;
# To get the Top 10 high-volume days for events:
SELECT date(from_unixtime(utime)) as x, count(*) as cnt
FROM event GROUP BY x
ORDER BY cnt DESC LIMIT 10;
# To get the last 10 days volume of events:
SELECT date(from_unixtime(utime)) as x, count(*) as cnt
FROM event GROUP BY x
ORDER BY x DESC LIMIT 10;
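The Top 10 event types query above counts across everything currently stored. To focus on recent offenders, the same query can be restricted to the last few days. The following is a minimal sketch that reuses only the type and utime columns from the queries above; top_event_types_query is a hypothetical helper that prints the SQL so it can be pasted at the mysql> prompt or piped to the client.

```shell
#!/bin/sh
# Sketch only: top_event_types_query is a hypothetical helper.
#   $1 = number of days to look back (positive integer)
top_event_types_query() {
    days=$1
    # Accept digits only, so nothing unexpected ends up in the SQL text.
    case $days in
        ''|*[!0-9]*) echo "usage: top_event_types_query <days>" >&2; return 1 ;;
    esac
    cat <<SQL
SELECT hex(type), COUNT(*) AS cnt
FROM event
WHERE utime >= UNIX_TIMESTAMP(NOW() - INTERVAL $days DAY)
GROUP BY type
ORDER BY cnt DESC LIMIT 10;
SQL
}

# Assumed usage:
#   top_event_types_query 7 | ./mysql --defaults-file=../my-spectrum.cnf -uroot -p<passwd> ddmdb -A
```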
Once the biggest offenders are identified, you can decide whether those events are actually needed. For example, AUTHENTICATION FAILURE events are purely informational and do not need to be seen in most environments. Alarms will continue to be generated, but you can prevent the event itself from being stored in the DDMDB.
To do that:
This will update the event configuration in NetOps Spectrum and from that point forward that event will no longer be stored in the DDMDB. This means the event will not be searchable via the events tab.
Report Manager
NetOps Spectrum Report Manager is the tool to use for historical event reporting. It also provides a GUI that presents the data in a more professional manner.
It is better suited to storing large amounts of data. Its architecture is designed to gather events from the DDMDB on a polled basis, which happens quickly and without any customer interaction. Each poll gathers roughly 10,000 events and processes them into its own database (reporting).
Storing large amounts of event data in the DDMDB is counterproductive if you own Report Manager.
Technical Support suggests utilizing Report Manager as your historical event database. Leaving 45 days in your Archive Manager database means if something happens to the Report Manager database, you can always reprocess the DDMDB data.
How that is done:
This removes all bucket tables and truncates the tables. It then sets event_sync_time back 45 days from now(), which is the time of execution.
Please reference the following documentation sections for more information:
Database Management
Reporting Database Management
If you have any questions about best practices for events, alarms, or anything else within NetOps Spectrum, reach out to the Broadcom DX NetOps User Community board. If you do not get what you need from the community, you can contact Broadcom Technical Support.