Spectrum Data Source synchronization frequently fails or is very slow
There are timeout errors in the Performance Management Device Manager DMService.log files.
The following is an example of the error as it appears in the Device Manager service wrapper.log file.
INFO | jvm 1 | 2020/09/07 00:03:25 | ERROR | Thread-17799 | 2020-09-07 00:03:25,502 | com.ca.im.portal.dm.inventory.Inventory2WSImpl
INFO | jvm 1 | 2020/09/07 00:03:25 | | gatherDataThread (InvWS_26e13a604db4436a8c997fefe068354b)
INFO | jvm 1 | 2020/09/07 00:03:25 | com.ca.im.portal.dm.inventory.InventoryTimeoutException
INFO | jvm 1 | 2020/09/07 00:03:25 | at com.ca.im.portal.dm.inventory.Inventory2WSImpl.gatherData(Inventory2WSImpl.java:835)
INFO | jvm 1 | 2020/09/07 00:03:25 | at com.ca.im.portal.dm.inventory.Inventory2WSImpl.run(Inventory2WSImpl.java:812)
INFO | jvm 1 | 2020/09/07 00:03:25 | at java.lang.Thread.run(Thread.java:748)
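To check how often the timeout occurs, the wrapper.log file can be scanned for the exception name. The following is a minimal sketch in Python that assumes the line layout of the sample above. The function name find_inventory_timeouts and the regular expression are illustrative, not part of the product.

```python
import re
from datetime import datetime

# Matches the wrapper prefix seen in the sample excerpt:
#   "INFO | jvm 1 | 2020/09/07 00:03:25 | <message>"
# The layout is assumed from that excerpt only.
WRAPPER_LINE = re.compile(
    r"^\s*\w+\s*\|\s*jvm \d+\s*\|\s*"
    r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*(.*)$"
)
TIMEOUT_MARKER = "InventoryTimeoutException"


def find_inventory_timeouts(lines):
    """Return the wrapper timestamps of lines that name the timeout exception."""
    hits = []
    for line in lines:
        match = WRAPPER_LINE.match(line)
        if match and TIMEOUT_MARKER in match.group(2):
            hits.append(datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S"))
    return hits


# Three lines copied from the excerpt above; only the second names the exception.
sample = [
    "INFO | jvm 1 | 2020/09/07 00:03:25 | ERROR | Thread-17799 | 2020-09-07 00:03:25,502 | com.ca.im.portal.dm.inventory.Inventory2WSImpl",
    "INFO | jvm 1 | 2020/09/07 00:03:25 | com.ca.im.portal.dm.inventory.InventoryTimeoutException",
    "INFO | jvm 1 | 2020/09/07 00:03:25 | at java.lang.Thread.run(Thread.java:748)",
]
for stamp in find_inventory_timeouts(sample):
    print(stamp.isoformat(sep=" "))
```

To scan a real file, pass an open file handle instead of the sample list. Timestamps that are regularly about 20 minutes after a sync request was sent would be consistent with the default timeout described in this article.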
All Supported releases
Component : Spectrum Integrations
This is most often seen during a Full Synchronization of the Spectrum Data Source in Performance Management. A Full Synchronization runs only when a user requests it through the UI, or when the Spectrum OC Tomcat and/or SpectroSERVER(s) are restarted.
NOTE: User-requested Full Synchronizations should be launched only when Support or Engineering asks for one.
The default timeout is 20 minutes. The error occurs when Spectrum takes longer than that to respond to the Device Manager service that sent the request. When the 20-minute timeout expires, Device Manager closes the socket the request went through and prints the timeout error in the DMService.log.
In most cases the slow response is caused by a group membership sync request from Performance Management to Spectrum.
Upgrade to the latest r20.2 suite of NetOps releases. The Spectrum release in that suite is far more efficient at generating and building the response to sync requests, which resolves this problem.
As a temporary workaround, increase DeviceManager.Timeout from 20 minutes to 4 hours. This gives Spectrum sufficient time to build the response and send it back.
Note: in some instances a 1-hour timeout was sufficient, while in others 16 hours was required.
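When choosing a new value, it can help to have the durations mentioned above in several units. The sketch below is plain arithmetic in Python. This article does not state which unit DeviceManager.Timeout expects, so the units shown are candidates only and should be confirmed before a value is applied. The as_units helper is hypothetical.

```python
from datetime import timedelta

# Timeout durations mentioned in this article.
DURATIONS = {
    "default": timedelta(minutes=20),
    "sometimes sufficient": timedelta(hours=1),
    "suggested workaround": timedelta(hours=4),
    "sometimes required": timedelta(hours=16),
}


def as_units(duration):
    """Express a duration in minutes, seconds, and milliseconds."""
    seconds = int(duration.total_seconds())
    return {
        "minutes": seconds // 60,
        "seconds": seconds,
        "milliseconds": seconds * 1000,
    }


# Which of these units the setting uses is NOT stated in the article.
for label, duration in DURATIONS.items():
    units = as_units(duration)
    print(
        f"{label:>22}: {units['minutes']} min"
        f" = {units['seconds']} s = {units['milliseconds']} ms"
    )
```

For the suggested 4 hours this gives 240 minutes, 14400 seconds, or 14400000 milliseconds.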
To modify the timeout, take these steps.
For example, to set it to 4 hours, run:
Want to set it to a different time frame? Here are some examples:
Once the value is changed, the next synchronization cycle launched will be the first to use the new setting. No service restarts are needed.
To view the setting, look for DeviceManager in the following query:
select * from netqosportal.general;
The DeviceManager.Timeout setting can be left in place. It persists through upgrades.
Downside: the timeout applies to all Data Sources. If another Data Source hits a genuine error during a Full Sync cycle, the failure will not be reported until the configured timeout expires.