Data Aggregator fails synchronization

Products

CA Performance Management Network Observability

Issue/Introduction

DX NetOps Portal dashboards display the message: 'Source is currently unavailable (query ID....)'

The NetOps Portal data source logs show synchronization failures for the DA data source.

Manually running the test on the data source passed, but a resync of the DA did not succeed.

After stopping and restarting the DA daemon and ActiveMQ, a full resync completed successfully.

Why do we see Data Aggregator synchronization failures intermittently in the NetOps Portal?

When these synchronization failures occur, the following errors also appear in the Device Manager DMService.log files.

ERROR | pool-4-thread-6          | 2021-09-12 16:35:41,307 | com.ca.im.portal.dm.productsync.DataSourcePoller|
| 
Data source Data Aggregator@host encountered an error while processing a sync request.  The problem may be in either the data source or CAPC.
The following stack trace may indicate where the problem lies.
You may also want to check the logs for the data source to determine if it is the cause of the problem.
java.lang.NullPointerException
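
To see how often these errors occur and at what times, the DataSourcePoller ERROR entries can be pulled out of the log. The following is a minimal sketch, not part of the original procedure. The function name list_sync_errors is hypothetical, and the DMService.log location shown in the usage note is an assumption based on the default DM logs directory.

```shell
# Sketch: print the ERROR header lines written by DataSourcePoller.
# The log file to search is passed as the first argument.
list_sync_errors() {
    # -F treats the class name as a fixed string, not a regular expression.
    grep -F 'com.ca.im.portal.dm.productsync.DataSourcePoller' "$1" | grep '^ERROR'
}
```

Example use, assuming the default path: list_sync_errors /opt/CA/PerformanceCenter/DM/logs/DMService.log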

Environment

All supported DX NetOps Performance Management releases

Cause

The root cause could not be determined. The synchronization failures stopped appearing.

Resolution

To capture more meaningful messages the next time synchronization failures related to the NullPointerException error occur, make the following changes.

This will not change the logging for DataSourcePoller messages in the normal DMService.log and wrapper-<date>.log files. Instead, it adds a new DataSourcePoller.log file. The goal is to capture the full stack trace that is missing from the current logs so that the cause of the NullPointerException can be identified.

To make the change, open the (default path) /opt/CA/PerformanceCenter/DM/resources/log4j.properties file for editing. Add this section to the end of the file and save the changes.

#
# Capture full stack trace for DA sync failures generating NullPointerException errors in DMService.log.
# Generates new DataSourcePoller.log file in the (default path) /opt/CA/PerformanceCenter/DM/logs directory.
# 
log4j.appender.ExceptionLog=org.apache.log4j.RollingFileAppender
log4j.appender.ExceptionLog.layout=org.apache.log4j.PatternLayout
log4j.appender.ExceptionLog.layout.ConversionPattern=%d | %p | %c | (%F:%L) %n    | %m %n
log4j.appender.ExceptionLog.file=/opt/CA/PerformanceCenter/DM/logs/DataSourcePoller.log
log4j.appender.ExceptionLog.MaxFileSize=100MB
log4j.appender.ExceptionLog.MaxBackupIndex=1
log4j.logger.com.ca.im.portal.dm.productsync.DataSourcePoller=DEBUG, ExceptionLog, stdout, application-timesize
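
The edit above can also be scripted. The following is a minimal sketch, not part of the original procedure. The function name append_poller_logging and the .bak backup copy are assumptions added for illustration. The function keeps a backup of the file and skips the append if the logger line is already present, so running it twice does not duplicate the section.

```shell
# Sketch: back up a log4j.properties file, then append the DataSourcePoller
# logging section once. The properties file is passed as the first argument.
append_poller_logging() {
    props="$1"
    # Skip if the logger line already exists, so reruns do not duplicate it.
    if grep -q '^log4j\.logger\.com\.ca\.im\.portal\.dm\.productsync\.DataSourcePoller=' "$props"; then
        echo "already configured"
        return 0
    fi
    # Keep a copy of the original file (the .bak name is an assumption).
    cp -p "$props" "$props.bak" || return 1
    # Quoted 'EOF' keeps the % conversion characters literal.
    cat >> "$props" <<'EOF'
#
# Capture full stack trace for DA sync failures generating NullPointerException errors in DMService.log.
#
log4j.appender.ExceptionLog=org.apache.log4j.RollingFileAppender
log4j.appender.ExceptionLog.layout=org.apache.log4j.PatternLayout
log4j.appender.ExceptionLog.layout.ConversionPattern=%d | %p | %c | (%F:%L) %n    | %m %n
log4j.appender.ExceptionLog.file=/opt/CA/PerformanceCenter/DM/logs/DataSourcePoller.log
log4j.appender.ExceptionLog.MaxFileSize=100MB
log4j.appender.ExceptionLog.MaxBackupIndex=1
log4j.logger.com.ca.im.portal.dm.productsync.DataSourcePoller=DEBUG, ExceptionLog, stdout, application-timesize
EOF
    echo "appended"
}
```

Example use with the default path: append_poller_logging /opt/CA/PerformanceCenter/DM/resources/log4j.properties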

After changing the log4j.properties file, restart the Device Manager (DM) and Performance Center (PC) console services.

For RH 7.x use these commands:

systemctl stop caperfcenter_console
systemctl stop caperfcenter_devicemanager
systemctl start caperfcenter_devicemanager
sleep 30   # wait 30 seconds before starting the console service
systemctl start caperfcenter_console

For RH 6.x use these commands:

service caperfcenter_console stop
service caperfcenter_devicemanager stop
service caperfcenter_devicemanager start
sleep 30   # wait 30 seconds before starting the console service
service caperfcenter_console start
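
After the restart, it can be useful to confirm that the new log file is being created. The following is a minimal sketch, not part of the original procedure. The function name wait_for_log and the 60 second default timeout are assumptions. The function only checks that the file exists; the file may remain empty until DataSourcePoller logs its first message.

```shell
# Sketch: wait until a log file exists, checking once per second.
# $1 = path to the log file, $2 = timeout in seconds (default 60, an assumption).
# Returns 0 if the file appears in time, 1 otherwise.
wait_for_log() {
    log="$1"
    timeout="${2:-60}"
    waited=0
    while [ ! -e "$log" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Example use with the path configured above: wait_for_log /opt/CA/PerformanceCenter/DM/logs/DataSourcePoller.log 120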

When new instances of the errors appear, gather a diagnostics logging package from the NetOps Portal host using the re.sh script. Attach the resulting package to a new support case requesting assistance in finding the cause of, and a solution for, the failures.

Instructions to run the re.sh script are found in the Unable to Resolve Issue documentation.