ETL Batch Process job failure Events in Performance Management

Article ID: 186483

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration, DX NetOps

Issue/Introduction

There are Administrative Events in Performance Management that state:

"Batch process job DimItemsETLJob failed."

An Events report covering the last 30 days shows many instances of the following failure.

Device: DataAggregator:<IP-or-Name>
IM Data Aggregator Administration Event
Reporting ETL Service
Batch process job DimItemsETLJob failed.

Observed in release r3.7.3; it can also occur in newer or older releases.

Matching the Event against the Data Aggregator logs, the following error appears at the same time in Exception.log (default path: /opt/IMDataAggregator/apache-karaf-<version>/data/log/Exception.log).

2019-11-26 15:15:19,627 | ERROR | ExceptionLog | An existing application exception RECURRED (Key=f53e96e098b360b7e3ea614aa28cedd577f99565), Recurrence count=23 : Exception encountered while performing dimension items batch job. : StatementCallback; SQL <Long vsql query>
[Vertica][VJDBC](4840) ERROR: Subquery used as an expression returned more than one row; nested exception is java.sql.SQLIntegrityConstraintViolationException:
[Vertica][VJDBC](4840) ERROR: Subquery used as an expression returned more than one row | (ExceptionLogger.java:104) 

The same exception recurs every hour, in line with the times at which the failed job Events are observed.
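The hourly pattern can be confirmed by counting matching entries in Exception.log per hour. The following is a minimal Python sketch, not a supported tool. It assumes every entry starts with a timestamp in the format shown in the excerpt above and contains the same message text; the function name failures_per_hour is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Leading timestamp of an Exception.log entry, e.g. "2019-11-26 15:15:19,627".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")

# Message text copied from the log excerpt in this article.
MARKER = "Exception encountered while performing dimension items batch job"


def failures_per_hour(lines):
    """Count entries containing MARKER, grouped by the hour they were logged."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        # Continuation lines (no leading timestamp) are skipped.
        if match and MARKER in line:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Example usage (the path is the default location, with <version> filled in):
# with open("/opt/IMDataAggregator/apache-karaf-<version>/data/log/Exception.log",
#           encoding="utf-8", errors="replace") as log:
#     for hour, count in sorted(failures_per_hour(log).items()):
#         print(hour, count)
```

One entry per hour across the affected period would be consistent with the hourly failed job Events.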

Environment

Release : 3.7

Component : IM Reporting / Admin / Configuration

Cause

The ETL job fails because it is unable to update the dim_item/type tables in the Data Repository database. As a result, new, updated, and deleted items are not processed.

Resolution

To resolve this, we need to identify which sub-query within the larger vsql query, seen in the Exception.log referenced above, is causing the error.
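As a starting point for that review, the candidate sub-queries can be listed by splitting the long query text on its parenthesized SELECT statements. The following is a minimal Python sketch under stated assumptions: the query is available as a single string, single quotes delimit string literals, and the query contains no comments. The function name extract_subqueries is hypothetical, and the sketch only lists candidates; it does not determine which one returns more than one row.

```python
def extract_subqueries(sql):
    """Return every parenthesized SELECT in sql, in order of appearance."""
    found = []        # (start index, sub-query text)
    open_parens = []  # indexes of "(" that have not been closed yet
    in_string = False
    for i, ch in enumerate(sql):
        if ch == "'":
            # Toggle on each quote; a doubled '' toggles twice and cancels out.
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            open_parens.append(i)
        elif ch == ")" and open_parens:
            start = open_parens.pop()
            inner = sql[start + 1:i].strip()
            if inner.upper().startswith("SELECT"):
                found.append((start, inner))
    # Inner sub-queries close first, so sort by where each one opens.
    return [text for _, text in sorted(found)]
```

Each returned string is one candidate to review. Nested sub-queries are reported both as part of their parent and on their own.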

The root cause and solution will vary. If this issue is observed, please open a new support case for guidance.

When opening the support case, please attach the following to help accelerate review and resolution.

1. A Data Aggregator CARE package, generated with the re.sh command. Instructions can be found here:

https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/it-operations-management/performance-management/3-7/troubleshooting/unable-to-resolve-issue.html

2. Open an Events View report in Performance Management where these Events are present. Filter it for these Events. Use the Views Gear icon and select the "Export to CSV (scaled)" to create an export of these Events for review.
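Before attaching the export, it can be checked locally to confirm that it contains the failed job Events. The following is a minimal Python sketch. It assumes only that the export is a standard comma-separated file with a header row; because the column names of the export are not documented here, it searches every cell rather than a named column. The function name matching_rows is hypothetical.

```python
import csv

# Event text as shown in the Issue/Introduction section.
EVENT_TEXT = "Batch process job DimItemsETLJob failed"


def matching_rows(csv_lines, text=EVENT_TEXT):
    """Return (header, rows) where rows are the records containing text in any cell."""
    reader = csv.reader(csv_lines)
    header = next(reader, None)  # None if the file is empty
    rows = [row for row in reader if any(text in cell for cell in row)]
    return header, rows


# Example usage (events.csv is a placeholder file name):
# with open("events.csv", newline="", encoding="utf-8") as f:
#     header, rows = matching_rows(f)
#     print(len(rows), "matching Events")
```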