Data Collector log files filling disk may lead to outage and data loss in Performance Management

Article ID: 196768

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration, DX NetOps

Issue/Introduction

A development Data Collector was found down. Investigation showed that its disk was full: the Exception.log, Expression.log and karaf.out log files were each larger than 30 GB.
 
After deleting the log files and restarting the dcmd service on the Data Collector, the logs began growing exponentially again.
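
To spot this condition before the disk fills, a check along the following lines could be run on the Data Collector. This is a minimal sketch, not a product tool: the function name `large_files` and the threshold are hypothetical, and the log directory must be supplied by the administrator because its location is not stated in this article.

```python
from pathlib import Path


def large_files(log_dir, min_bytes):
    """Return (file name, size in bytes) for regular files in log_dir whose
    size is at least min_bytes, sorted largest first."""
    found = []
    for path in Path(log_dir).iterdir():
        if path.is_file():
            size = path.stat().st_size
            if size >= min_bytes:
                found.append((path.name, size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

For example, `large_files(log_dir, 1024 ** 3)` would list every file of 1 GiB or more in the given directory, which makes runaway growth in Exception.log, Expression.log or karaf.out easy to see.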

Cause

  1. An error in the Expressions used for the Packet Discards, Packet Discards In and Packet Discards Out metrics in the SDN SilverPeak Virtual Interface (Internal Name: SDNSilverPeakVirtualInterface) Vendor Certification, which ships with the VNA SilverPeak plugin, generated the errors printed to the logs.
  2. Improper handling of the errors triggered by the bad Expressions caused the exponential log growth, which can lead to outages.

Environment

VNA SilverPeak plugin r20.2.1

Resolution

Defect DE473263 has been submitted to engineering. It will be used to fix both:

  • Broken expressions in the problem Certification.
  • The improper error handling leading to log growth.

The fix for this defect is included in the r20.2.2 releases of DX NetOps Performance Management and Virtual Network Assurance. To resolve the issue, either upgrade to that release or apply the workaround below.

To stop the log growth, make the following changes to the affected Certification.

  1. Open a REST API client.
  2. Issue a GET to the following URL. Replace <DA_Host> with the Data Aggregator host name.
    • http://<DA_Host>:8581/typecatalog/certifications/sdn/SDNSilverPeakVirtualInterface
  3. Change the version from 1.1 to 1.1001.
    • This is important. When the fixed out-of-the-box (OOTB) Vendor Certification is installed with a new release, its version will likely be 1.2 or greater. This change keeps the version below the one that will arrive with the new OOTB Certification. Without it, the modified Certification will not be replaced by the new factory version.
  4. Replace the Expression entries for the Packet Discards, Packet Discards In and Packet Discards Out metrics with the new entries listed after these steps. The changes are:
    • The use of .doubleValue()
    • Changing 100 to 100.0.
  5. Paste the updated Certification code into the request body, set the REST client method to PUT, and send the request to the same URL.
  6. Confirm that a 200 response is returned, indicating that the PUT request succeeded.
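
The GET, edit and PUT steps above could also be scripted. The following is a rough sketch using only the Python standard library, under several assumptions that are not confirmed by this article: that the Certification body is XML carrying its version in a `<Version>` element, that `application/xml` is an acceptable content type, and that no authentication is needed. All function names are hypothetical. The Expression entries still have to be replaced in the returned text (step 4) before the PUT is sent.

```python
import urllib.request

# URL from step 2; the host is supplied by the caller.
URL_TEMPLATE = (
    "http://{host}:8581/typecatalog/certifications/sdn/"
    "SDNSilverPeakVirtualInterface"
)


def certification_url(host):
    """Build the Certification URL for the given Data Aggregator host."""
    return URL_TEMPLATE.format(host=host)


def fetch_certification(host):
    """Step 2: GET the current Certification body as text."""
    with urllib.request.urlopen(certification_url(host)) as resp:
        return resp.read().decode("utf-8")


def bump_version(body, old="1.1", new="1.1001"):
    """Step 3: change the version from 1.1 to 1.1001.
    Assumption: the version is stored as <Version>1.1</Version>."""
    old_tag = "<Version>{}</Version>".format(old)
    if old_tag not in body:
        raise ValueError("expected version element not found: " + old_tag)
    return body.replace(old_tag, "<Version>{}</Version>".format(new), 1)


def build_put_request(host, body):
    """Step 5: prepare a PUT of the edited body to the same URL.
    Assumption: the service accepts an application/xml content type."""
    return urllib.request.Request(
        certification_url(host),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


def put_certification(host, body):
    """Steps 5 and 6: send the PUT and return the HTTP status code.
    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses."""
    with urllib.request.urlopen(build_put_request(host, body)) as resp:
        return resp.status
```

A run would call `fetch_certification`, apply `bump_version`, replace the three Expression entries by hand or with an editor, and then check that `put_certification` returns 200.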

<Expression destAttr="PctDiscards">snmpProtectedDiv(((isdef networkIncomingDroppedPackets)?networkIncomingDroppedPackets.doubleValue():0.0)+((isdef networkOutgoingDroppedPackets)?networkOutgoingDroppedPackets.doubleValue():0.0),
((isdef networkIncomingPackets)?networkIncomingPackets.doubleValue():0.0)+((isdef networkOutgoingPackets)?networkOutgoingPackets.doubleValue():0.0))*100.0</Expression>

<Expression destAttr="PctDiscardsIn">snmpProtectedDiv(((isdef networkIncomingDroppedPackets)?networkIncomingDroppedPackets.doubleValue():0.0),
((isdef networkIncomingPackets)?networkIncomingPackets.doubleValue():0.0)+((isdef networkOutgoingPackets)?networkOutgoingPackets.doubleValue():0.0))*100.0</Expression>

<Expression destAttr="PctDiscardsOut">snmpProtectedDiv(((isdef networkOutgoingDroppedPackets)?networkOutgoingDroppedPackets.doubleValue():0.0),
((isdef networkIncomingPackets)?networkIncomingPackets.doubleValue():0.0)+((isdef networkOutgoingPackets)?networkOutgoingPackets.doubleValue():0.0))*100.0</Expression>
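
As a reading aid, the arithmetic in the three Expressions can be modelled in Python. This is only an illustration and not the product's expression engine: `protected_div` is a hypothetical stand-in for `snmpProtectedDiv` and is assumed here to return 0.0 when the divisor is zero, and `None` is used to model an attribute for which `isdef` is false.

```python
def as_double(value):
    """Model '(isdef x) ? x.doubleValue() : 0.0'; None means undefined."""
    return float(value) if value is not None else 0.0


def protected_div(numerator, divisor):
    """Hypothetical stand-in for snmpProtectedDiv.
    Assumption: a zero divisor yields 0.0 instead of an error."""
    return numerator / divisor if divisor else 0.0


def pct_discards(in_dropped, out_dropped, in_packets, out_packets):
    """Compute the three percentages the way the Expressions are written:
    each one divides by incoming plus outgoing packets, then scales by 100.0."""
    total = as_double(in_packets) + as_double(out_packets)
    dropped_in = as_double(in_dropped)
    dropped_out = as_double(out_dropped)
    return {
        "PctDiscards": protected_div(dropped_in + dropped_out, total) * 100.0,
        "PctDiscardsIn": protected_div(dropped_in, total) * 100.0,
        "PctDiscardsOut": protected_div(dropped_out, total) * 100.0,
    }
```

With 64 incoming drops, 192 outgoing drops, 256 incoming packets and 768 outgoing packets, the sketch gives 25.0 for PctDiscards, 6.25 for PctDiscardsIn and 18.75 for PctDiscardsOut. Every operand is a floating-point value, mirroring the `.doubleValue()` and `100.0` changes called out in step 4.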