DX NetOps Data Aggregator not populating data due to disk space exhaustion

Article ID: 437596

Products

Network Observability CA Performance Management

Issue/Introduction

The Data Aggregator (DA) stops processing data from Data Collectors (DCs) when the IMDataAggregator disk partition reaches 100% capacity.

The DA reports a full disk because the $DA_HOME/IMDataAggregator/data/performance-spool directory has filled the disk with *.dto files.

In the $DA_HOME/IMDataAggregator/data/logs/netops-data-aggregator.log file, the following error appears when the problem begins.

2026-04-12T00:31:09,232 | ERROR | rTask-thread-337 | ExceptionLog                     | .ca.im.core.util.ExceptionLogger  108 |  -  -  |  | A NEW application exception occurred (Key=f3fae1b727a13...32b3d321cd9b810d9a29ea) : ROLLUP: insertEqdGtdRollups Error during temp table process, dcmId= , dcmIds=[11808, 11810], MF=ifstats, uuid=597a...-a1b3-19acbc8c3e8b : StatementCallback; SQL [<<<vsql query>>>
...
[Vertica][VJDBC](2927) ERROR: Could not write to [/data01/drdata/v_drdata_node0003_data]: Volume [/data01/drdata/v_drdata_node0003_data] has insufficient space.
...
Caused by: java.sql.SQLTransientException: [Vertica][VJDBC](2927) ERROR: Could not write to [/data01/drdata/v_drdata_node0003_data]: Volume [/data01/drdata/v_drdata_node0003_data] has insufficient space.
...
Caused by: com.vertica.support.exceptions.TransientException: [Vertica][VJDBC](2927) ERROR: Could not write to [/data01/drdata/v_drdata_node0003_data]: Volume [/data01/drdata/v_drdata_node0003_data] has insufficient space.
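
To check whether a log contains the same Vertica failure, it can be searched for the message text shown above. The following is a minimal sketch rather than a supported tool: the function names are hypothetical, and the log file to scan is passed in as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scanning a Data Aggregator log for the
# Vertica "insufficient space" error quoted in the excerpt above.

# Print how many log lines report a volume with insufficient space.
count_space_errors() {
    local log_file="$1"
    # grep -c exits non-zero when nothing matches, so ignore its status.
    grep -c 'has insufficient space' "$log_file" || true
}

# Print each distinct volume named in those errors, one per line.
list_full_volumes() {
    local log_file="$1"
    grep -o 'Volume \[[^]]*]' "$log_file" | sort -u
}
```

For example, after loading the functions, run count_space_errors "$DA_HOME/IMDataAggregator/data/logs/netops-data-aggregator.log" and then list_full_volumes on the same file to see which Vertica volume is reported as full.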

SYMPTOMS:

  • Data Aggregator (DA) is not populating new data for Portal reports.

  • The /opt/netops/IMDataAggregator partition is completely full (100% usage).

  • DA is failing to process data sent by Data Collectors.
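
The symptoms above can be checked from a shell on the DA host. This is a minimal sketch with hypothetical function names; it assumes standard df, find, and awk are available and takes the directories to inspect as arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the symptoms listed above.

# Print the number of *.dto files under a spool directory (recursive).
count_dto_files() {
    local spool_dir="$1"
    find "$spool_dir" -type f -name '*.dto' | wc -l | tr -d ' '
}

# Print the usage percentage (digits only) of the filesystem holding a path.
partition_usage_pct() {
    local path="$1"
    # -P forces the POSIX one-line-per-filesystem layout; column 5 is capacity.
    df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, count_dto_files "$DA_HOME/IMDataAggregator/data/performance-spool" shows how many spooled files are waiting, and partition_usage_pct /opt/netops/IMDataAggregator shows how full the partition is.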

CONTEXT: This occurs when the DA data directory consumes the majority of the allocated disk space, preventing further data processing.

IMPACT: Loss of visibility into network performance metrics and potential data loss if the disk is not cleared.

Environment

Network Observability DX NetOps Performance Management

Cause

Node0003 in the three-node Vertica Data Repository (DR) database cluster filled its data directory disk space. Where node0001 and node0002 were configured with a larger data directory than catalog directory, node0003 had them transposed. Its catalog directory had the disk space the data directory needed, while its data directory had the smaller space intended for the catalog.

As a result, the data directory filled up prematurely.

Once the node stopped working because of the full disk, the DB stopped processing polled data sent from the DA.

The DA data files then backed up and filled the available disk space on the DA.
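
A transposed layout like this can be spotted by comparing the sizes of the filesystems behind the data and catalog locations on each node. The sketch below is illustrative only: the function names are hypothetical, and the catalog path is not given in this article, so both paths must be supplied for your environment.

```shell
#!/usr/bin/env bash
# Print the total size, in 1024-byte blocks, of the filesystem holding a path.
fs_size_kb() {
    df -P "$1" | awk 'NR == 2 { print $2 }'
}

# Succeed (exit 0) when the data location sits on a smaller filesystem
# than the catalog location, which is the transposed layout described above.
data_smaller_than_catalog() {
    local data_dir="$1" catalog_dir="$2"
    [ "$(fs_size_kb "$data_dir")" -lt "$(fs_size_kb "$catalog_dir")" ]
}
```

Running data_smaller_than_catalog with the data and catalog paths of each node, and comparing the results across nodes, highlights a node whose volumes are swapped relative to its peers.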

Resolution

To resolve this, stop the running node, copy the data between locations, and remount the locations using the exact same location names used initially.

Follow these steps.

  1. Stop the DB on the problem node using the adminTools UI.
    1. Launch /opt/vertica/bin/adminTools as the dradmin or equivalent user.
    2. Choose option 7, the Advanced Menu.
    3. Choose option 2, the "Stop Vertica on Host" option.
    4. Select the problem host and stop Vertica on it.
  2. With Vertica shut down, take a snapshot or file system backup for safekeeping.
    • We strongly recommend creating a snapshot of the system after stopping Vertica on the node.
    • The snapshot can be used to recover to the current state and try again if something doesn't work.
  3. Copy data between locations.
    • Move all files and data in the current catalog directory to the current data directory.
    • Move all files and data in the current data directory to the current catalog directory.
    • It is critical that the permissions and ownership of the files remain unchanged.
    • When copying files and data between locations, we recommend using something like "cp -p -R" for the copy command.
      • The "-p" flag preserves permissions, ownership, and timestamps for copied files.
      • The "-R" flag copies recursively so that everything in the directory structure is copied.
  4. After validating that the data has been transferred between directories, reset the mount points.
    • Flip them while ensuring the SAME EXACT PATH NAMES are reused.
    • This is required. When the DB restarts, it must see the same catalog and data locations before it will use them.
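
The copy and validation in steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not a tested procedure for any particular system: the function names are hypothetical, the source and destination paths are supplied by you, and the metadata comparison assumes file names without spaces.

```shell
#!/usr/bin/env bash
# Copy the contents of src into dst, preserving mode, ownership, and
# timestamps (-p) and recursing through subdirectories (-R).
copy_tree() {
    local src="$1" dst="$2"
    mkdir -p "$dst"
    cp -p -R "$src"/. "$dst"/
}

# Print mode, numeric owner, numeric group, and path for every entry
# below the current directory (the directory itself is excluded).
meta_listing() {
    find . ! -name . -exec ls -ldn {} + | awk '{ print $1, $3, $4, $NF }' | sort
}

# Succeed only when both trees hold the same files with the same content
# and the same mode, owner, and group.
verify_tree() {
    local src="$1" dst="$2"
    diff -r "$src" "$dst" >/dev/null || return 1
    [ "$(cd "$src" && meta_listing)" = "$(cd "$dst" && meta_listing)" ]
}
```

For example, copy_tree /path/to/source /path/to/destination followed by verify_tree on the same two paths confirms the copy before any mount point is changed. Both paths here are placeholders. Run the copy as a user that is allowed to preserve ownership, otherwise "-p" cannot retain the original owner.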

Once completed, restart the DB node using adminTools.

If necessary, restart the dadaemon service so that it starts processing the backed-up data in its *.dto files.
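
The stop and restart actions can also be run from the command line instead of the adminTools menus. The sketch below assumes the Vertica admintools "stop_host" and "restart_node" tools and a systemd-managed dadaemon service; the wrapper function names, the DRY_RUN switch, and the host and database names you pass in are hypothetical. Confirm the options against the help output of your Vertica version before running anything.

```shell
#!/usr/bin/env bash
# Path from the article; override via the environment if yours differs.
ADMINTOOLS="${ADMINTOOLS:-/opt/vertica/bin/adminTools}"

# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop Vertica on one host (command-line form of "Stop Vertica on Host").
stop_vertica_host() {
    run "$ADMINTOOLS" -t stop_host -s "$1"
}

# Restart the stopped node; the database name is a value you supply.
restart_vertica_node() {
    run "$ADMINTOOLS" -t restart_node -s "$1" -d "$2"
}

# Restart the Data Aggregator service (assumes systemd manages dadaemon).
restart_dadaemon() {
    run systemctl restart dadaemon
}
```

Setting DRY_RUN=1 first prints each command without executing it, which is a safe way to review what would run. The admintools commands are run as the dradmin or equivalent user, and restart_dadaemon is run on the DA host.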