All Data Collector status values show "Not Connected"

Article ID: 241492

Products

CA Performance Management Network Observability

Issue/Introduction

We would like to troubleshoot why some Data Collectors go into this state after a while. So far the only resolution has been to restart (bounce) ActiveMQ, but doing so flushes any queued data, which is not desirable. How can we troubleshoot this issue to better understand what is wrong?

In Administration -> Monitored Items Management -> Data Collectors we see all three status values (Configuration Status, Polling Status and Status columns) with red "Not Connected" values.

In the System Status Data Collector section we see all three status values (Configuration Status, Polling Status and Status columns) with red "Not Connected" values.

In the current ActiveMQ log file on the Data Aggregator, /opt/IMDataAggregator/broker/apache-activemq-<version>/data/activemq.log (default path shown), we see WARN messages like the following, one for each impacted Data Collector:

2024-11-08 15:54:54,014 | WARN  | Usage(default:memory:queue://DIP-poll.responses.irep-<DCM_ID>adc:memory) percentUsage=100%, usage=209739881, limit=209715200, percentUsageMinDelta=1%;Parent:Usage(default:memory) percentUsage=1%, usage=655311976, limit=37768868659, percentUsageMinDelta=1%: Usage Manager Memory Limit reached. Producer (ID:<DCM_ID>-3:16:1:1) stopped to prevent flooding queue://DIP-poll.responses.irep-<DCM_ID>adc. See http://activemq.apache.org/producer-flow-control.html for more info (blocking for: 24410s) | org.apache.activemq.broker.region.Queue | ActiveMQ Transport: tcp:///127.0.0.1:34148@61616
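Each of these WARN lines names the affected queue, and the queue name embeds the Data Collector's DCM ID, so counting the lines per queue is a quick way to see which collectors are blocked and how often. In the sample above, the per-queue limit of 209715200 bytes (200 MiB) has been exceeded while the parent broker memory usage is only 1%, so it is that individual queue, not the whole broker, that is full. The helper below is a minimal, hypothetical sketch rather than a product tool. The names blocked_queues and report are illustrative, and it assumes every relevant line follows the format of the sample.

```python
import re
from collections import Counter

# Text that marks the producer-flow-control WARN (assumed format, per the sample line).
LIMIT_MSG = "Usage Manager Memory Limit reached"

# Capture the queue name from "queue://<name>:memory) percentUsage=NN%".
# The non-greedy group stops at the first ":memory)", so names containing colons still match.
PATTERN = re.compile(r"queue://(?P<queue>.+?):memory\) percentUsage=\d+%")


def blocked_queues(lines):
    """Count memory-limit WARN lines per queue in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if LIMIT_MSG not in line:
            continue
        match = PATTERN.search(line)
        if match:
            counts[match.group("queue")] += 1
    return counts


def report(path):
    """Print the affected queues from an activemq.log file, most frequent first."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for queue, n in blocked_queues(log).most_common():
            print(f"{n:6d}  {queue}")
```

For example, call report() with the activemq.log path shown above, with <version> replaced by the installed ActiveMQ version directory.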

Environment

All supported DX NetOps Performance Management releases

Cause

Network connection problems between the Data Aggregator and the impacted Data Collector servers.

We have observed that disconnects lasting 1 to 3 minutes can result in this behavior.
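To check whether short network outages line up with collectors dropping to "Not Connected", one option is to probe the Data Aggregator's broker port from each Data Collector at a fixed interval and record runs of failed connections. The Python sketch below does this under stated assumptions: port 61616 (the broker port visible in the WARN line above) is assumed to be the port the Data Collector uses, and the host name da-host.example.com, the 60-second reporting threshold and the function names are hypothetical. Verify the actual host and port in your environment.

```python
import socket
import time


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def outages(samples, min_seconds=60):
    """Given time-ordered (epoch_seconds, ok) samples, return (start, end) pairs for
    runs of failed probes lasting at least min_seconds. The end is the first successful
    probe afterwards, or the last sample if the outage is still ongoing."""
    spans = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe of a new outage
        elif ok and start is not None:
            if ts - start >= min_seconds:
                spans.append((start, ts))  # long enough to report
            start = None
    if start is not None and samples[-1][0] - start >= min_seconds:
        spans.append((start, samples[-1][0]))
    return spans


def monitor(host, port, interval=10, count=360, min_seconds=60):
    """Probe host:port every interval seconds, count times, then return the outages."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), can_connect(host, port)))
        time.sleep(interval)
    return outages(samples, min_seconds)
```

For example, monitor("da-host.example.com", 61616) samples for about an hour. Measured outage lengths are only accurate to roughly one probe interval, so use a shorter interval if you need to distinguish a 1-minute disconnect from a 3-minute one.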

Resolution

In some instances, simply stopping and restarting the ActiveMQ (activemq) service on the impacted Data Collector(s) may be sufficient to resolve the issue:

  1. Confirm the dcmd service is running with "systemctl status dcmd".
  2. Run "systemctl stop activemq".
  3. Run "systemctl status activemq" to check whether the running dcmd service has restarted it.
  4. If dcmd does not restart it, start it manually with "systemctl start activemq".
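The quick restart above can be sketched in Python as follows. This is a minimal sketch, not a Broadcom tool: it assumes it runs as root on the impacted Data Collector, and the 120-second wait for dcmd to bring activemq back (wait_seconds) and the 5-second poll interval are hypothetical values, since the procedure does not say how long dcmd takes. The run and sleep parameters exist so the logic can be exercised without touching real services.

```python
import subprocess
import time


def is_active(unit, run=subprocess.run):
    """True if systemd reports the unit as active ('is-active --quiet' exits 0)."""
    return run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def restart_activemq(run=subprocess.run, wait_seconds=120, poll=5, sleep=time.sleep):
    """Stop activemq, give dcmd a chance to restart it, else start it manually.

    wait_seconds and poll are hypothetical values chosen for this sketch.
    """
    # Step 1: dcmd must be running, otherwise nothing will restart activemq.
    if not is_active("dcmd", run):
        raise RuntimeError("dcmd is not running; start it first")
    # Step 2: stop the broker.
    run(["systemctl", "stop", "activemq"], check=True)
    # Step 3: poll the broker state while dcmd restarts it.
    waited = 0
    while waited < wait_seconds:
        sleep(poll)
        waited += poll
        if is_active("activemq", run):
            return "restarted by dcmd"
    # Step 4: dcmd did not bring it back in time, so start it by hand.
    run(["systemctl", "start", "activemq"], check=True)
    return "started manually"
```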

If the issue is not resolved, a clean dcmd restart should help. On the problem Data Collector(s), follow these steps:

  1. Stop the ActiveMQ service:
    • systemctl stop activemq
  2. Stop the dcmd service:
    • systemctl stop dcmd
  3. Navigate to the scripts directory (default path shown):
    • cd /opt/IMDataCollector/scripts
  4. Run this command to 'clean' the Data Collector:
    • ./dcmd clean
  5. Start the dcmd service:
    • systemctl start dcmd
  6. Confirm that both the dcmd and activemq services started and remain running:
    • systemctl status dcmd
    • systemctl status activemq
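For repeatability, the clean restart sequence can be captured as an ordered list of commands. The sketch below is hypothetical (CLEAN_RESTART_STEPS and clean_restart are illustrative names) and uses the default scripts path from step 3. It only prints the commands unless dry_run=False, in which case it should run as root on the problem Data Collector. Because check=True is used, execution halts at the first failing command, including a "systemctl status" call, which exits non-zero when the unit is not running.

```python
import subprocess

SCRIPTS_DIR = "/opt/IMDataCollector/scripts"  # default path; adjust for your install

# Ordered steps: (command, working directory or None).
# "--no-pager" keeps systemctl status from waiting on a pager when scripted.
CLEAN_RESTART_STEPS = [
    (["systemctl", "stop", "activemq"], None),
    (["systemctl", "stop", "dcmd"], None),
    (["./dcmd", "clean"], SCRIPTS_DIR),
    (["systemctl", "start", "dcmd"], None),
    (["systemctl", "status", "dcmd", "--no-pager"], None),
    (["systemctl", "status", "activemq", "--no-pager"], None),
]


def clean_restart(dry_run=True, run=subprocess.run):
    """Print (dry run) or execute the clean restart steps in order.

    Returns the command lines in the order they were handled.
    """
    handled = []
    for cmd, cwd in CLEAN_RESTART_STEPS:
        line = " ".join(cmd)
        handled.append(line)
        if dry_run:
            print(f"cd {cwd} && {line}" if cwd else line)
        else:
            # A relative "./dcmd" is resolved against cwd on POSIX systems.
            run(cmd, cwd=cwd, check=True)
    return handled
```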

Additional Information

The "./dcmd clean" command helps clear out old data and configuration files. It forces creation of a fresh data directory and ensures that directory has the correct ownership for the install user, so that dcmd starts from a clean state when it is restarted.

The "./dcmd clean" command performs the following actions to accomplish this:

  • Move Data to Backup:
    • The command moves existing data to a backup location named data.bak.
  • Remove Configuration File:
    • It removes the local-jms-broker.xml configuration file.
  • Create Data Directory:
    • The command creates a new data directory.
  • Change Ownership:
    • It changes the ownership of the new data directory to the install user.
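To make these four actions concrete, here is a rough Python illustration that can be run against a scratch directory. It is not the real dcmd script: the layout (data, data.bak and local-jms-broker.xml all directly under one base directory), the clean_like_dcmd name, and the choice to stop rather than overwrite an existing data.bak are assumptions made for this sketch only.

```python
import shutil
from pathlib import Path


def clean_like_dcmd(base, owner=None):
    """Hypothetical imitation of the four documented 'dcmd clean' actions under base.

    Assumes data/, data.bak/ and local-jms-broker.xml all live directly in base.
    """
    base = Path(base)
    data = base / "data"
    backup = base / "data.bak"
    broker_xml = base / "local-jms-broker.xml"

    # 1. Move existing data to the data.bak backup location.
    if data.exists():
        if backup.exists():
            # Sketch choice: refuse to clobber an earlier backup.
            raise FileExistsError(f"{backup} already exists; not overwriting it")
        data.rename(backup)
    # 2. Remove the local-jms-broker.xml configuration file (if present).
    broker_xml.unlink(missing_ok=True)
    # 3. Create a new, empty data directory.
    data.mkdir()
    # 4. Give the install user ownership of the new directory (requires privileges).
    if owner is not None:
        shutil.chown(data, user=owner)
    return data
```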