Fault Tolerant Data Aggregator fails to start

Article ID: 206092

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration, DX NetOps

Issue/Introduction

A disk failure made it necessary to reinstall the DX NetOps Performance Management Fault Tolerant Data Aggregators.

After the reinstall, one Fault Tolerant (FT) Data Aggregator (DA) works as expected. It starts without problems and switches between Active and Inactive.

The other FT DA does not start. It can be set to Maintenance, and it goes Inactive if the other FT DA is in an Active state.

Trying to make the problem FT DA the Active DA results in failure to start.

The systemctl status for the activemq service shows the following failures.

[root@HOST_NAME scripts]# systemctl status activemq
activemq.service - Apache ActiveMQ
   Loaded: loaded (/etc/systemd/system/activemq.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2021-01-06 19:10:37 GMT; 9min ago
  Process: 3744 ExecStart=/opt/IMDataAggregator/scripts/activemq start sysd (code=exited, status=1/FAILURE)
 
Jan 06 19:10:37 HOST_NAME activemq[3744]: [155B blob data]
Jan 06 19:10:37 HOST_NAME activemq[3744]: % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Jan 06 19:10:37 HOST_NAME activemq[3744]: Dload  Upload   Total   Spent    Left  Speed
Jan 06 19:10:37 HOST_NAME activemq[3744]: [155B blob data]
Jan 06 19:10:37 HOST_NAME activemq[3744]: cat: Binary: No such file or directory
Jan 06 19:10:37 HOST_NAME activemq[3744]: cat: file: No such file or directory
Jan 06 19:10:37 HOST_NAME systemd[1]: activemq.service: control process exited, code=exited status=1
Jan 06 19:10:37 HOST_NAME systemd[1]: Failed to start Apache ActiveMQ.
Jan 06 19:10:37 HOST_NAME systemd[1]: Unit activemq.service entered failed state.
Jan 06 19:10:37 HOST_NAME systemd[1]: activemq.service failed.

The command "journalctl -u activemq" shows more details including the file at issue.

[root@HOST_NAME scripts]# journalctl -u activemq
...
Jan 06 19:08:04 HOST_NAME systemd[1]: Starting Apache ActiveMQ...
Jan 06 19:08:04 HOST_NAME activemq[3188]: Starting ActiveMQ
Jan 06 19:08:04 HOST_NAME activemq[3188]: % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Jan 06 19:08:04 HOST_NAME activemq[3188]: Dload  Upload   Total   Spent    Left  Speed
Jan 06 19:08:04 HOST_NAME activemq[3188]: [155B blob data]
Jan 06 19:08:04 HOST_NAME activemq[3188]: % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Jan 06 19:08:04 HOST_NAME activemq[3188]: Dload  Upload   Total   Spent    Left  Speed
Jan 06 19:08:04 HOST_NAME activemq[3188]: [155B blob data]
Jan 06 19:08:04 HOST_NAME activemq[3188]: cat: Binary: No such file or directory
Jan 06 19:08:04 HOST_NAME activemq[3188]: cat: file: No such file or directory
Jan 06 19:08:04 HOST_NAME activemq[3188]: cat: (standard: No such file or directory
Jan 06 19:08:04 HOST_NAME activemq[3188]: cat: input): No such file or directory
Jan 06 19:08:04 HOST_NAME activemq[3188]: cat: matches/data/failover/daservice.uuid: No such file or directory
Jan 06 19:08:04 HOST_NAME systemd[1]: activemq.service: control process exited, code=exited status=1
Jan 06 19:08:04 HOST_NAME systemd[1]: Failed to start Apache ActiveMQ.
Jan 06 19:08:04 HOST_NAME systemd[1]: Unit activemq.service entered failed state.
Jan 06 19:08:04 HOST_NAME systemd[1]: activemq.service failed.

The activemq script in /opt/IMDataAggregator/scripts/ reads the /etc/consul-ext.cfg file.

At first glance the file may appear fine: permissions, ownership, and content all look correct when it is read with the cat or more commands. Running the following command exposes the real issue. grep reports the file as binary, which a plain text configuration file should never be.

[root@HOST_NAME etc]# grep ".*" consul-ext.cfg
Binary file /etc/consul-ext.cfg matches

Opening the file in vi shows an errant ^@ (null) character. It breaks the activemq script's attempt to read the file and locate the daservice.uuid file.
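The check can also be done without opening an editor. The following is a minimal sketch, not part of the product scripts: the function names count_nul_bytes and show_nul_lines are hypothetical helpers introduced here for illustration, and they rely only on the standard tr, wc, cat and grep utilities.

```shell
# Hypothetical helper: print the number of NUL bytes in the file given as $1.
# A clean text file prints 0.
count_nul_bytes() {
    # tr -cd '\000' deletes every byte except NUL; wc -c counts what remains.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: list the lines that contain a NUL byte.
# cat -v renders NUL as ^@, the same notation vi uses.
show_nul_lines() {
    cat -v "$1" | grep -n '\^@'
}
```

For example, running count_nul_bytes /etc/consul-ext.cfg on the affected host should print a value greater than 0, and show_nul_lines /etc/consul-ext.cfg should print the numbered lines where ^@ appears.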

Cause

The activemq service is failing to start due to the /etc/consul-ext.cfg file being corrupted with a null character.

Environment

All supported DX NetOps Performance Management releases

Resolution

Edit the /etc/consul-ext.cfg file. Remove the null characters and save the changes.
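As an alternative to editing by hand, the null characters can be stripped from the command line. This is a sketch under assumptions: strip_nul_bytes is a hypothetical helper name, the .bak backup suffix is an arbitrary choice, and the function assumes the file should contain no NUL bytes at all.

```shell
# Hypothetical helper: remove all NUL bytes from the file given as $1,
# keeping a backup copy next to it.
strip_nul_bytes() {
    file="$1"
    backup="${file}.bak"    # assumed backup name, chosen for illustration
    # Keep a copy of the original (cp -p preserves mode and timestamps).
    cp -p "$file" "$backup" || return 1
    # Rewrite the original file in place from the backup, minus NUL bytes.
    # Redirecting into the existing file keeps its ownership and permissions.
    tr -d '\000' < "$backup" > "$file"
}
```

Run as root, for example strip_nul_bytes /etc/consul-ext.cfg, then repeat the grep ".*" check to confirm the file is no longer reported as binary.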

The FT DA should now start cleanly, allowing it to go Active successfully.