Unable to start Data Aggregator (DA) servers with "WARNING: ... Fault Tolerant setup and currently other DA has the execution privilege"

Article ID: 228403

Updated On: 10-26-2023

Products

CA Performance Management Network Observability

Issue/Introduction

The Fault Tolerant (FT) DA servers cannot be started. Both dadaemon and activemq fail to start, with the following error:

$ sudo service dadaemon start

Redirecting to /bin/systemctl start dadaemon.service

Job for dadaemon.service failed because the control process exited with error code. See "systemctl status dadaemon.service" and "journalctl -xe" for details.

Detailed error is:

Nov 16 10:08:19 DA1 dadaemon[30157]: WARNING: this is a Fault Tolerant setup and currently other DA has the execution privilege, the startup operation will be aborted

Nov 16 10:08:19 DA1 systemd[1]: dadaemon.service: control process exited, code=exited status=1

Nov 16 10:08:19 DA1 systemd[1]: Failed to start Data Aggregator.

Only consul and consul-ext are running, and they report errors. The following messages repeat in /var/log/messages on the primary DA:

Nov 16 10:18:40 DA1 consul: 2021-11-16T10:18:40.054+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: failed inserting node: Error while renaming Node ID: "acb371f7-8d8d-a24c-654e-464cb73a7afc": Node name DA1 is reserved by node 72ef1eb5-91ca-0457-c54f-bba63d0ba78b with name DA1 (172.XXX.XXX.XXX)"

Nov 16 10:18:40 DA1 consul: 2021-11-16T10:18:40.143+0800 [WARN] agent.server.fsm: EnsureRegistration failed: error="failed inserting node: Error while renaming Node ID: "acb371f7-8d8d-a24c-654e-464cb73a7afc": Node name DA1 is reserved by node 72ef1eb5-91ca-0457-c54f-bba63d0ba78b with name DA1 (172.XXX.XXX.XXX)"

Nov 16 10:18:45 DA1 consul: 2021-11-16T10:18:45.882+0800 [WARN] agent: Check is now critical: check=service:daservice
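To confirm that a host is hitting this node name conflict, the log can be searched for the "is reserved by node" message. The following is a minimal sketch, not part of the product tooling; the function name reserving_node_ids is hypothetical, and the log path is passed as an argument (the article quotes /var/log/messages).

```shell
#!/bin/sh
# Hypothetical helper: print the unique Consul node IDs that are reported
# as already holding ("reserving") a node name in the given log file.
reserving_node_ids() {
    # $1: path to the log file, for example /var/log/messages
    grep -oE 'reserved by node [0-9a-f-]{36}' "$1" |
        awk '{ print $NF }' |
        sort -u
}
```

For example, `reserving_node_ids /var/log/messages` run on the primary DA would list the node ID that currently owns the name. No output means the message was not found in that file.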

Environment

DX NetOps CAPM Release : 20.2 or later
Component : IM Data Aggregator

Resolution

Ensure SELinux is disabled.

Check /etc/hosts and ensure that the IP address and hostname entries are present.
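The two checks above can be sketched as small shell helpers. This is an illustration under assumptions, not an official procedure: the function names selinux_mode and hosts_entry_for are hypothetical, getenforce is the standard SELinux utility (it may be absent on hosts without SELinux tooling), and the hostname to look up is supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: report the current SELinux mode
# (getenforce prints Enforcing, Permissive, or Disabled).
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "getenforce not found"
    fi
}

# Hypothetical helper: print the IP address(es) mapped to a hostname in a
# hosts file. Returns non-zero if the name has no uncommented entry.
hosts_entry_for() {
    # $1: hostname to look up; $2: hosts file (defaults to /etc/hosts)
    awk -v n="$1" '
        /^[[:space:]]*#/ { next }                  # skip comment lines
        {
            sub(/#.*/, "")                         # drop trailing comments
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; found = 1 }
        }
        END { exit found ? 0 : 1 }
    ' "${2:-/etc/hosts}"
}
```

For example, `hosts_entry_for DA1` prints the address that /etc/hosts maps to DA1, and exits non-zero if there is no such entry.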

Then restore primary DA to operational status by doing the following:

Ensure the DA services (dadaemon and activemq) are down, then stop consul and consul-ext on DA1, DA1-HA, and the proxy.
Start consul on the proxy and wait 60-90 seconds.
Start consul and consul-ext on DA1 and DA1-HA, then check membership. You should see output similar to the following:

$ cd /opt/CA/IMDataAggregator/consul/bin

$ ./consul members

Node      Address             Status  Type    Build  Protocol  DC    Segment
DA1       XXX.XXX.XXX.1:8301  alive   server  1.7.2  2         capm  <all>
DA1-HA    XXX.XXX.XXX.2:8301  alive   server  1.7.2  2         capm  <all>
DA-PROXY  XXX.XXX.XXX.3:8301  alive   server  1.7.2  2         capm  <all>

$ ./consul operator raft list-peers

Node      ID                                    Address             State     Voter  RaftProtocol
DA-PROXY  7dc914d8-2959-9659-1da4-5d83fb074a1c  XXX.XXX.XXX.3:8300  leader    true   3
DA1-HA    d0cad9d2-701a-e6b7-c7e7-40235e316a3a  XXX.XXX.XXX.2:8300  follower  true   3
DA1       acb371f7-8d8d-a24c-654e-464cb73a7afc  XXX.XXX.XXX.1:8300  follower  true   3
Then activate DA1 and monitor consul-ext.log while it starts.
Let DA1 start fully. It should now be up and running, with the secondary DA shown as green but inactive.
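The membership check in the steps above can be automated by parsing the output of `./consul members`. The sketch below is an illustration only: the function name all_members_alive is hypothetical, and it assumes the column layout shown in the sample output (node name in the first column, status in the third).

```shell
#!/bin/sh
# Hypothetical helper: read `consul members` output on stdin and succeed
# only if every node name given as an argument is listed with status "alive".
all_members_alive() {
    members="$(cat)"
    for node in "$@"; do
        if ! printf '%s\n' "$members" |
            awk -v n="$node" '$1 == n && $3 == "alive" { ok = 1 } END { exit ok ? 0 : 1 }'
        then
            echo "not alive or missing: $node"
            return 1
        fi
    done
    echo "all alive"
}
```

From /opt/CA/IMDataAggregator/consul/bin, this could be used as `./consul members | all_members_alive DA1 DA1-HA DA-PROXY`, where the node names are the ones from the example and should be replaced with the actual host names.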

With Fault Tolerant (FT) DAs, when everything is working, one DA runs fully, including the activemq and dadaemon services, while the other DA shows as inactive (green) and runs no services other than consul. FT does not mean two DAs running in tandem: one DA runs fully, and the other waits to start its services if a failover is detected.
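The expected state described above can be expressed as a small classification of service states on one host. This is a sketch of the article's description, not a supported health check: the function name classify_da is hypothetical, and the systemd unit names activemq and consul are assumptions (only dadaemon.service appears in the output quoted earlier).

```shell
#!/bin/sh
# Hypothetical helper: classify a DA host from three service states, as
# printed by `systemctl is-active` (active, inactive, failed, ...).
# $1: dadaemon state, $2: activemq state, $3: consul state
classify_da() {
    case "$1/$2/$3" in
        active/active/active)
            echo "active DA" ;;
        */*/active)
            if [ "$1" != active ] && [ "$2" != active ]; then
                echo "standby DA"
            else
                echo "unexpected: only one of dadaemon/activemq is running"
            fi ;;
        *)
            echo "unexpected: consul is not running" ;;
    esac
}
```

On a DA host it could be called as `classify_da "$(systemctl is-active dadaemon)" "$(systemctl is-active activemq)" "$(systemctl is-active consul)"`. In a healthy FT pair, one host should report "active DA" and the other "standby DA".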
