Unable to start Data Aggregator (DA) servers with "WARNING: ... Fault Tolerant setup and currently other DA has the execution privilege"

Article ID: 228403

Products

CA Performance Management - Usage and Administration
DX NetOps

Issue/Introduction

The Fault Tolerant (FT) Data Aggregator (DA) servers cannot be started. Both the dadaemon and activemq services fail to start. For example:

$ sudo service dadaemon start

Redirecting to /bin/systemctl start  dadaemon.service

Job for dadaemon.service failed because the control process exited with error code. See "systemctl status dadaemon.service" and "journalctl -xe" for details.

The detailed error is:

Nov 16 10:08:19 DA1 dadaemon[30157]: WARNING: this is a Fault Tolerant setup and currently other DA has the execution privilege, the startup operation will be aborted

Nov 16 10:08:19 DA1 systemd[1]: dadaemon.service: control process exited, code=exited status=1

Nov 16 10:08:19 DA1 systemd[1]: Failed to start Data Aggregator.
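To confirm that a failed start is caused by the FT privilege check rather than by some other fault, the warning text can be searched for in the service log. The following is a minimal sketch; the function name ft_start_blocked is hypothetical, and the matched phrase is taken from the log excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the text on stdin contains the
# FT "execution privilege" warning shown in the log excerpt.
ft_start_blocked() {
    grep -q 'other DA has the execution privilege'
}

# Example usage on a DA host (assumes the unit is named dadaemon, as in
# the "systemctl status dadaemon.service" hint above):
#   journalctl -u dadaemon --no-pager -n 50 | ft_start_blocked \
#       && echo "dadaemon start was aborted by the FT privilege check"
```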

 

Only the consul and consul-ext services are running, and they log errors. On the primary DA, /var/log/messages shows the following consul messages repeating:

Nov 16 10:18:40 DA1 consul: 2021-11-16T10:18:40.054+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: failed inserting node: Error while renaming Node ID: "acb371f7-8d8d-a24c-654e-464cb73a7afc": Node name DA1 is reserved by node 72ef1eb5-91ca-0457-c54f-bba63d0ba78b with name DA1 (172.XXX.XXX.XXX)"

Nov 16 10:18:40 DA1 consul: 2021-11-16T10:18:40.143+0800 [WARN]  agent.server.fsm: EnsureRegistration failed: error="failed inserting node: Error while renaming Node ID: "acb371f7-8d8d-a24c-654e-464cb73a7afc": Node name DA1 is reserved by node 72ef1eb5-91ca-0457-c54f-bba63d0ba78b with name DA1 (172.XXX.XXX.XXX)"

Nov 16 10:18:45 DA1 consul: 2021-11-16T10:18:45.882+0800 [WARN]  agent: Check is now critical: check=service:daservice
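The consul errors above state that the node name DA1 is already reserved by a different node ID than the one the local agent presents. As a rough aid for spotting this in a busy log, the sketch below extracts the node name and both IDs from such lines. The function name consul_name_conflicts is hypothetical, and the pattern is based only on the message format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each "Error while renaming Node ID" line on stdin,
# print: <node name> <ID presented by the agent> <ID that holds the name>.
# Duplicate conflicts are collapsed with sort -u.
consul_name_conflicts() {
    sed -n 's/.*Error while renaming Node ID: "\([0-9a-f-]*\)": Node name \([^ ]*\) is reserved by node \([0-9a-f-]*\).*/\2 \1 \3/p' | sort -u
}

# Example usage on the primary DA:
#   grep 'consul' /var/log/messages | consul_name_conflicts
```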

Environment

DX NetOps CAPM Release : 20.2 or later
Component : IM Data Aggregator

Resolution

Ensure that SELinux is disabled.

Check /etc/hosts and ensure that the IP address and hostname entries are set.
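Both prerequisites can be checked from a shell. The sketch below is illustrative only: the function names selinux_mode and hosts_has_entry are hypothetical, and the hostnames in the usage comment are the example names used in this article.

```shell
#!/usr/bin/env bash
# Print the current SELinux mode, or "unknown" if getenforce is not installed.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "unknown"
    fi
}

# Succeed if hosts file $1 contains a non-comment entry for hostname $2.
hosts_has_entry() {
    awk -v h="$2" '
        { sub(/#.*/, "") }    # drop comments before looking at fields
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example usage:
#   selinux_mode        # "Disabled" is the expected result on a DA host
#   for h in DA1 DA1-HA DA-PROXY; do
#       hosts_has_entry /etc/hosts "$h" || echo "missing /etc/hosts entry: $h"
#   done
```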

Then restore the primary DA to operational status as follows:

  1. Ensure that the DA services (dadaemon and activemq) are down, then stop consul and consul-ext on DA1, DA1-HA, and the proxy.

  2. Start consul on the proxy and wait 60-90 seconds.

  3. Start consul and consul-ext on DA1 and DA1-HA, then check membership. The output should look similar to the following:

    $ cd /opt/CA/IMDataAggregator/consul/bin

    $ ./consul members

    Node              Address              Status  Type    Build  Protocol  DC    Segment

    DA1               XXX.XXX.XXX.1:8301   alive   server  1.7.2  2         capm  <all>

    DA1-HA            XXX.XXX.XXX.2:8301   alive   server  1.7.2  2         capm  <all>

    DA-PROXY          XXX.XXX.XXX.3:8301   alive   server  1.7.2  2         capm  <all>

    $ ./consul operator raft list-peers

    Node      ID                                    Address             State     Voter  RaftProtocol

    DA-PROXY  7dc914d8-2959-9659-1da4-5d83fb074a1c  XXX.XXX.XXX.3:8300  leader    true   3

    DA1-HA    d0cad9d2-701a-e6b7-c7e7-40235e316a3a  XXX.XXX.XXX.2:8300  follower  true   3

    DA1       acb371f7-8d8d-a24c-654e-464cb73a7afc  XXX.XXX.XXX.1:8300  follower  true   3



  4. Activate DA1 and monitor consul-ext.log for its behaviour.

  5. Let DA1 start fully. DA1 is then up and running, and the secondary DA shows green but inactive.

With Fault Tolerant (FT) DAs, when everything is working, one DA runs fully, including the activemq and dadaemon services. The other DA shows as inactive (green) and runs no services other than consul. FT does not mean two DAs running in tandem: one DA runs fully, and the other waits to start its services if a failover is detected.
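The membership check in step 3 can also be scripted instead of read by eye. The following is a minimal sketch that reads "consul members" output and reports any expected node that is not listed as alive. The function name members_all_alive is hypothetical, the node names are the example names from this article, and the column layout is assumed to match the sample output in step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `consul members` output on stdin and succeeds
# only if every node name passed as an argument has Status "alive".
members_all_alive() {
    awk -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        NR > 1 && $3 == "alive" { alive[$1] = 1 }    # skip the header row
        END {
            for (i = 1; i <= n; i++)
                if (!(names[i] in alive)) { print "not alive: " names[i]; bad = 1 }
            exit bad ? 1 : 0
        }
    '
}

# Example usage on DA1, after consul has been started on all three hosts:
#   cd /opt/CA/IMDataAggregator/consul/bin
#   ./consul members | members_all_alive DA1 DA1-HA DA-PROXY \
#       && echo "all consul members are alive"
```

A non-zero exit status together with a "not alive" line indicates which host still needs consul started or its registration repaired before DA1 is activated.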

 
