Unable to start the Fault Tolerant (FT) Data Aggregator (DA) servers. Both the dadaemon and activemq services fail with the following error:
$ sudo service dadaemon start
Redirecting to /bin/systemctl start dadaemon.service
Job for dadaemon.service failed because the control process exited with error code. See "systemctl status dadaemon.service" and "journalctl -xe" for details.
The detailed error is:
Nov 16 10:08:19 DA1 dadaemon[30157]: WARNING: this is a Fault Tolerant setup and currently other DA has the execution privilege, the startup operation will be aborted
Nov 16 10:08:19 DA1 systemd[1]: dadaemon.service: control process exited, code=exited status=1
Nov 16 10:08:19 DA1 systemd[1]: Failed to start Data Aggregator.
Only the consul and consul-ext services run, and they log errors. On the primary DA, the following consul messages repeat in /var/log/messages:
Nov 16 10:18:40 DA1 consul: 2021-11-16T10:18:40.054+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: failed inserting node: Error while renaming Node ID: "acb371f7-8d8d-a24c-654e-464cb73a7afc": Node name DA1 is reserved by node 72ef1eb5-91ca-0457-c54f-bba63d0ba78b with name DA1 (172.XXX.XXX.XXX)"
Nov 16 10:18:40 DA1 consul: 2021-11-16T10:18:40.143+0800 [WARN] agent.server.fsm: EnsureRegistration failed: error="failed inserting node: Error while renaming Node ID: "acb371f7-8d8d-a24c-654e-464cb73a7afc": Node name DA1 is reserved by node 72ef1eb5-91ca-0457-c54f-bba63d0ba78b with name DA1 (172.XXX.XXX.XXX)"
Nov 16 10:18:45 DA1 consul: 2021-11-16T10:18:45.882+0800 [WARN] agent: Check is now critical: check=service:daservice
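The consul error names two node IDs: the ID the local agent is trying to register, and the older ID that still holds the name DA1. To pull both out of a log line for comparison with `consul operator raft list-peers`, a minimal bash sketch follows. It assumes the message format shown above, and `extract_node_ids` is a hypothetical helper, not part of the DA or consul tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the DA or consul tooling).
# Given one consul "Node name ... is reserved by node ..." log line, print
# the node ID being renamed and the node ID that currently reserves the
# name, separated by a space. Returns 1 if either ID is missing.
extract_node_ids() {
  local line="$1"
  local uuid='[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
  local local_id reserving_id
  # ID the local agent is trying to register: Node ID: "<uuid>"
  local_id=$(grep -oE "Node ID: \"$uuid\"" <<<"$line" | grep -oE "$uuid")
  # ID that already owns the node name: reserved by node <uuid>
  reserving_id=$(grep -oE "reserved by node $uuid" <<<"$line" | grep -oE "$uuid")
  [[ -n "$local_id" && -n "$reserving_id" ]] || return 1
  printf '%s %s\n' "$local_id" "$reserving_id"
}
```

For example, `extract_node_ids "$(grep 'is reserved by node' /var/log/messages | tail -n 1)"` applied to the messages above would print the acb371f7-... ID followed by the 72ef1eb5-... ID.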
DX NetOps CAPM Release: 20.2 or later
Component: IM Data Aggregator
Ensure SELinux is disabled.
Check /etc/hosts and ensure the DA's IP address and hostname are set correctly.
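Both checks can be run from a shell. `getenforce` is the standard SELinux utility and should print `Disabled`. For the hosts file, one possible bash sketch is shown below. It assumes the usual /etc/hosts layout (IP address first, then one or more names), and `hosts_ip_for` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the IP address(es) mapped to a hostname in an
# /etc/hosts-format file. Comments and blank lines are ignored, names are
# compared case-insensitively, and a name must match a whole field, so
# "DA1" does not match "DA1-HA".
hosts_ip_for() {
  local hosts_file="$1" name="$2"
  awk -v n="$name" '
    { sub(/#.*/, "") }                 # strip comments
    NF >= 2 {
      for (i = 2; i <= NF; i++)
        if (tolower($i) == tolower(n)) { print $1; next }
    }
  ' "$hosts_file"
}
```

Running `hosts_ip_for /etc/hosts "$(hostname)"` on each DA should print exactly that DA's IP address. Empty output, or more than one address, points to the hosts entry that needs fixing.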
Then restore the primary DA to operational status as follows:
$ cd /opt/CA/IMDataAggregator/consul/bin
$ ./consul members
Node Address Status Type Build Protocol DC Segment
DA1 XXX.XXX.XXX.1:8301 alive server 1.7.2 2 capm <all>
DA1-HA. XXX.XXX.XXX.2:8301 alive server 1.7.2 2 capm <all>
DA-PROXY XXX.XXX.XXX.3:8301 alive server 1.7.2 2 capm <all>
$ ./consul operator raft list-peers
Node ID Address State Voter RaftProtocol
DA-PROXY 7dc914d8-2959-9659-1da4-5d83fb074a1c XXX.XXX.XXX.3:8300 leader true 3
DA1-HA d0cad9d2-701a-e6b7-c7e7-40235e316a3a XXX.XXX.XXX.2:8300 follower true 3
DA1 acb371f7-8d8d-a24c-654e-464cb73a7afc XXX.XXX.XXX.1:8300 follower true 3
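In a healthy cluster, the `list-peers` output shows exactly one leader, with every node listed as a voter. A minimal bash sketch of that check follows. It assumes the column layout shown above (Node, ID, Address, State, Voter, RaftProtocol), and `check_raft_peers` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `consul operator raft list-peers` output on
# stdin and verify there is exactly one leader and every peer is a voter.
# Assumed columns: Node ID Address State Voter RaftProtocol.
# Prints the leader's node name on success; returns 1 otherwise.
check_raft_peers() {
  awk '
    NR == 1 { next }                   # skip the header row
    NF >= 5 {
      if ($4 == "leader") { leaders++; leader = $1 }
      if ($5 != "true") nonvoters++
    }
    END {
      if (leaders == 1 && nonvoters == 0) { print leader; exit 0 }
      exit 1
    }
  '
}
```

Usage: `./consul operator raft list-peers | check_raft_peers`. With the output above it would print `DA-PROXY`.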
With Fault Tolerant (FT) DAs, when everything is working, one DA runs fully, including the ActiveMQ and dadaemon services. The other DA is inactive: it still shows as alive in the consul output above, but no services other than consul run on it. FT is not two DAs running in tandem. It is one DA running fully and the other DA waiting to start its services if a failover is detected.
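To confirm that each DA is in one of the two expected roles, compare the service states on both servers. A minimal bash sketch follows. It assumes the systemd unit names are `dadaemon` and `activemq`, matching the service names used in this article, and `da_ft_role` is a hypothetical helper. A unit in the `failed` state, as dadaemon was in the error above, is reported as `inconsistent` rather than `standby`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify this DA's FT role from the states reported
# by `systemctl is-active` for dadaemon and activemq (unit names assumed).
#   active       -> both services running (this DA holds the execution privilege)
#   standby      -> both services inactive (only consul is expected here)
#   inconsistent -> any other combination, e.g. "failed"; investigate
da_ft_role() {
  local dadaemon_state="$1" activemq_state="$2"
  if [[ "$dadaemon_state" == active && "$activemq_state" == active ]]; then
    echo active
  elif [[ "$dadaemon_state" == inactive && "$activemq_state" == inactive ]]; then
    echo standby
  else
    echo inconsistent
  fi
}
```

Usage on each DA: `da_ft_role "$(systemctl is-active dadaemon)" "$(systemctl is-active activemq)"`. A healthy FT pair reports `active` on one DA and `standby` on the other.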