Fault Tolerant Data Aggregators fail to start post DX NetOps Performance Management upgrade

Article ID: 210943

Products

CA Performance Management - Usage and Administration DX NetOps

Issue/Introduction

Both Data Aggregators (DAs) appear to believe that the other DA is in control, so neither one starts.

On the proxy host, /var/log/messages shows that consul is unable to find or elect a cluster leader:

Mar 17 16:08:23 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:08:23.904Z [INFO]  agent.server: Adding LAN server: server="DA1_HOST_NAME (Addr: tcp/DA1_IP-Address:8300) (DC: capm)"
Mar 17 16:08:23 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:08:23.905Z [INFO]  agent.server: Existing Raft peers reported by server, disabling bootstrap mode: server=DA2_HOST_NAME
Mar 17 16:08:23 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:08:23.906Z [INFO]  agent.server: Adding LAN server: server="DA2_HOST_NAME (Addr: tcp/DA2_IP-Address:8300) (DC: capm)"
Mar 17 16:08:23 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:08:23.907Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: DA2_HOST_NAME.capm DA2_IP-Address
Mar 17 16:08:23 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:08:23.907Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: DA1_HOST_NAME.capm DA1_IP-Address
Mar 17 16:08:23 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:08:23.908Z [INFO]  agent.server: Handled event for server in area: event=member-join server=DA2_HOST_NAME.capm area=wan
Mar 17 16:08:23 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:08:23.908Z [INFO]  agent.server: Handled event for server in area: event=member-join server=DA1_HOST_NAME.capm area=wan
Mar 17 16:08:39 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:08:39.406Z [ERROR] agent: Coordinate update error: error="No cluster leader"
Mar 17 16:08:58 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:08:58.669Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
Mar 17 16:09:05 Proxy_HOST_NAME consul[16301]: 2021-03-17T16:09:05.000Z [ERROR] agent: Coordinate update error: error="No cluster leader"
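
To check whether a host shows the same symptom, the consul "No cluster leader" errors can be counted in the system log. This is a minimal sketch: the function name count_no_leader is hypothetical, and the default path is the /var/log/messages file mentioned above, which may differ on your system.

```shell
# Count consul "No cluster leader" errors in a syslog file.
# count_no_leader is a hypothetical helper name, not part of the product.
# The default path is the one named in this article; pass another path to override.
count_no_leader() {
  log_file="${1:-/var/log/messages}"
  # grep -c prints the number of matching lines (0 when there are none)
  grep -c 'consul.*No cluster leader' "$log_file"
}
```

Run count_no_leader with no arguments on the proxy host, as a user that can read the log. A count that keeps growing after a restart suggests that the cluster still has no leader.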

Cause

The root cause was not determined. The systems connected after several restart attempts, once the services were restarted in the correct order and the appropriate data directories were removed so that they would be rebuilt.

Environment

All supported DX NetOps Performance Management releases

Resolution

To resolve the issue, the following steps were taken. The procedure was run four times, and the systems started properly on the fourth attempt.

  1. Stop any running dadaemon or activemq services on DA 1.
  2. Stop the consul and consul-ext services on DA 1.
  3. Delete or rename the existing data directories in the following locations on DA 1. Default paths shown.
    1. Renaming is recommended so that the logs are retained if they are needed. The renamed directories can be deleted at a later date. To rename, run the following command from the parent directory:
      • mv data data.old
    2. The directories are:
      • /opt/IMDataAggregator/apache-karaf-<version>/data
      • /opt/IMDataAggregator/consul/data
  4. Stop any running dadaemon or activemq services on DA 2.
  5. Stop the consul and consul-ext services on DA 2.
  6. Delete or rename the existing data directories in the following locations on DA 2. Default paths shown.
    1. Renaming is recommended so that the logs are retained if they are needed. The renamed directories can be deleted at a later date. To rename, run the following command from the parent directory:
      • mv data data.old
    2. The directories are:
      • /opt/IMDataAggregator/apache-karaf-<version>/data
      • /opt/IMDataAggregator/consul/data
  7. Stop the running consul and daproxy services on the proxy host.
  8. Delete or rename the existing data directory in the following location on the proxy host. Default path shown.
    1. Renaming is recommended so that the logs are retained if they are needed. The renamed directory can be deleted at a later date. To rename, run the following command from the parent directory:
      • mv data data.old
    2. The directory is:
      • /opt/CA/daproxy/data
  9. Start the daproxy service on the proxy host.
  10. Start the consul service on the proxy host.
  11. Start the consul service on DA 1.
  12. Start the consul-ext service on DA 1.
  13. Start the consul service on DA 2.
  14. Start the consul-ext service on DA 2.
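
The rename steps above can be sketched as a small shell helper. This is an illustration only, not a supported script. The function name rename_data_dir is hypothetical, and the systemctl unit names in the comments are assumed to match the service names used in the steps. Because the procedure may need to be repeated, the helper refuses to overwrite a data.old directory left by an earlier attempt.

```shell
# Rename <parent>/data to <parent>/data.old so that old logs are kept.
# rename_data_dir is a hypothetical helper name, not part of the product.
rename_data_dir() {
  parent="$1"
  if [ ! -d "$parent/data" ]; then
    echo "skip: $parent/data not found"
    return 0
  fi
  if [ -e "$parent/data.old" ]; then
    # Left over from an earlier attempt; move or delete it first.
    echo "error: $parent/data.old already exists" >&2
    return 1
  fi
  mv "$parent/data" "$parent/data.old"
  echo "renamed: $parent/data -> $parent/data.old"
}

# Example for one Data Aggregator, run as root (default paths shown).
# The unit names below are assumptions based on the service names in the steps.
#   systemctl stop dadaemon activemq
#   systemctl stop consul consul-ext
#   rename_data_dir /opt/IMDataAggregator/apache-karaf-<version>
#   rename_data_dir /opt/IMDataAggregator/consul
```

On the proxy host, the equivalent call would be rename_data_dir /opt/CA/daproxy after the consul and daproxy services are stopped.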