Fault Tolerant Data Aggregator proxy services fail


Article ID: 215566


Products

CA Performance Management - Usage and Administration
DX NetOps

Issue/Introduction

After upgrading, only one Data Aggregator (DA) appears under System Status in the NetOps Portal, or neither member of the Fault-Tolerant pair does.

After an upgrade of DX NetOps Performance Management, the Data Aggregator System Status shows Failed for the Fault-Tolerant (FT) Data Aggregator.

After an outage caused by vMotion activity, the rebuilt snapshots of the Data Aggregator and proxy hosts fail to connect. The consul services fail to stay running.

The Consul logs report that the cluster needs to be re-bootstrapped. How is that done?

Environment

All supported DX NetOps Performance Management releases

Resolution

  1. Stop both DAs and their consul services. Try the maintenance command first:


    /opt/IMDataAggregator/scripts/dadaemon maintenance;

    If this doesn’t stop the DA within 5 minutes, use systemctl:

    systemctl stop dadaemon;
    systemctl stop activemq;


    Then stop the consul services:

    systemctl stop consul;
    systemctl stop consul-ext;
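The 5-minute wait in step 1 can be scripted. The following is a minimal sketch, not a supported procedure. `wait_for_stop` is a hypothetical helper that polls a check command until it fails or a timeout expires. Which check reliably reports that the DA has stopped is an assumption; the usage comment assumes the `dadaemon` unit becomes inactive.

```shell
# Hypothetical helper: poll a check command until it reports "stopped".
# Usage: wait_for_stop TIMEOUT_SECONDS INTERVAL_SECONDS CHECK_COMMAND...
# CHECK_COMMAND must exit 0 while the service is still running.
wait_for_stop() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # still running after the timeout
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0            # the check failed, so the service has stopped
}

# Example for step 1 (assumption: the dadaemon unit goes inactive once the DA stops):
#   /opt/IMDataAggregator/scripts/dadaemon maintenance
#   wait_for_stop 300 10 systemctl is-active --quiet dadaemon \
#       || { systemctl stop dadaemon; systemctl stop activemq; }
```

Passing the check as a parameter keeps the timeout logic separate from the site-specific question of how to detect a stopped DA.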

  2. On the proxy host, stop the daproxy and consul services:

    systemctl stop consul;

    systemctl stop daproxy; 

    Then start the daproxy service:

    systemctl start daproxy;

  3. Rename the consul data directory on all three hosts (both DAs and the proxy). The default location on a DA is:

    /opt/IMDataAggregator/consul/data


    So run:

    mv /opt/IMDataAggregator/consul/data /opt/IMDataAggregator/consul/data.old


    The default location on the proxy is:

    /opt/CA/daproxy/data

    So run:

    mv /opt/CA/daproxy/data /opt/CA/daproxy/data.old
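One caveat with a plain `mv`: if a `data.old` directory is already present from an earlier attempt, `mv` moves `data` inside it instead of renaming it. The sketch below works around that under stated assumptions. `rename_data_dir` is a hypothetical helper, and the timestamp suffix is an illustrative naming choice rather than anything the product requires.

```shell
# Hypothetical helper: rename a directory to <dir>.old, or to a timestamped
# name if <dir>.old already exists, so an earlier backup is never touched.
# Prints the name the directory was renamed to.
rename_data_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    target="$dir.old"
    if [ -e "$target" ]; then
        target="$dir.old.$(date +%Y%m%d%H%M%S)"
    fi
    mv "$dir" "$target" && echo "$target"
}

# Example for step 3:
#   rename_data_dir /opt/IMDataAggregator/consul/data   # on each DA
#   rename_data_dir /opt/CA/daproxy/data                # on the proxy
```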

     

  4. Start the consul service on all three hosts:

    systemctl start consul   

  5. Confirm that consul has elected a leader by running the following on both DAs and the proxy:

    curl http://127.0.0.1:8500/v1/status/leader


    It should return the proxy host. If it returns "", there is no leader. In that case, run "systemctl status consul" on the proxy and on both DAs to see whether any consul instance is having issues.

    Resolve any error, restart consul, and check the leader URL again.
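Leader election can take a few seconds after the services start, so a single `curl` may return "" prematurely. The following is a minimal polling sketch. `has_leader` and `wait_for_leader` are hypothetical helper names, and the attempt count and 2-second pause are arbitrary assumptions.

```shell
# Hypothetical helper: succeed only if a /v1/status/leader response names a leader.
# The endpoint returns a quoted address, or "" when there is no leader.
has_leader() {
    leader=$(printf '%s' "$1" | tr -d '"[:space:]')
    [ -n "$leader" ]
}

# Poll the local agent until a leader is reported or the attempts run out.
# Usage: wait_for_leader [ATTEMPTS]   (default 30, two seconds apart)
wait_for_leader() {
    attempts=${1:-30}
    while [ "$attempts" -gt 0 ]; do
        response=$(curl -s http://127.0.0.1:8500/v1/status/leader)
        if has_leader "$response"; then
            printf '%s\n' "$response"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 2
    done
    return 1
}

# Example for step 5:
#   wait_for_leader || systemctl status consul
```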

  6. If ACL was enabled and bootstrapped, the old ACL master token becomes invalid once the data directories on all hosts have been renamed, so ACL must be bootstrapped again. To do this, make an HTTP PUT request to one of the DA consul servers:

    curl -v -X PUT -H 'Content-Type: application/xml' http://DA-HOST:8500/v1/acl/bootstrap


    This request returns a SecretID, which is the new consul ACL master token.
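The bootstrap response is a JSON object, and the token needed for the next step is the value of its `SecretID` field. The sketch below pulls that value out with `sed` so that no extra tools are required; if `jq` is installed, `jq -r .SecretID` is a more robust alternative. `extract_secret_id` is a hypothetical helper name.

```shell
# Hypothetical helper: read a bootstrap response on stdin and print the
# value of its "SecretID" field (first match only).
extract_secret_id() {
    sed -n 's/.*"SecretID"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example for step 6 (DA-HOST is the placeholder used in the step above):
#   new_token=$(curl -s -X PUT http://DA-HOST:8500/v1/acl/bootstrap | extract_secret_id)
#   [ -n "$new_token" ] || echo "no SecretID in the response" >&2
```

An empty result usually means the request failed, so check the raw response before going further.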

  7. Update the acl-token.properties file in the DA shared repo with the new token. Change to your shared repo directory, whose name was chosen during installation, and run:

    cp acl-token.properties acl-token.properties.original

    vi acl-token.properties

    Replace the old token with the SecretID value returned by the command in step 6.
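The backup and edit can also be done non-interactively. This is a sketch only. `replace_token` is a hypothetical helper that swaps one token string for another wherever it appears in the file, so it makes no assumption about the property name. It does assume both tokens contain only hexadecimal digits and hyphens, as in the example token in step 8.

```shell
# Hypothetical helper: back up FILE to FILE.original, then replace every
# occurrence of OLD_TOKEN with NEW_TOKEN.
# Usage: replace_token FILE OLD_TOKEN NEW_TOKEN
replace_token() {
    file=$1
    old=$2
    new=$3
    [ -n "$old" ] && [ -n "$new" ] || { echo "both tokens are required" >&2; return 1; }
    # Refuse anything that is not hex digits and hyphens, so the tokens
    # are safe to place inside a sed expression.
    case "$old$new" in
        *[!0-9a-fA-F-]*) echo "unexpected characters in token" >&2; return 1 ;;
    esac
    grep -qF -- "$old" "$file" || { echo "old token not found in $file" >&2; return 1; }
    cp -p "$file" "$file.original" || return 1
    sed "s/$old/$new/g" "$file.original" > "$file"
}

# Example for step 7, run from the shared repo directory:
#   replace_token acl-token.properties "$old_token" "$new_token"
```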

     
  8. Verify that you can now see all members on all nodes.

    /opt/IMDataAggregator/consul operator raft list-peers -token=cb37db60-0088-70bd-092c-36f2e507c406

    Replace the token value shown here with your own token.
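To check the result without reading the table by eye, the peer rows can be counted. This is a sketch under assumptions: `count_peers` is a hypothetical helper, it assumes the first line of the `list-peers` output is a column header, and the expected count of three (two DAs plus the proxy) is inferred from the three hosts in this procedure. The Consul CLI also reads the token from the `CONSUL_HTTP_TOKEN` environment variable, which keeps it off the command line.

```shell
# Hypothetical helper: count the peers in `consul operator raft list-peers`
# output read from stdin. Assumes the first line is the column header.
count_peers() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Example for step 8 (assumption: three peers are expected):
#   export CONSUL_HTTP_TOKEN=<your token>
#   peers=$(/opt/IMDataAggregator/consul operator raft list-peers | count_peers)
#   [ "$peers" -eq 3 ] || echo "expected 3 raft peers, found $peers" >&2
```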