Rebuild a Fault Tolerant Data Aggregator pair when one server has failed
Article ID: 410281
Products
Network Observability, CA Performance Management
Issue/Introduction
One of the DAs in our FT cluster crashed and needs to be rebuilt on a new host. The immediate goal is to decommission the failed DA server and reinstall it on a new host with the same PM version, then reconnect it to the existing DA cluster. Given that I have backed up the DA configuration files and can restore all of them, will this work?
One of the Fault Tolerant (FT) Data Aggregator (DA) servers failed. We need to replace the failed FT DA host with a new host and rebuild the FT DA pair and its communications.
How do we replace a failed DA when it is one of an FT DA pair?
Environment
All supported Network Observability DX NetOps Performance Management Data Aggregator releases
Cause
The server failed in a state that requires replacement rather than a rebuild.
Resolution
1. Prevent the proxy from trying to restart the active DA, and from starting the new DA after it is installed:
   - Shut down the consul-ext and consul services on the working (active) DA.
   - Shut down the consul services on the proxy host.
   - Leave the daproxy service on the proxy host running. Do NOT shut it down.
2. Rebuild the failed DA by installing a new DA on the new host.
   - Ensure the new DA meets all prerequisites, including the required port access and access to the shared data directory.
   - When installing the new DA, ensure the correct answers are provided to the installer:
     - Answer Yes to "Would you like to configure Data Aggregator with fault tolerance?"
     - Provide the proxy host name for "Data Aggregator proxy host :".
     - When asked for the shared data directory path, specify the same directory the working DA uses. Both FT DAs must use the same shared data directory.
     - Provide the correct Data Repository (DR) database host name(s) for "Data repository server hostname/IP :".
3. Shut down only the consul and consul-ext services on the new DA.
4. Regenerate the ACL token through the bootstrap process, using the steps from the following article. It should be possible to continue from step 3 in that article's Solution section.
5. Once this is complete, use consul commands to confirm that the correct DAs and the proxy are listed.
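The Consul service changes in this procedure can be written down as a per-host checklist before touching any server. The sketch below is a minimal, hypothetical Python illustration: the host labels (`da-active`, `da-proxy`, `da-new`) and the helper names are invented, the service names are the ones this article uses, and it only prints a checklist rather than stopping anything, since how the services are controlled on your hosts is not covered here. The article does not state a required stop order between consul and consul-ext, so the lists simply follow its wording. The one rule the sketch enforces is the article's warning that daproxy on the proxy host must stay running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAction:
    host: str     # hypothetical host label, e.g. "da-active"
    service: str  # service name as written in the article
    action: str   # "stop" or "keep running"


def pre_install_actions(active_da: str, proxy: str) -> list[ServiceAction]:
    """Before the install: keep the proxy from restarting the active DA or starting the new one."""
    return [
        ServiceAction(active_da, "consul-ext", "stop"),
        ServiceAction(active_da, "consul", "stop"),
        ServiceAction(proxy, "consul", "stop"),
        ServiceAction(proxy, "daproxy", "keep running"),  # never stop daproxy
    ]


def post_install_actions(new_da: str) -> list[ServiceAction]:
    """After the install: stop only the Consul services on the new DA."""
    return [
        ServiceAction(new_da, "consul", "stop"),
        ServiceAction(new_da, "consul-ext", "stop"),
    ]


def checklist(actions: list[ServiceAction]) -> list[str]:
    """Render actions as checklist lines, refusing any plan that stops daproxy."""
    lines = []
    for a in actions:
        if a.service == "daproxy" and a.action != "keep running":
            raise ValueError("daproxy on the proxy host must stay running")
        lines.append(f"[{a.host}] {a.action}: {a.service}")
    return lines


if __name__ == "__main__":
    plan = pre_install_actions("da-active", "da-proxy") + post_install_actions("da-new")
    for line in checklist(plan):
        print(line)
```

Keeping the plan as data makes it easy to review with a colleague before the maintenance window, and the guard in `checklist` catches the most damaging mistake the article warns about.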
Note that after this process completes, it can take some time before the Data Aggregator table on the Portal System Status page shows the correct FT DA pair.
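For the final consul check, one option is to capture the text table printed by `consul members` on one of the Consul hosts and compare it against the hosts you expect. This is a sketch under assumptions: it relies only on the first three columns of that table (Node, Address, Status), the node names are hypothetical, and it does not cover how the command is run on a DA (binary location, or supplying an ACL token, for example through Consul's `CONSUL_HTTP_TOKEN` environment variable).

```python
def parse_consul_members(output: str) -> dict[str, str]:
    """Map node name -> status from the table printed by `consul members`."""
    rows = [ln.split() for ln in output.strip().splitlines() if ln.strip()]
    members = {}
    for cols in rows[1:]:  # rows[0] is the header: Node  Address  Status ...
        if len(cols) >= 3:
            members[cols[0]] = cols[2]
    return members


def check_members(output: str, expected: list[str]) -> list[str]:
    """Report expected nodes that are missing or not alive, plus any unexpected nodes."""
    members = parse_consul_members(output)
    problems = []
    for node in expected:
        status = members.get(node)
        if status is None:
            problems.append(f"{node}: not listed")
        elif status != "alive":
            problems.append(f"{node}: {status}")
    for node, status in members.items():
        if node not in expected:
            problems.append(f"{node}: unexpected member ({status})")
    return problems


if __name__ == "__main__":
    # Hypothetical output captured from `consul members`.
    sample = """Node       Address          Status  Type    Build   Protocol  DC   Partition  Segment
da-active  10.0.0.11:8301   alive   server  1.15.2  2         dc1  default    <all>
da-new     10.0.0.12:8301   alive   server  1.15.2  2         dc1  default    <all>
da-proxy   10.0.0.10:8301   alive   client  1.15.2  2         dc1  default    <default>
"""
    for problem in check_members(sample, ["da-active", "da-new", "da-proxy"]):
        print(problem)
```

An extra node, such as an entry left behind by the failed host, is flagged for review rather than silently ignored.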
Additional Information
When installing the new second FT DA to replace the failed one, no special steps are needed for its use of the shared data directory.
The installer does not overwrite the contents of the shared data directory during the second DA's install or upgrade. It should detect that it is the second DA, not the first.
The key requirement is that the new DA is installed with the same release as the remaining working DA.
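Because the replacement depends on the new DA matching the surviving one, it can help to record the working DA's release and the installer answers before running the install, then compare them. The `DaInstall` record and `compare_installs` helper below are hypothetical, not part of the product, and the values would have to be gathered by hand. Directory paths are normalized so a trailing slash does not count as a difference, and host names are compared case-insensitively.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class DaInstall:
    """Hypothetical record of one DA's release and installer answers."""
    release: str
    fault_tolerant: bool
    proxy_host: str
    shared_data_dir: str
    dr_hosts: tuple[str, ...]


def compare_installs(working: DaInstall, new: DaInstall) -> list[str]:
    """Return differences between the replacement DA and the working DA."""
    issues = []
    if new.release != working.release:
        issues.append(f"release {new.release} != working DA release {working.release}")
    if not new.fault_tolerant:
        issues.append("fault tolerance not enabled in the installer answers")
    if new.proxy_host.lower() != working.proxy_host.lower():
        issues.append(f"proxy host {new.proxy_host} != {working.proxy_host}")
    # Both FT DAs must point at the same shared data directory.
    if posixpath.normpath(new.shared_data_dir) != posixpath.normpath(working.shared_data_dir):
        issues.append(f"shared data dir {new.shared_data_dir} != {working.shared_data_dir}")
    if {h.lower() for h in new.dr_hosts} != {h.lower() for h in working.dr_hosts}:
        issues.append("Data Repository host list differs")
    return issues


if __name__ == "__main__":
    # All values below are illustrative placeholders.
    working = DaInstall("24.3.0", True, "daproxy01", "/opt/da-shared", ("dr1", "dr2"))
    new = DaInstall("24.2.0", True, "daproxy01", "/opt/da-shared/", ("dr1", "dr2"))
    print(compare_installs(working, new) or "no differences found")
```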
After installation, the new DA will likely report errors related to ACL token validity.
Ignore these until a new token has been generated as part of the failed FT DA replacement process.