While VCO DR replication is in progress, the Active and Standby VCOs report a "Failure Background Import" error.
SD-WAN Orchestrator running a version earlier than 5.2.x.x
During VCO DR replication, the database is synchronized between the Active and Standby VCOs. The Standby makes an API call to the Active to start a database backup, so that the database can be synced to the Standby.
Replication fails because this request (the API call from the Standby to the Active) to create a ClickHouse database backup takes longer than the default timeout of 120 seconds (2 minutes) to complete, so the API call times out.
Example: Note the following logs on the Active VCO:
2022-03-31T19:04:24.602Z - info: [replication@velo~.net.164870930.82735] [18923] received request to create Clickhouse backup
2022-03-31T19:06:37.197Z - info: [replication@velo~.net.164870930.82735] [18923] Successfully created backup : /store3/clickhouse/shadow/1648753464215_backup/data/velocloud_stats
During the second attempt, it took 133 seconds to generate the backup (not an hour). The error on the Standby appears 120 seconds after the Active received the request, as seen below:
2022-03-31T19:06:24.631Z - error: [process.handle-s~nfig.164853843.1652] [22853] Error calling /disasterRecovery/createClickhouseBackup on remote vco Error: socket hang up
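The timing can be checked directly from the three log timestamps above. The following is a minimal Python sketch, not part of the product tooling; the helper names `parse_ts` and `elapsed_seconds` are illustrative, and the timestamps are copied from the log lines in this article.

```python
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    # The log timestamps are ISO 8601 with millisecond precision and a trailing "Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

def elapsed_seconds(start: str, end: str) -> float:
    # Difference between two log timestamps, in seconds.
    return (parse_ts(end) - parse_ts(start)).total_seconds()

# Timestamps taken from the log excerpts above.
request_received = "2022-03-31T19:04:24.602Z"  # Active: request to create backup
backup_created = "2022-03-31T19:06:37.197Z"    # Active: backup successfully created
standby_error = "2022-03-31T19:06:24.631Z"     # Standby: socket hang up

backup_duration = elapsed_seconds(request_received, backup_created)
time_to_error = elapsed_seconds(request_received, standby_error)

print(f"Backup duration: {backup_duration:.3f} s")
print(f"Standby error after: {time_to_error:.3f} s")
```

The backup took about 132.6 seconds, while the Standby gave up about 120 seconds after the request, which matches the 120-second default timeout.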
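Add --http-server-default-timeout=600000 to the ExecStart line in the /lib/systemd/system/replication.service file on both the Active and DR VCOs, as shown below. The value is in milliseconds, so 600000 raises the timeout from 2 minutes to 10 minutes.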
ExecStart=/usr/bin/node --http-server-default-timeout=600000 replication.js
Once done, restart the replication process:
service replication restart
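If the unit file change has to be applied or verified on several VCOs, it can be sketched in Python as below. This is an illustration only: the function `add_timeout_flag` is hypothetical, it works on the unit file contents as a string rather than writing to disk, and the sample unit text (an ExecStart line without the flag) is an assumption about what the unmodified file looks like.

```python
FLAG = "--http-server-default-timeout=600000"

def add_timeout_flag(unit_text: str, flag: str = FLAG) -> str:
    """Return unit_text with the timeout flag inserted after the node binary
    on any ExecStart line that does not already set it."""
    out = []
    for line in unit_text.splitlines():
        stripped = line.strip()
        if (stripped.startswith("ExecStart=")
                and "--http-server-default-timeout" not in stripped):
            command = stripped[len("ExecStart="):].split()
            if command:
                # The first token is the node binary; the flag goes right after it.
                command.insert(1, flag)
            line = "ExecStart=" + " ".join(command)
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if unit_text.endswith("\n") else "")

# Assumed contents of the unmodified unit file (hypothetical sample).
sample_unit = "[Service]\nExecStart=/usr/bin/node replication.js\n"

patched_unit = add_timeout_flag(sample_unit)
print(patched_unit)
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to use as a check on a file that may already carry the flag.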