The Controller UI/API may become inaccessible and return the error "error": "Bad Service". Some of the cluster nodes may be stuck in the "Starting" state.
Run the command below in the CLI to check the node status.
// SSH to Controller Leader Node
# shell
> show cluster nodes
Also check cluster_manager.INFO on the Controller node for the logs below to confirm whether the issue is caused by a Postgres replication failure.
// Tail the cluster_manager.INFO log on the Leader Node
// SSH to Controller Leader Node
# sudo -i
# cd /var/lib/avi/log
# tail -f cluster_manager.INFO
INFO [cluster_node_manager._wait_for_leader_to_join:243] Waiting for leader to join...
INFO [cluster_node_manager._wait_for_leader_to_join:243] Waiting for leader to join...
INFO [cluster_node_manager._wait_for_leader_to_join:243] Waiting for leader to join...
INFO [cluster_node_manager._internal_join:273] Replication file was not written in the window REPLICATION_TIMESTAMP_TIMEOUT, so cannot set replication complete
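As an alternative to watching the log live, the two signatures above can be counted in one pass. The following is a minimal sketch, assuming a POSIX shell on the Controller. The function name check_replication_log and its output format are illustrative only and are not part of any Avi tooling. The default log path is the one used in the tail step above.

```shell
#!/bin/sh
# Hypothetical helper: count the log signatures quoted in this article.
# Usage: check_replication_log [path-to-cluster_manager.INFO]
check_replication_log() {
    log_file="${1:-/var/lib/avi/log/cluster_manager.INFO}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper safe under "set -e".
    waiting=$(grep -c 'Waiting for leader to join' "$log_file" || true)
    failed=$(grep -c 'Replication file was not written in the window' "$log_file" || true)
    echo "waiting_for_leader=$waiting replication_not_written=$failed"
    # Return success only when the replication signature is present.
    [ "$failed" -gt 0 ]
}
```

Run as root on the Leader Node, calling check_replication_log with no argument reads the default log path. A non-zero replication_not_written count suggests the log matches the symptom described here. A zero count does not rule the issue out, because the log may have rotated.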
This issue may occur on all cloud environments.
The Postgres replication may fail due to
Check whether the replication_not_complete file has been written.
// SSH to each Controller Node
# sudo -i
# find / -name '*replication_not*' -type f 2>&1 | grep -v find
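The same check can be wrapped so that it reports clearly when nothing is found. This is a minimal sketch, assuming a POSIX shell. The function name find_replication_marker is hypothetical. The search pattern is the one from the command above, quoted so that the shell passes it to find unexpanded.

```shell
#!/bin/sh
# Hypothetical helper: look for replication_not* marker files.
# Usage: find_replication_marker [search-root]   (defaults to /)
find_replication_marker() {
    search_root="${1:-/}"
    # Discard permission errors; "|| true" tolerates find's non-zero
    # exit status when some directories are unreadable.
    matches=$(find "$search_root" -name '*replication_not*' -type f 2>/dev/null || true)
    if [ -n "$matches" ]; then
        echo "$matches"
        return 0
    fi
    echo "no replication_not* file found under $search_root"
    return 1
}
```

Run as root on each Controller Node, find_replication_marker prints the path of any matching file. Include that output when opening the support case.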
Contact Broadcom Support for further assistance on this issue.
https://knowledge.broadcom.com/external/article/405686/how-to-create-a-wolken-case-for-avi-prod.html