Issue with Autobackup Generation Post Cluster Leadership Change or Quorum Member Restart

Article ID: 394099

Products

VMware Avi Load Balancer

Issue/Introduction

  • In the Controller events, you may observe errors such as "Could not copy /var/lib/avi/backups/backup_Default-Scheduler.json", which indicate a failure in copying the backup file to the follower node.

  • In /var/lib/avi/log/cluster_manager.INFO, you may observe a recent cluster leader change or a restart of a quorum member, indicated by logs similar to the example below.
[2025-03-14 07:25:41,835] INFO [cluster_quorum_manager.evaluate_membership:382] [QUORUM] [LEADER_CHANGE] Leader node2.controller.local has changed to node3.controller.local
[2025-03-14 07:25:41,835] INFO [cluster_quorum_manager.evaluate_membership:398] [QUORUM] [MEMBERSHIP_CHANGE] Active nodes ['node1.controller.local', 'node3.controller.local'] Leader node3.controller.local
[2025-03-14 07:25:41,835] INFO [cluster_quorum_manager.evaluate_membership:401] [QUORUM] [MEMBERSHIP_CHANGE] Notifying member change callback
[2025-03-14 07:25:41,835] INFO [cluster_node_manager._internal_member_change_callback:317] [QUORUM] [MEMBERSHIP_CHANGE] Internal membership callback active_nodes ['node1.controller.local', 'node3.controller.local'] leader node3.controller.local
[2025-03-14 07:25:41,835] INFO [cluster_node_manager._process_member_change:325] [QUORUM] [JOIN_CLUSTER] Member change: ['node1.controller.local', 'node3.controller.local'] node3.controller.local

  • In /var/lib/avi/log/cluster_manager.INFO, you may also notice that the aviscp_server service fails after the leader change or quorum member restart, with the error "Failure policy could not be determined from the existing configuration".
[2025-03-14 11:25:09,685] INFO [cluster_manager.process_event:196] Received RPC to process event with name= aviscp_server, status= failed
[2025-03-14 11:25:09,686] WARNING [process_supervisor.local_process_event:753] ProcSupervisor: local proc aviscp_server failed
[2025-03-14 11:25:09,686] WARNING [local_process.handle_failure:451] Handling failure for process aviscp_server:0
[2025-03-14 11:25:09,835] ERROR [local_process.process_event:190] Event failed for process aviscp_server had an error. "Failure policy could not be determined from the existing configuration"
Traceback (most recent call last):
  File "/opt/avi/python/lib/avi/infrastructure/clustering/local_process.py", line 188, in process_event
    process_instance.event(event_name, suppress_event)
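The two log symptoms above can be checked from a shell on the affected Controller. The following is a minimal sketch, assuming the log lines keep the format shown in the excerpts; the function names (find_leader_changes, find_aviscp_failures, has_known_symptoms) are hypothetical helpers, not part of Avi. It only matches patterns and does not check that the aviscp_server failure came after the leader change, so compare the timestamps of the printed lines yourself.

```shell
#!/bin/sh
# Minimal sketch: scan a cluster_manager.INFO file for the symptoms of this issue.
# Assumption: log lines follow the format shown in the excerpts above.
# The function names are hypothetical helpers, not part of Avi.

DEFAULT_LOG=/var/lib/avi/log/cluster_manager.INFO

# Print quorum leader-change and membership-change entries.
find_leader_changes() {
  grep -E '\[QUORUM\] \[(LEADER_CHANGE|MEMBERSHIP_CHANGE)\]' "${1:-$DEFAULT_LOG}"
}

# Print aviscp_server failure entries, including the failure-policy error.
find_aviscp_failures() {
  grep -E 'aviscp_server.*(failed|Failure policy could not be determined)' "${1:-$DEFAULT_LOG}"
}

# Exit 0 only if both symptoms appear somewhere in the same file.
# Note: this does not check that the failure happened after the leader change.
has_known_symptoms() {
  find_leader_changes "${1:-$DEFAULT_LOG}" >/dev/null &&
    find_aviscp_failures "${1:-$DEFAULT_LOG}" >/dev/null
}
```

Called with no argument, each function reads the default log path; pass another file path to inspect a copied or rotated log instead.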

Cause

After a cluster leader change or a quorum member restart in Avi, backup files may not be transferred (via SCP) to the other Controller cluster members. This is caused by a bug in the failure policies configured for the aviscp_server service on the cluster follower nodes.

Resolution

Temporary Workaround:

Restart the process supervisor service (process-supervisor.service) on the node where the aviscp_server service has failed:

systemctl restart process-supervisor.service
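To confirm that the restart completed, the command can be wrapped in a small check. This is a sketch, assuming a systemd-based Controller and root privileges; restart_process_supervisor and its 3-second polling interval are hypothetical choices, not Avi tooling. It only confirms that process-supervisor.service is active again; whether aviscp_server has recovered should still be verified separately, for example by watching cluster_manager.INFO for new aviscp_server failures.

```shell
#!/bin/sh
# Minimal sketch of the temporary workaround: restart the process supervisor
# and wait until systemd reports the unit as active again.
# Assumptions: a systemd-based Controller, run as root; the helper name and
# the 3-second polling interval are hypothetical choices.

restart_process_supervisor() {
  local unit=process-supervisor.service
  local attempts="${1:-10}"   # number of times to poll the unit state
  local i=0

  systemctl restart "$unit" || return 1

  while [ "$i" -lt "$attempts" ]; do
    # "is-active --quiet" exits 0 only when the unit is active.
    if systemctl is-active --quiet "$unit"; then
      return 0
    fi
    sleep 3
    i=$((i + 1))
  done
  return 1
}
```

Polling rather than checking once gives the unit a short window to come back up before the helper reports failure.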

Permanent Fix:

Upgrade to one of the following versions, which include the fix:

30.2.1
22.1.6
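To check whether a Controller's version already includes the fix, a simple branch-aware comparison can help. This sketch assumes the fixed releases are 30.2.1 and 22.1.6 and that version strings look like 22.1.6 or 22.1.6-2p1; includes_fix is a hypothetical helper. Versions on any other branch are reported as unknown (exit status 2) rather than guessed, and the sketch does not account for patch builds, so confirm against the release notes before deciding.

```shell
#!/bin/sh
# Minimal sketch: decide whether a Controller version includes the fix.
# Assumptions: the fixed releases are 30.2.1 and 22.1.6, and version strings
# look like "22.1.6" or "22.1.6-2p1". includes_fix is a hypothetical helper.
# Exit status: 0 = includes the fix, 1 = does not, 2 = branch not covered here.

includes_fix() {
  local version="$1" fixed_patch patch
  case "$version" in
    30.2.*) fixed_patch=1 ;;
    22.1.*) fixed_patch=6 ;;
    *) return 2 ;;
  esac
  # Third dot-separated field, with any non-numeric suffix such as "-2p1" removed.
  patch=$(printf '%s\n' "$version" | cut -d. -f3 | sed 's/[^0-9].*//')
  [ -n "$patch" ] && [ "$patch" -ge "$fixed_patch" ]
}
```

For example, includes_fix 22.1.5 returns 1, so a Controller on that version would still need the upgrade.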