Local and remote configuration backups stop working, and backup files are no longer created.
Each backup shows a timestamp in the timestamp column on the backup page and in its file name. If the most recent backup timestamp is older than expected given the configured backup frequency and the current date, that is a symptom of this issue.
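The timestamp check described above can be sketched as a small shell helper. This is only an illustration: the function name backup_is_stale and its arguments are hypothetical, and it assumes you have already converted the latest backup timestamp to epoch seconds and know the configured frequency in seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the product.
# Usage: backup_is_stale LAST_BACKUP_EPOCH INTERVAL_SECONDS [NOW_EPOCH]
# Returns 0 (true) when the latest backup is older than one backup interval.
backup_is_stale() {
    local last_epoch="$1"
    local interval_sec="$2"
    # NOW_EPOCH is optional and defaults to the current time.
    local now_epoch="${3:-$(date +%s)}"
    [ $(( now_epoch - last_epoch )) -gt "$interval_sec" ]
}
```

For example, with a daily backup schedule you would pass 86400 as the interval; a true result means the newest backup is overdue and the log check is worth running.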
To confirm, look for the following exception in /var/lib/avi/log/scheduler.log:
('Failed: https://localhost/login Status Code 429 msg {"error":"Lock acquisition timeout"}', <Response [429]>)
The error message "lock acquisition timeout" also appears in the /var/lib/avi/log/apiserver* file(s).
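A minimal sketch for searching the log files named above in one pass. The function name count_lock_timeouts is hypothetical; the log directory defaults to the path given in this article and can be overridden with an argument. The match is case-insensitive because the message appears with different capitalization in the two logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the product.
# Usage: count_lock_timeouts [LOG_DIR]
# Prints the number of log lines containing "lock acquisition timeout".
count_lock_timeouts() {
    local log_dir="${1:-/var/lib/avi/log}"
    # -i: ignore case; -h: omit file names so only matching lines are counted.
    # Errors for missing files are discarded, and "|| true" keeps the
    # pipeline from failing when grep finds no match.
    { grep -ih "lock acquisition timeout" \
        "$log_dir"/scheduler.log "$log_dir"/apiserver* 2>/dev/null || true; } | wc -l
}
```

Running count_lock_timeouts with no arguments on a controller node prints the match count; any value above zero is consistent with this issue.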
Other symptom:
The GUI, CLI, and API become unavailable, and requests fail with an HTTP 500 Internal Server Error.
GUI: Login Page
CLI:
admin@controller:~$ shell
Login: admin
Password:
The controller at ['https://localhost'] is not active. If this is not the controller address, please restart the shell using the --address flag to specify the address of the controller.
Unknown error
The controller at ['https://localhost'] is not active. If this is not the controller address, please restart the shell using the --address flag to specify the address of the controller.
Unknown error
The controller at ['https://localhost'] is not active. If this is not the controller address, please restart the shell using the --address flag to specify the address of the controller.
Unknown error
Affected versions: 30.1.x, 30.2.1, and 30.2.2
This issue was confirmed to be a corner-case race condition in which session cleanup jobs and API-based login flows deadlock. On heavily scaled setups, subsequent API calls then fail with the error "Lock acquisition timeout".
The fix is available in 30.2.3-2p1, 31.1.1, and future GA releases.
Workaround:
Run the commands below on each controller node, one node at a time, to restart the services:
systemctl stop apiserver.service
systemctl stop authserver.service
systemctl stop aviportaljobmanager.service
systemctl stop aviportal.service