Follower node in an Avi controller cluster is stuck in 'Starting' state.

Article ID: 425095

Products

VMware Avi Load Balancer

Issue/Introduction

The follower node in the cluster is stuck in the 'Starting' state and is unable to join the cluster. Cluster HA is compromised.

Environment

All Environments

Cause

On the leader node, we see the following error in the cluster_manager.INFO logs under /var/lib/avi/log:

[2026-xx-xx hh:mm:ss,055] INFO [cluster_quorum_manager.member_probe:521] [Member Probe][node2.controller.local] Checking compatibility status, Node Join Key: True, Last connectivity status: False
[2026-xx-xx hh:mm:ss,092] WARNING [cluster_quorum_manager.member_probe:538] [Member Probe][node2.controller.local] Failure while getting the clustify information. Error: failed to get clustify info for node node2.controller.local with error <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAUTHENTICATED
        details = "failed to authenticate"
      debug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.x:5443 {grpc_message:"failed to authenticate", grpc_status:16, created_time:"2026-xx-xxThh:mm:ss.0919709+00:00"}"

On the affected follower node, we see the following error in the clustify.INFO logs under /var/lib/avi/log:

2026-xx-xxThh:mm:ss.340Z        E  4252         segrpcauthserver/interceptor.go:175    failed to parse grpc token: %vToken is expired
2026-xx-xxThh:mm:ss.349Z        E  4252         segrpcauthserver/interceptor.go:67     failed to authenticate: failed to parse grpc token

The time on the leader node was off by more than 3 minutes from the correct time. The gRPC token used for authentication is valid for 3 minutes. Because the leader node's clock was off by more than 3 minutes, the token was considered expired and authentication failed.
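The skew check implied above can be sketched as a small shell helper. This is a sketch, not an Avi tool: the 180-second limit mirrors the 3-minute token validity described above, and the node name in the usage comment is a placeholder.

```shell
# Sketch: decide whether two node clocks are close enough for gRPC
# token validation. 180 s mirrors the 3-minute token validity.
within_token_validity() {
  # $1, $2: epoch seconds reported by two nodes
  skew=$(( $1 - $2 ))
  [ "$skew" -lt 0 ] && skew=$(( -skew ))
  [ "$skew" -le 180 ]   # token validity window in seconds
}

# Usage (run on the leader; requires SSH access to the follower;
# node2.controller.local is a placeholder):
#   leader=$(date +%s)
#   follower=$(ssh node2.controller.local date +%s)
#   within_token_validity "$leader" "$follower" \
#     && echo "clocks OK" || echo "skew exceeds token validity"
```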

Resolution

Correct the time on the leader node and all other nodes in the cluster.

You can use the following command to check whether the time is synchronized with an NTP server. It displays the list of configured NTP servers along with various parameters; an asterisk (*) preceding a server name in the output indicates that time is synced with that server.

> ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
===============================================================================
*ntp1.example.com .GPS.           1 u   18   64  377     0.421   -0.12    0.05
+ntp2.example.com .GPS.           1 u   20   64  377     0.435    0.08    0.06
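As a quick scripted check, the selected peer and its offset can be parsed from the ntpq output. A minimal sketch, assuming the column layout shown above and an arbitrary 1000 ms offset threshold:

```shell
# Sketch: succeed only if ntpq reports a selected sync peer ('*')
# whose offset is within 1000 ms. Column positions assumed as in
# the sample output above; the threshold is an arbitrary example.
check_ntp_sync() {
  awk '
    /^\*/ {                 # selected sync peer line starts with *
      synced = 1
      off = $9              # offset column, in ms
      if (off < 0) off = -off
      if (off > 1000) far = 1
    }
    END { exit (synced && !far) ? 0 : 1 }
  '
}

# Usage on a controller node:
#   ntpq -pn | check_ntp_sync && echo "in sync" || echo "NOT in sync"
```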

Additional Information

If the time is not synchronized with the NTP server, you can try the following steps:

1. Check whether the ntp service is running:

> systemctl status ntp

2. Check whether the NTP server is reachable and listening on the NTP port (UDP 123):

> ping <ntp-server-ip>
> nc -vzu <ntp-server-ip> 123

3. Forcibly synchronize the time. The command ntpd -gq immediately synchronizes the system time once with the NTP servers configured in /etc/ntp.conf, even if the clock offset is large, and then exits. Stop the ntp service, run the command to forcibly synchronize the time, and then start the ntp service again.

> systemctl stop ntp
> ntpd -gq
> systemctl start ntp