Redis pods fail to come up after an upgrade.
This issue can affect any pod or workload that is sensitive to time synchronization.
The issue was observed in an environment where Redis runs on TKGi.
This issue can be caused by UDP or TCP ports being blocked between the cluster and the NTP server. Time drift is a symptom that helps determine whether this is the cause: when ports 123 or 1023 cannot reach the NTP server, the worker nodes and pods can drift out of sync with each other.
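A quick way to confirm time drift is to compare the clock inside an affected pod with the clock on the worker node that runs it. This is a minimal sketch; the pod name comes from the output shown later in this article, and node access (for example SSH, or bosh ssh on TKGi) depends on your environment.
$ kubectl exec redis-0 -- date -u      # clock inside the pod
$ kubectl get pod redis-0 -o wide      # identify the worker node hosting the pod
# then, on that worker node:
$ date -u                              # clock on the node
A noticeable difference between the two values points to a time synchronization problem.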
Redis checks time synchronization between nodes during startup, so this time drift can eventually cause the Redis failure. Error messages similar to the following can be found:
2025-07-18 18:28:24,187 INFO bootstrap MainThread: Sending a new node join request to the master [email protected], validate_only: False
2025-07-18 18:28:24,205 INFO bootstrap MainThread: Node join response received. Status code: 406
2025-07-18 18:28:24,205 INFO bootstrap MainThread: Node join response error code: time_not_sync
2025-07-18 18:28:24,205 WARNING bootstrap_mgr MainThread: Bootstrap failed: [time_not_sync][System time is not synchronized]
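One way to look for these messages, assuming they surface in the pod's container logs in your version, is:
$ kubectl logs redis-0 --all-containers --timestamps | grep -i time_not_sync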
On the Kubernetes side, you can find scenarios similar to the following:
One or more Redis pods show restarts:
$ kubectl get pod
NAME                    READY   STATUS        RESTARTS        AGE
redis-0                 1/2     Running       1 (3m30s ago)   9m40s
redis-1                 2/2     Terminating   2 (44m ago)     57m
redis-2                 2/2     Running       0               59m
redis-services-rigger   1/1     Running       0               124m
redis-enterprise        2/2     Running       0               57m
When describing the pods that show restarts, the Events section contains entries similar to the following:
Events:
Type     Reason     Age                   From     Message
----     ------     ----                  ----     -------
Normal   Created    11m                   kubelet  Created container bootstrapper
Normal   Started    11m                   kubelet  Started container bootstrapper
Warning  Unhealthy  11m                   kubelet  Readiness probe failed: /opt/redislabs/bin/python3: can't open file '/opt/redislabs/shared/health_check.py': [Errno 2] No such file or directory
Warning  Unhealthy  6m24s (x35 over 11m)  kubelet  Readiness probe failed: node id file does not exist - pod is not yet bootstrapped
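These events can be viewed by describing the affected pod, for example:
$ kubectl describe pod redis-0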
Checking time synchronization on the worker nodes (for example with timedatectl and chronyc), you can also see scenarios similar to the following:
A "System clock synchronized" value of "no" indicates that the clock is not in sync.
A Reach value of 0 in the chronyc sources output means that chronyd did not get any valid responses from the NTP server. For more details on the Reach value, review the official Chrony documentation.
The documentation indicates that this can be caused by a firewall blocking the NTP ports.
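The following is a minimal sketch of how to check this on a worker node, assuming chrony is the time service in use:
$ timedatectl          # "System clock synchronized: no" means the clock is not in sync
$ chronyc tracking     # current offset and reference ID
$ chronyc sources -v   # a Reach value of 0 means no valid responses from the NTP server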
Review the firewall rules that manage traffic to the NTP server and verify that ports 123 and 1023 are open for UDP and TCP traffic.
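After updating the firewall rules, you can verify on the worker nodes that synchronization recovers, for example:
$ chronyc sources      # Reach should start incrementing (up to 377) once responses arrive
$ chronyc makestep     # optionally step the clock immediately instead of waiting for it to slew
$ timedatectl          # "System clock synchronized" should report "yes" again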