Resolving Redis/Valkey "READONLY" Errors and Connection Exhaustion in Tanzu Platform

Article ID: 436836

Products

VMware Tanzu Platform - Cloud Foundry

VMware Tanzu for Valkey

Support Only for Redis

VMware Tanzu Redis

Redis for VMware Tanzu

Issue/Introduction

Applications bound to a Redis/Valkey service instance are intermittently failing to perform write operations, which results in an InvalidDataAccessApiUsageException.

Observed error:

org.springframework.dao.InvalidDataAccessApiUsageException: READONLY You can't write against a read only replica.

Concurrently, infrastructure monitoring shows that the service-metrics process on the Redis Master node is failing. Logs for the metrics process indicate:

ERR max number of clients reached

Environment

Elastic Application Runtime

Tanzu Platform for Cloud Foundry

Valkey on Cloud Foundry

Cause

The default max clients limit in the Valkey tile is 1000. When this limit is reached, you will encounter the errors described above.

You can validate this with `redis-cli` after SSHing into the Redis nodes. If you have multiple Redis nodes (i.e., in an HA setup), you will need to log in to all three to identify the master node. The steps are as follows:

1) SSH into the Redis nodes.

bosh -d service-instance_uuid ssh redis/<index>

2) View the configuration file with `cat /var/vcap/jobs/redis/config/redis.conf` and note the values of `masterauth` and `replica-announce-ip`.

3) Connect to the Redis instance using `redis-cli`.

/var/vcap/packages/redis/bin/redis-cli -h <hostname-######>.redis-instance.infra.service-instance-uuid.bosh -p 16379 --cacert /var/vcap/jobs/redis/etc/ca/redis_ca.crt --tls

Note: If TLS is not enabled, omit the `--tls` and `--cacert` options.

4) Once connected to the instance, you can authenticate using the command below.

AUTH <value retrieved from masterauth>

5) Run the commands below to determine whether the Redis node is a master or a replica and to view its client connection information.

info replication

info clients

For `info clients`, you might see output similar to the following:

connected_clients:998
cluster_connections:0
maxclients:1000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:20504
blocked_clients:0
tracking_clients:0
pubsub_clients:3
watching_clients:0
clients_in_timeout_table:0
total_watched_keys:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0
paused_reason:none
paused_actions:none
paused_timeout_milliseconds:0
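
The checks in step 5 can also be scripted. The following is a minimal sketch, not part of the product: it reads `info replication` or `info clients` output on standard input and prints the node role or the share of the client limit in use. The function names `node_role` and `client_usage` and the 90 percent warning threshold are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the role reported by `info replication`.
node_role() {
  # redis-cli output may carry CR line endings; strip them before parsing.
  tr -d '\r' | awk -F: '$1 == "role" { print $2 }'
}

# Hypothetical helper: reads `info clients` output on stdin and reports
# how much of the maxclients limit is in use.
client_usage() {
  tr -d '\r' | awk -F: '
    $1 == "connected_clients" { connected = $2 }
    $1 == "maxclients"        { max = $2 }
    END {
      if (max == "" || max + 0 == 0) { print "maxclients not found"; exit 1 }
      pct = int(connected * 100 / max)
      printf "connected=%d max=%d usage=%d%%\n", connected, max, pct
      # 90 is an illustrative threshold, not a product default.
      if (pct >= 90) print "WARNING: close to the client limit"
    }'
}

# Example using the sample values shown above.
printf 'connected_clients:998\nmaxclients:1000\n' | client_usage
```

On a node, the output of `info replication` and `info clients` from the `redis-cli` connection in step 3 could be piped into these functions in place of the sample values.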

Resolution

If `connected_clients` is close to the value of `maxclients`, the instance is likely to hit the issue described in the Issue/Introduction section.

There are two ways to resolve this:

1) Check the Jedis pool parameters and tune them.


You can check for any specific application that is exhausting this limit by identifying the most frequent source IP connected to the Redis master node using the netstat command:

netstat -anp | grep :6379 | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr
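
The `redis-cli` example earlier connects on port 16379 while this pipeline filters on 6379, so it may help to make the port a parameter. The sketch below is one possible variant, not the documented procedure: it counts established connections per remote IP for a given local port. The function name `top_clients` is hypothetical, the sample addresses are made up, and the parsing assumes IPv4 `ip:port` addresses in the usual `netstat -an` column layout.

```shell
#!/bin/sh
# Hypothetical helper: counts ESTABLISHED connections per remote IP for a
# given local port, reading `netstat -an` style output on stdin.
# Assumes columns: proto, recv-q, send-q, local addr, foreign addr, state.
top_clients() {
  port="$1"
  awk -v port="$port" '
    $6 == "ESTABLISHED" {
      n = split($4, laddr, ":")
      if (laddr[n] != port) next     # keep only connections to our port
      split($5, raddr, ":")
      count[raddr[1]]++              # tally by remote IP
    }
    END { for (ip in count) print count[ip], ip }' | sort -nr
}

# Example with made-up sample data.
top_clients 6379 <<'EOF'
tcp 0 0 10.0.0.5:6379 10.0.1.12:51234 ESTABLISHED 1234/redis-server
tcp 0 0 10.0.0.5:6379 10.0.1.12:51240 ESTABLISHED 1234/redis-server
tcp 0 0 10.0.0.5:6379 10.0.1.30:40022 ESTABLISHED 1234/redis-server
tcp 0 0 10.0.0.5:22 10.0.9.9:50000 ESTABLISHED 999/sshd
EOF
```

On the master node this could be run as `netstat -an | top_clients 16379`. Unlike the `grep` pipeline, it matches only the local port, so outbound connections from the node to another host's Redis port are not counted.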

Once you have identified the Diego Cell behind the most frequent source IP, trace it back to the corresponding application GUID, locate the application, and check its Jedis pool configuration. A sample snippet follows:

spring:
  #Redis configuration
  redis:
    timeout: 3600
    ssl: false # Enable SSL support.
    jedis:
      pool:
        max-active: 1000
        max-idle: 500
        max-wait: 5000
        min-idle: 250

You can reduce the `max-active`, `max-idle`, and `min-idle` parameters so that the total number of connections the application can open stays well below `maxclients`.
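
To judge how far to reduce these values, a rough worst-case estimate can help. The sketch below assumes each application instance keeps its own Jedis pool, so one application could open up to instances × `max-active` connections. The function names, the instance count of 4, and the reserve of 50 connections are hypothetical values for illustration.

```shell
#!/bin/sh
# Hypothetical helper: worst-case connection count for one application,
# assuming every instance holds its own pool of up to max-active connections.
worst_case_connections() {
  instances="$1"
  max_active="$2"
  echo $(( instances * max_active ))
}

# Hypothetical helper: largest max-active that keeps all instances within
# maxclients after reserving some connections for other clients.
max_active_budget() {
  maxclients="$1"
  reserved="$2"
  instances="$3"
  echo $(( (maxclients - reserved) / instances ))
}

worst_case_connections 4 1000   # 4 instances with the sample config above
max_active_budget 1000 50 4     # reserve 50 connections, 4 instances
```

With the sample configuration, four instances could in the worst case request 4000 connections against a limit of 1000, while a `max-active` of 237 per instance would stay within the limit under these assumptions.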

2) Increase the `max_clients` parameter in the Valkey tile under the On-demand plan settings section. The default is 1000, but this can be increased to 10000 if necessary.

Additional Information

As of the latest Valkey tile versions (10.1.1 and 10.2.2), the maximum value that can be set for the max clients parameter is 10,000. Future releases will include an option to increase this limit if necessary.