Redis for VMware Tanzu Application Service Dedicated Instance Fails to Persist to Disk



Article ID: 292938



Products

Redis for VMware Tanzu

Issue/Introduction

Symptoms:

A Redis for VMware Tanzu Application Service dedicated instance that fails to persist to disk shows one of two symptoms:

  • When upgrading Redis, the upgrade fails because the installation process is unable to stop Redis.

  • Any other operation, except a read, returns a MISCONF error. Depending on the implementation, this might stop bound apps from working correctly.

Error message:

MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. 
Commands that may modify the data set are disabled. 
Please check Redis logs for details about the error. 

 

Environment


Cause

This problem occurs when Redis has used up all of the available disk space. Redis persists to disk in two files:

  • appendonly.aof: An append-only log that represents the latest dataset. It is rewritten periodically to reduce its size.
  • dump.rdb: A periodic snapshot of the dataset.

In circumstances where Redis experiences a high rate of data modifications, the appendonly.aof file might occupy too much space, causing any further persistence attempts to fail.
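To see which of the two files is consuming the space, their sizes can be listed side by side. The following is a minimal sketch, not part of the product tooling: the function name persistence_sizes is hypothetical, and it assumes the files live in the directory passed as the first argument (on a dedicated instance this is typically /var/vcap/store/redis, as used later in this article).

```shell
# Print the size in bytes of each Redis persistence file found in a directory.
# Hypothetical helper; usage: persistence_sizes /var/vcap/store/redis
persistence_sizes() {
  dir="$1"
  for name in appendonly.aof dump.rdb; do
    if [ -f "$dir/$name" ]; then
      # wc -c reports the byte count; tr strips padding some platforms add
      size=$(wc -c < "$dir/$name" | tr -d ' ')
      echo "$name $size"
    else
      echo "$name missing"
    fi
  done
}
```

A very large appendonly.aof next to a small dump.rdb is consistent with the high-write-rate scenario described above.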

 

Resolution

First, verify that you can write data to the persistent disk. After an event such as a storage outage, the disk may have entered read-only mode to prevent data corruption. To test this, try the following:

  1. cd /var/vcap/store/
  2. touch file.txt
  3. If the file is created, the disk itself is fine. If you instead see an error message saying that the disk is in a read-only state, the disk cannot be written to.

In this scenario, you may be able to resolve the issue by running bosh recreate against the VM. This creates a new VM and reattaches the disk, which should return it to a read/write state.
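The write test above can be wrapped in a small function so that it cleans up after itself. This is a sketch only: check_writable is a hypothetical name, and a failure here can also mean a missing directory or a permissions problem, so read the actual error from touch before concluding that the disk is read-only.

```shell
# Return 0 if a file can be created in the given directory, 1 otherwise.
# Hypothetical helper; usage: check_writable /var/vcap/store
check_writable() {
  dir="$1"
  probe="$dir/.write-probe.$$"   # $$ (the shell PID) keeps the name unlikely to clash
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    echo "writable: $dir"
    return 0
  fi
  echo "not writable: $dir"
  return 1
}
```

For example, running check_writable /var/vcap/store on the service instance VM prints one of the two messages and sets the exit status accordingly.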

If the disk itself is writable but you still face the error, your persistent disk may be full. To check this:

  • Run df -h to see the current usage of /var/vcap/store.
  • If usage is at or close to 100%, run ls -la /var/vcap/store a few times. A "temp-dump.rdb" file that repeatedly grows in size and is then deleted shows that Redis keeps trying and failing to persist to disk.
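The usage check can also be scripted. The sketch below reads the POSIX-format output of df -P, which keeps each filesystem on one line, and extracts the percentage used. The function names and the default threshold of 95 are assumptions chosen for illustration, not values from the product documentation.

```shell
# Extract the Capacity column (percent used) from `df -P` output on stdin.
parse_usage_percent() {
  awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Print the percent used for the filesystem that holds the given path.
usage_percent() {
  df -P "$1" | parse_usage_percent
}

# Succeed if usage is at or above the threshold (second argument, default 95).
# The default is an arbitrary illustrative value.
is_nearly_full() {
  [ "$(usage_percent "$1")" -ge "${2:-95}" ]
}
```

For example, is_nearly_full /var/vcap/store && echo "persistent disk is nearly full" reports only when the threshold is reached.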

Case 1: Where potential data loss and downtime are not an issue

The resolution, in this case, involves forcefully stopping Redis, releasing the space, and restarting Redis.

The steps are:

  1. sudo su
  2. monit unmonitor redis
  3. pkill -9 redis-server
  4. rm /var/vcap/store/redis/appendonly.aof
  5. (Optional) rm /var/vcap/store/redis/dump.rdb
  6. monit monitor redis
  7. As soon as possible, modify Resource Config in the Redis tile to a persistent disk of at least 2.5 times the size of the allocated RAM for the Dedicated Node, as described in the documentation.

Step 5 is optional. Please do this with extreme caution. Deleting dump.rdb means that when Redis starts up again, it will have no data whatsoever.

Once running, you can restore the data from a previously created snapshot. Please see our restore documentation.
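The disk sizing rule in step 7 is simple arithmetic. The sketch below computes the minimum persistent disk size for a given amount of RAM. The ratio is a parameter because this article quotes 2.5 in this case and 3.5 in Case 2. The function name min_disk_mb is hypothetical, and the ratio is passed multiplied by ten because POSIX shell arithmetic only handles integers.

```shell
# Print the minimum persistent disk size in MB for a given RAM size in MB.
# ratio_x10 is the disk-to-RAM ratio times ten, so 2.5 is passed as 25.
# The result is rounded up to the next whole MB.
min_disk_mb() {
  ram_mb="$1"
  ratio_x10="$2"
  echo $(( (ram_mb * ratio_x10 + 9) / 10 ))
}
```

For example, min_disk_mb 4096 25 prints 10240, and min_disk_mb 4096 35 prints 14336.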

Case 2: Data loss or downtime is unacceptable

The resolution in this case involves temporarily pointing Redis to a new location, releasing the space in the original location, and re-pointing Redis to the original persistent location. Redis remains online the whole time.

  1. Using the IaaS console or API, create a new persistent disk, LARGE_DISK, of at least 3.5 times the current memory size.
  2. Attach LARGE_DISK to the service instance VM.
  3. Identify the name of the new volume with lsblk
  4. Create a mount directory with sudo mkdir {mount-dir}
  5. Check if a file system exists on the new volume with sudo file -s {volume-name}
  6. If a file system does not exist, create one on the volume with sudo mkfs -t ext4 {volume-name}
  7. Mount the volume over this directory with sudo mount {volume-name} {mount-dir}
  8. Ensure {mount-dir} is writable: sudo chmod 777 {mount-dir}
  9. Make a note of the following, from /var/vcap/store/redis/redis.conf:

Redis password:

cat /var/vcap/store/redis/redis.conf | grep requirepass

Config command alias:

cat /var/vcap/store/redis/redis.conf | grep "CONFIG"

The current location where Redis persists files, {original-location}:

/var/vcap/packages/redis/bin/redis-cli -a {password} {config-command-alias} GET dir

10. Set the location for Redis to persist files to the newly mounted disk:

/var/vcap/packages/redis/bin/redis-cli -a {password} {config-command-alias} SET dir {mount-dir}

11. Perform a successful persist to disk:

/var/vcap/packages/redis/bin/redis-cli -a {password} save

/var/vcap/packages/redis/bin/redis-cli -a {password} bgrewriteaof 

12. Run:

watch '/var/vcap/packages/redis/bin/redis-cli -a {password} INFO | grep aof_rewrite_in_progress' 

Wait until it displays 'aof_rewrite_in_progress:0'.

13. Clean up {original-location}:

rm {original-location}/appendonly.aof

rm {original-location}/dump.rdb


14. Redeploy the Redis Tile with the disk/memory ratios described above.

15. Set the location for Redis persistence back to the original one:

/var/vcap/packages/redis/bin/redis-cli -a {password} {config-command-alias} SET dir {original-location} 

16. Perform a successful persist:

/var/vcap/packages/redis/bin/redis-cli -a {password} save

/var/vcap/packages/redis/bin/redis-cli -a {password} bgrewriteaof 

17. Run:

watch '/var/vcap/packages/redis/bin/redis-cli -a {password} INFO | grep aof_rewrite_in_progress'

Wait until it displays 'aof_rewrite_in_progress:0'.

18. Unmount, detach and delete LARGE_DISK.

19. If you did not do so in step 14, modify Resource Config in the Redis tile to a persistent disk of at least 3.5 times the size of the allocated RAM for the Dedicated Node, as described in the documentation.
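Steps 9, 12, and 17 above can be scripted to reduce copy-and-paste mistakes. The following is a sketch under stated assumptions: the function names are hypothetical, and the parsing assumes that redis.conf contains a line of the form requirepass {password} and a line of the form rename-command CONFIG {config-command-alias}. Confirm both against the actual file before relying on the output.

```shell
# Print the value of the requirepass directive from a redis.conf file.
redis_password() {
  awk '$1 == "requirepass" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# Print the alias that CONFIG was renamed to.
# Assumes a "rename-command CONFIG <alias>" line is present.
config_alias() {
  awk '$1 == "rename-command" && $2 == "CONFIG" { gsub(/"/, "", $3); print $3; exit }' "$1"
}

# Read INFO output on stdin and succeed only if no AOF rewrite is in progress.
# INFO lines end in CRLF, so carriage returns are stripped before matching.
aof_rewrite_done() {
  tr -d '\r' | grep -q '^aof_rewrite_in_progress:0$'
}

# Poll every 2 seconds until the AOF rewrite has finished.
# Loops forever if redis-cli cannot connect, so interrupt it manually in that case.
wait_for_aof_rewrite() {
  password="$1"
  until /var/vcap/packages/redis/bin/redis-cli -a "$password" INFO | aof_rewrite_done; do
    sleep 2
  done
}
```

For example, password=$(redis_password /var/vcap/store/redis/redis.conf) followed by wait_for_aof_rewrite "$password" blocks until the rewrite started by bgrewriteaof completes, which is the same condition the watch command is used to observe.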