SRM services stop due to unmanaged SRA log files

Products

VMware Live Recovery

Issue/Introduction

Unmanaged SRM or SRA logs within the appliance can fill up the support partition. The SRM server service then fails because no disk space is left for it to write logs.

1. The srm-server service stops unexpectedly.

vmware-dr.log:

--> Panic: Scheduled function triggered exception:
--> NSt10filesystem7__cxx1116filesystem_errorE filesystem error: cannot create directory: No space left on device [/var/log/vmware/srm/SRAs/sha256_56c870d9f776420fb76c23c045cd3dc3d27582de6779c7e0e6b2341f3c04e500]
--> Backtrace:
--> [backtrace begin] product: VMware vCenter Site Recovery Manager, version: 9.0.2, build: build-24401761, tag: vmware-dr, cpu: x86_64, os: linux, buildType: release
--> backtrace[00] libvmacore.so[0x00252482]: Vmacore::System::Stacktrace::CaptureFullWork(unsigned int)
--> backtrace[01] libvmacore.so[0x0022B207]: Vmacore::System::SystemFactory::CreateBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
--> backtrace[02] libvmacore.so[0x0048CA7B]
--> backtrace[03] libvmacore.so[0x0048CB82]: Vmacore::PanicExit(char const*)
--> backtrace[04] libdr-storage.so[0x000EB55B]
--> backtrace[05] libdr-storage.so[0x0014CB40]
--> backtrace[06] libvmacore.so[0x003429CE]
--> backtrace[07] libvmacore.so[0x003442D2]
--> backtrace[08] libvmacore.so[0x00497DE0]
--> backtrace[09] libpthread.so.0[0x00008EB0]
--> backtrace[10] libc.so.6[0x000FFADF]
--> backtrace[11] (no module)
--> [backtrace end]

2. Log bundles cannot be generated.

3. The SRA log partition is full.

root@srm [ ~ ]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
devtmpfs                         4.0M     0  4.0M   0% /dev
tmpfs                            4.9G   44K  4.9G   1% /dev/shm
tmpfs                            2.0G  856K  2.0G   1% /run
tmpfs                            4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/sda4                         14G  5.5G  7.4G  43% /
tmpfs                            4.9G  488K  4.9G   1% /tmp
/dev/sda2                        238M   36M  190M  16% /boot
/dev/mapper/support_vg-support   3.9G  555M  3.2G  15% /opt/vmware/support
/dev/loop0                       378M  358M     0 100% /opt/vmware/support/logs/srm/SRAs
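
To see which SRA folder is consuming the space, the partition can be inspected with standard tools. The following is a minimal sketch; the helper names usage_percent and largest_files are hypothetical and are not part of the appliance or of the attached script.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a full log partition.

# Print the Use% value (without the % sign) of the filesystem holding a path.
usage_percent() {
    # -P forces POSIX single-line output, so the fifth column is the usage.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List the N largest files under a directory, largest first (sizes in KiB).
largest_files() {
    local dir="$1" count="${2:-10}"
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}
```

For example, "usage_percent /opt/vmware/support/logs/srm/SRAs" prints the figure shown in the Use% column, and "largest_files /opt/vmware/support/logs/srm/SRAs 5" lists the five largest files on that partition.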

Environment

VMware Site Recovery Manager
VMware Live Site Recovery

Cause

The SRM appliance has two main disks: system and support.

The support disk is an LVM volume formatted as ext4 and stores the logs of VMware-related services. It is mounted at "/opt/vmware/support". SRA logs are written to a separate, smaller filesystem mounted beneath it at "/opt/vmware/support/logs/srm/SRAs" (378M in the df output above).

Some SRAs (Storage Replication Adapters) have no built-in log rotation or compression, or place no limit on the size of the logs they produce. Once these logs consume all available space, the SRM server services fail because they can no longer write their log files.

Resolution

This is a workaround and not a permanent fix.

The workaround relies on the Linux logrotate package, which can clean up existing log files without removing them. The required tools are installed on the appliance, and a script that uses them, clean-sras-logs.sh, is attached to this KB.

What does the script do?

The script performs a manual cleanup. After each log rotation, the old logs are kept in an archive until the next rotation. The archive is stored under the "/opt/vmware/support/logs/srm/" folder.

First, the script identifies all subfolders that contain SRA logs and creates a logrotate configuration to process them. Logrotate then copies the logs to a folder outside the SRAs partition and truncates the original files without removing them. After that, all copied logs are archived and stored on the main support partition.
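
The contents of the attached script are not reproduced in this article. As an illustration only, a logrotate configuration of the kind described above could be generated as follows. The function name, the *.log file pattern, the staging directory layout, and the choice of directives are assumptions and are not taken from clean-sras-logs.sh.

```shell
#!/bin/bash
# Hypothetical sketch: emit one logrotate stanza per SRA log subfolder.
# Directive choices are assumptions, not the contents of the KB script.
generate_sra_logrotate_conf() {
    local sra_root="$1" staging_dir="$2" sub name
    for sub in "$sra_root"/*/; do
        [ -d "$sub" ] || continue        # the glob matched nothing
        name=$(basename "$sub")
        mkdir -p "$staging_dir/$name"    # olddir must exist before rotation
        # copytruncate copies the log and then empties the original in place;
        # olddir sends the copy to a folder outside the SRAs partition.
        cat <<EOF
${sub}*.log {
    copytruncate
    olddir ${staging_dir}/${name}
    rotate 1
    missingok
    notifempty
}
EOF
    done
}
```

The output could be saved to a file and checked with "logrotate -d <file>", a dry run that reports what would be rotated without changing anything. The copytruncate directive is what allows olddir to point to a different filesystem from the one that holds the logs.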

How to install the script?

1. Download the script clean-sras-logs.sh from the Attachments section at the bottom of this article.
2. Log in to the appliance as admin, switch to the root user with su, and place the script in "/opt/vmware/bin" with the name clean-sras-logs.sh.
3. Change the permissions to make the script executable.

chmod 755 /opt/vmware/bin/clean-sras-logs.sh

4. Manually execute the script to rotate the log files inside the SRAs log partition.

/opt/vmware/bin/clean-sras-logs.sh
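
Before the first run in step 4, it can be useful to confirm that the upload and the chmod succeeded. The following is a minimal sketch; the function name check_script is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight check: the script exists, is executable,
# and parses as valid bash (bash -n reads the file without running it).
check_script() {
    local script="$1"
    [ -f "$script" ] || { echo "missing: $script"; return 1; }
    [ -x "$script" ] || { echo "not executable: $script"; return 1; }
    bash -n "$script" || { echo "syntax error: $script"; return 1; }
    echo "ok: $script"
}
```

For example, "check_script /opt/vmware/bin/clean-sras-logs.sh" returns a non-zero status if the file is missing, was not made executable, or was damaged during transfer.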

How to enable a cron job that runs SRA log rotation every day?

1. Log in to the appliance as admin, switch to the root user with su, and create an empty file named sras.cron inside "/etc/cron.d/":

touch /etc/cron.d/sras.cron

2. Using a text editor, add the following line to the file:

0 0 * * * root /bin/bash /opt/vmware/bin/clean-sras-logs.sh

This entry runs the script as root every day at 00:00. For more on the schedule syntax, see "crontab in Linux with Examples".

3. Restart the crond service:

systemctl restart crond
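
Steps 1 and 2 can also be combined into a single command instead of using touch and a text editor. The following is a minimal sketch; the function name write_sra_cron is hypothetical, and the 644 file mode is an assumption based on common practice for files in /etc/cron.d.

```shell
#!/bin/bash
# Hypothetical helper: write the daily cron entry from step 2 to a cron.d file.
write_sra_cron() {
    local cron_file="${1:-/etc/cron.d/sras.cron}"
    # Fields: minute hour day-of-month month day-of-week user command
    printf '%s\n' '0 0 * * * root /bin/bash /opt/vmware/bin/clean-sras-logs.sh' > "$cron_file"
    chmod 644 "$cron_file"   # assumption: root-writable, world-readable
}
```

Run "write_sra_cron" as root, confirm the result with "cat /etc/cron.d/sras.cron", and then restart crond as in step 3.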

How to disable the cron job that runs SRA log rotation every day?

1. Log in to the appliance as admin, switch to the root user with su, and remove the file /etc/cron.d/sras.cron:

cd /etc/cron.d
rm sras.cron

2. Restart the crond service:

systemctl restart crond

How to remove the script?

1. Log in to the appliance as admin, switch to the root user with su, and remove the file /opt/vmware/bin/clean-sras-logs.sh:

cd /opt/vmware/bin
rm clean-sras-logs.sh

Additional Information

Q. What is the default log rotation interval in the SRM appliance, and where is it configured (path)?
A. All log settings can be found in /opt/vmware/srm/conf/vmware-dr.xml

Attachments

clean-sras-logs