SRM services stop due to unmanaged SRA log file accumulation

Article ID: 313050

Products

VMware Live Recovery

Issue/Introduction

Unmanaged SRM or SRA logs on the appliance can fill up the SRA log partition under /opt/vmware/support. The SRM server service then fails because there is no space left on the disk for the services to write logs.

The srm-server service stops unexpectedly, and vmware-dr.log contains a panic similar to the following:

--> Panic: Scheduled function triggered exception:
--> NSt10filesystem7__cxx1116filesystem_errorE filesystem error: cannot create directory: No space left on device [/var/log/vmware/srm/SRAs/sha256_#####################################]
--> Backtrace:
--> [backtrace begin] product: VMware vCenter Site Recovery Manager, version: 9.0.2, build: build-24401761, tag: vmware-dr, cpu: x86_64, os: linux, buildType: release
--> backtrace[00] libvmacore.so[0x00252482]: Vmacore::System::Stacktrace::CaptureFullWork(unsigned int)
--> backtrace[01] libvmacore.so[0x0022B207]: Vmacore::System::SystemFactory::CreateBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
--> backtrace[02] libvmacore.so[0x0048CA7B]
--> backtrace[03] libvmacore.so[0x0048CB82]: Vmacore::PanicExit(char const*)
--> backtrace[04] libdr-storage.so[0x000EB55B]
--> backtrace[05] libdr-storage.so[0x0014CB40]
--> backtrace[06] libvmacore.so[0x003429CE]
--> backtrace[07] libvmacore.so[0x003442D2]
--> backtrace[08] libvmacore.so[0x00497DE0]
--> backtrace[09] libpthread.so.0[0x00008EB0]
--> backtrace[10] libc.so.6[0x000FFADF]
--> backtrace[11] (no module)
--> [backtrace end]

Log bundles cannot be generated.

The SRA log partition (/opt/vmware/support/logs/srm/SRAs) is full, as the df -h output shows:

root@srm [ ~ ]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
devtmpfs                        4.0M     0  4.0M   0% /dev
tmpfs                           4.9G   44K  4.9G   1% /dev/shm
tmpfs                           2.0G  856K  2.0G   1% /run
tmpfs                           4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/sda4                        14G  5.5G  7.4G  43% /
tmpfs                           4.9G  488K  4.9G   1% /tmp
/dev/sda2                       238M   36M  190M  16% /boot
/dev/mapper/support_vg-support  3.9G  555M  3.2G  15% /opt/vmware/support
/dev/loop0                      378M  358M     0 100% /opt/vmware/support/logs/srm/SRAs
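To check the SRA log partition on its own, for example from a monitoring job, its usage can be read with GNU df. The sketch below assumes the mount point shown in the df -h output above and a hypothetical 80% warning threshold; sra_usage_pct is an illustrative helper name, not part of SRM.

```shell
#!/bin/bash
# Sketch: report how full the SRA log partition is.
# Assumptions: GNU coreutils df (supports --output), the mount point from
# the df -h output above, and a hypothetical 80% warning threshold.
set -euo pipefail

# Print the Use% of the filesystem holding path $1 as a bare integer.
sra_usage_pct() {
    df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

SRA_MOUNT="${SRA_MOUNT:-/opt/vmware/support/logs/srm/SRAs}"
THRESHOLD="${THRESHOLD:-80}"

# Only report when the SRA mount exists (i.e. on the SRM appliance).
if [[ -d "$SRA_MOUNT" ]]; then
    pct=$(sra_usage_pct "$SRA_MOUNT")
    if (( pct >= THRESHOLD )); then
        echo "WARNING: $SRA_MOUNT is ${pct}% full"
    else
        echo "OK: $SRA_MOUNT is ${pct}% full"
    fi
fi
```

On an appliance in the state shown above, this would print a WARNING line for the SRA mount.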

You may encounter the following error when accessing the Site Recovery Manager (SRM) user interface:

Unable to retrieve pairs from extension server at https://####srm01.##.##.###:443/drserver/vcdr/vmomi/sdk. Unable to connect to Site Recovery Manager Server at https://###srm01.####.###.####:443/drserver/vcdr/vmomi/sdk. Reason: Unexpected status code: 503

This issue may also occur when you add another SRA to an existing SRM setup.


The SRM UI might also display SRA errors such as "Unable to find SRA at the paired site" or "Unable to find SRA with UUID".

Environment

VMware Site Recovery Manager (all versions)
VMware Live Site Recovery (all versions)

Cause

The SRM appliance has two main disks: system and support.

The support disk is an LVM volume formatted with ext4 and stores the logs of VMware services. It is mounted at "/opt/vmware/support".

Some SRAs (e.g., PowerFlex, Dell) have no built-in log rotation or compression, or place no limit on the size of the logs they produce. Once the SRA log partition fills up, the SRM server service fails unexpectedly because no space is left for its log files. This may also result in a build-up of 0-byte files.
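To see whether an SRA is behaving this way, a read-only check can show which SRA subfolders hold the most data and how many 0-byte files have built up. This is a sketch that assumes GNU find, du, and sort and the SRA log path from the df output above; the function names are hypothetical.

```shell
#!/bin/bash
# Read-only sketch: show where SRA log space goes and count 0-byte files.
# Assumptions: GNU findutils/coreutils and the SRA log path from the
# df -h output in this article. Nothing is deleted or truncated.
set -eu

SRA_LOG_ROOT="${SRA_LOG_ROOT:-/opt/vmware/support/logs/srm/SRAs}"

# Print the number of empty regular files under directory $1.
count_empty_files() {
    find "$1" -type f -empty | wc -l
}

# Print "<KiB><TAB><path>" for the $2 largest direct subfolders of $1
# (default 10), biggest first.
largest_subdirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + \
        | sort -rn | head -n "${2:-10}"
}

if [[ -d "$SRA_LOG_ROOT" ]]; then
    echo "0-byte files: $(count_empty_files "$SRA_LOG_ROOT")"
    echo "Largest SRA subfolders (KiB):"
    largest_subdirs "$SRA_LOG_ROOT" 10
fi
```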

Resolution

This is a workaround and not a permanent fix. 

The workaround uses logrotate, a standard Linux utility that can clear existing log files without deleting them. The script that configures and runs it, clean-sras-logs.sh, is attached to this KB and is installed on the appliance using the steps below.

What does the script do?

The script performs a manual cleanup. After each rotation, the old logs are kept in an archive until the next rotation, which replaces it. The archive is stored in the "/opt/vmware/support/logs/srm/" folder.

First, the script identifies all subfolders that contain SRA logs and creates a logrotate configuration for them. The logrotate utility then copies the logs to a folder outside the SRA partition and truncates the original files without removing them. Finally, the copied logs are archived and stored on the main support partition.
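The flow above can be sketched as follows. This is a hypothetical illustration, not the contents of the attached clean-sras-logs.sh: the logrotate directives are copied from the TEMPLATE shown under Additional Information, while the function name and the choice to list each log file explicitly (rather than one glob per folder, so that no directory is ever handed to logrotate) are assumptions.

```shell
#!/bin/bash
# Hypothetical sketch of the approach described above; it is NOT the
# attached clean-sras-logs.sh. Paths are the ones quoted in this article.
set -eu

SRA_LOG_ROOT="${SRA_LOG_ROOT:-/opt/vmware/support/logs/srm/SRAs}"
OLD_DIR=/opt/vmware/support/logs/srm/olddir

# logrotate directives copied from the TEMPLATE in Additional Information.
TEMPLATE='{
   su root root
   copytruncate
   olddir /opt/vmware/support/logs/srm/olddir
   notifempty
   missingok
   nocompress
   sharedscripts
   postrotate
      rm -f /opt/vmware/support/logs/srm/SRAs.tar.gz
      tar -cvzf /opt/vmware/support/logs/srm/SRAs.tar.gz /opt/vmware/support/logs/srm/olddir
      rm -rf /opt/vmware/support/logs/srm/olddir/*
   endscript
}'

# Print a logrotate config for every regular file under $1: all file paths
# on one line, then the shared TEMPLATE block. Prints nothing if there are
# no files. Assumes paths contain no whitespace (true for sha256_... names).
build_sra_logrotate_conf() {
    local files
    files=$(find "$1" -type f | sort | tr '\n' ' ')
    [[ -n "$files" ]] || return 0
    printf '%s%s\n' "$files" "$TEMPLATE"
}

if [[ -d "$SRA_LOG_ROOT" ]]; then
    conf=$(mktemp)
    build_sra_logrotate_conf "$SRA_LOG_ROOT" > "$conf"
    if [[ -s "$conf" ]]; then
        mkdir -p "$OLD_DIR"    # olddir must exist before logrotate runs
        logrotate -f "$conf"   # -f: rotate even if not otherwise due
    fi
    rm -f "$conf"
fi
```

Because copytruncate copies the data and then truncates the original in place, the olddir can sit outside the SRA partition. To preview what logrotate would do with a generated file without changing anything, run it in debug mode with logrotate -d followed by the config path.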

How to install the script?

1. Download clean-sras-logs.sh from the Attachments section at the bottom of this article. If the downloaded file has a different name, rename it back to clean-sras-logs.sh.
2. Using WinSCP or an equivalent application, connect to the SRM appliance with the admin account and password.
3. Copy the downloaded file into the "/home/admin" folder.
4. Close WinSCP or the equivalent application.
5. Connect to the SRM appliance with an SSH client such as PuTTY, log in as admin, and then switch to the root user with su.
6. Copy the script from the "/home/admin" folder to "/opt/vmware/bin", keeping the name clean-sras-logs.sh:

cp /home/admin/clean-sras-logs.sh /opt/vmware/bin/clean-sras-logs.sh

7. Change the permissions to make the script executable:

chmod 755 /opt/vmware/bin/clean-sras-logs.sh

8. Run the script manually to rotate the log files in the SRA log partition:

/opt/vmware/bin/clean-sras-logs.sh
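A quick way to confirm that step 8 freed space is to compare the SRA partition usage before and after the run. The sketch below assumes GNU df and the script and mount paths used in this article; run_and_compare is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: run the cleanup script and report SRA partition usage before and
# after. Assumes GNU df; paths are the ones used in this article.
set -eu

# Print the Use% of the filesystem holding path $1 as an integer.
usage_pct() {
    df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# Run script $1 with bash, then print "before% -> after%" for mount $2.
run_and_compare() {
    local before after
    before=$(usage_pct "$2")
    /bin/bash "$1"
    after=$(usage_pct "$2")
    echo "${before}% -> ${after}%"
}

SCRIPT=/opt/vmware/bin/clean-sras-logs.sh
MOUNT=/opt/vmware/support/logs/srm/SRAs
if [[ -x "$SCRIPT" && -d "$MOUNT" ]]; then
    run_and_compare "$SCRIPT" "$MOUNT"
fi
```

After a successful run, the archived copies can be listed with tar -tzf /opt/vmware/support/logs/srm/SRAs.tar.gz (the archive path from the TEMPLATE under Additional Information).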

How to enable a daily cron job for SRA log rotation?

1. Log in to the appliance as admin, switch to the root user with su, and create an empty file named sras.cron in "/etc/cron.d/":

touch /etc/cron.d/sras.cron


2. Using a text editor, add the following line to the file:

0 0 * * * root /bin/bash /opt/vmware/bin/clean-sras-logs.sh

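The first five fields (minute, hour, day of month, month, day of week) set the schedule, so "0 0 * * *" runs the job every day at 00:00. The sixth field, root, is the user the command runs as, and the rest of the line is the command itself.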

3. Restart the crond service:

systemctl restart crond
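Steps 1 to 3 can also be done without an editor. In this sketch, --apply is a flag of the sketch itself (not of any VMware tool) so that running it without arguments changes nothing; the entry text is the one given in step 2, and write_sras_cron is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: create /etc/cron.d/sras.cron from steps 1-2 without an editor.
# Run as root with the hypothetical --apply flag; without it nothing changes.
set -eu

CRON_FILE=/etc/cron.d/sras.cron
ENTRY='0 0 * * * root /bin/bash /opt/vmware/bin/clean-sras-logs.sh'

# Write the entry to file $1, ending with a newline, readable by all and
# writable only by its owner.
write_sras_cron() {
    printf '%s\n' "$ENTRY" > "$1"
    chmod 644 "$1"
}

if [[ "${1:-}" == "--apply" ]]; then
    write_sras_cron "$CRON_FILE"
    systemctl restart crond   # step 3
fi
```

Writing the line with printf guarantees the trailing newline that cron implementations generally expect at the end of each entry.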

How to disable the daily cron job for SRA log rotation?

1. Log in to the appliance as admin, switch to the root user with su, and remove the file /etc/cron.d/sras.cron:

cd /etc/cron.d
rm sras.cron

2. Restart the crond service:

systemctl restart crond
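To make the removal repeatable and confirm that no other cron file still calls the script, the steps above can be sketched as below. The --apply flag and the disable_sras_cron function are hypothetical names used only in this sketch.

```shell
#!/bin/bash
# Sketch: disable the daily rotation idempotently and confirm that no file
# in the cron directory still calls the script. Run as root with --apply.
set -eu

CRON_DIR=/etc/cron.d

# Remove sras.cron from directory $1 (no error if it is already gone),
# then fail if any remaining file there still references the script.
disable_sras_cron() {
    rm -f "$1/sras.cron"
    if grep -rl 'clean-sras-logs.sh' "$1" 2>/dev/null; then
        echo "clean-sras-logs.sh is still referenced by the file(s) above" >&2
        return 1
    fi
}

if [[ "${1:-}" == "--apply" ]]; then   # hypothetical flag of this sketch
    disable_sras_cron "$CRON_DIR"
    systemctl restart crond
fi
```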

How to remove the script?

1. Log in to the appliance as admin, switch to the root user with su, and remove the file /opt/vmware/bin/clean-sras-logs.sh:

cd /opt/vmware/bin
rm clean-sras-logs.sh
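If the script is removed while sras.cron is still in place, the nightly job keeps trying to run a missing file. A minimal sketch that refuses to delete the script while a cron file still references it (remove_script_safely and --apply are hypothetical names):

```shell
#!/bin/bash
# Sketch: remove the script only after confirming no cron file still runs
# it. Run as root with the hypothetical --apply flag.
set -eu

SCRIPT=/opt/vmware/bin/clean-sras-logs.sh
CRON_DIR=/etc/cron.d

# Delete file $1 unless a file in directory $2 still references its name.
remove_script_safely() {
    if grep -rlF "$(basename "$1")" "$2" 2>/dev/null; then
        echo "Disable the cron job first; still referenced above." >&2
        return 1
    fi
    rm -f "$1"
}

if [[ "${1:-}" == "--apply" ]]; then   # hypothetical flag of this sketch
    remove_script_safely "$SCRIPT" "$CRON_DIR"
fi
```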

Additional Information

Q. What is the default log rotation interval in the SRM appliance, and where is it configured?
A. All log settings are in /opt/vmware/srm/conf/vmware-dr.xml.
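To read these settings without opening an editor, the logging section can be printed. This sketch assumes the settings sit inside a &lt;log&gt;...&lt;/log&gt; element, as in other vmacore-based configuration files; that is an assumption, so open the file to confirm it. show_log_settings is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: print the logging section of vmware-dr.xml without editing it.
# Assumption: settings sit inside a <log>...</log> element; confirm this by
# opening the file, since the element name is not documented in this KB.
set -eu

CONF=/opt/vmware/srm/conf/vmware-dr.xml

# Print every line from <log> through </log> in file $1.
show_log_settings() {
    sed -n '/<log>/,/<\/log>/p' "$1"
}

if [[ -r "$CONF" ]]; then
    show_log_settings "$CONF"
fi
```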


If files are not removed during truncation because of SRA settings, edit the TEMPLATE section of the script to add the nocreate directive, changing it from:

TEMPLATE='{
   su root root
   copytruncate
   olddir /opt/vmware/support/logs/srm/olddir
   notifempty
   missingok
   nocompress
   sharedscripts
   postrotate
      rm -f /opt/vmware/support/logs/srm/SRAs.tar.gz
      tar -cvzf /opt/vmware/support/logs/srm/SRAs.tar.gz /opt/vmware/support/logs/srm/olddir
      rm -rf /opt/vmware/support/logs/srm/olddir/*
   endscript
}'

To:

TEMPLATE='{
   su root root
   copytruncate
   olddir /opt/vmware/support/logs/srm/olddir
   nocreate
   notifempty
   missingok
   nocompress
   sharedscripts
   postrotate
      rm -f /opt/vmware/support/logs/srm/SRAs.tar.gz
      tar -cvzf /opt/vmware/support/logs/srm/SRAs.tar.gz /opt/vmware/support/logs/srm/olddir
      rm -rf /opt/vmware/support/logs/srm/olddir/*
   endscript
}'
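The edit above can also be applied from the shell. This sketch inserts a nocreate line after the first olddir directive, keeps a .bak copy, and does nothing if nocreate is already present. add_nocreate and --apply are hypothetical names, and the sketch assumes the attached script's TEMPLATE matches the "From:" version shown here.

```shell
#!/bin/bash
# Sketch: add "nocreate" after the olddir line of TEMPLATE in the script,
# matching the "To:" version above. Keeps a .bak copy; safe to re-run.
set -eu

SCRIPT=/opt/vmware/bin/clean-sras-logs.sh

# Insert a "   nocreate" line after the first olddir directive in file $1,
# unless the file already contains a nocreate directive.
add_nocreate() {
    if grep -qE '^[[:space:]]*nocreate[[:space:]]*$' "$1"; then
        return 0
    fi
    cp -p "$1" "$1.bak"
    # Writing back into $1 (not replacing it) keeps its 755 permissions.
    awk '{ print }
         /^[[:space:]]*olddir / && !done { print "   nocreate"; done = 1 }' \
        "$1.bak" > "$1"
}

if [[ "${1:-}" == "--apply" ]]; then   # hypothetical flag of this sketch
    add_nocreate "$SCRIPT"
fi
```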

Attachments

clean-sras-logs