VCSA backups fail with timeout errors. You may also see errors indicating that a directory could not be created.
You may see the following in the logs:
ERROR: sftp cmd failed. RC: 28, Err: curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
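The error text in this log line comes from curl. Exit code 28 is curl's code for an operation timeout, and 10001 milliseconds corresponds to a limit of about 10 seconds. As an illustration only, here is a minimal Python sketch that extracts the return code and elapsed time from such a line. The helper name and the regular expression are assumptions based on this single example, and other log lines may be formatted differently.

```python
import re

# Pattern assumed from the single example log line above; it may not match
# every variant that appears in the backup logs.
TIMEOUT_LINE = re.compile(
    r"sftp cmd failed\. RC: (?P<rc>\d+), Err: curl: \(\d+\) "
    r"Operation timed out after (?P<ms>\d+) milliseconds"
)


def parse_timeout_line(line):
    """Return (return code, elapsed seconds) for a curl timeout line, else None."""
    match = TIMEOUT_LINE.search(line)
    if match is None:
        return None
    return int(match.group("rc")), int(match.group("ms")) / 1000


example = ("ERROR: sftp cmd failed. RC: 28, Err: curl: (28) Operation timed out "
           "after 10001 milliseconds with 0 bytes received")
print(parse_timeout_line(example))  # (28, 10.001)
```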
Testing manually with curl shows no problem: the connection succeeds and directories can be listed.
There is currently no resolution. As a workaround, you can try the steps below.
Workaround:
To work around this issue, you can manually edit one of the backup scripts so that, when a command times out, the script waits and then retries the command. This does not necessarily resolve the issue, but it may allow backups to succeed until the customer can investigate other possible causes, such as firewall or SFTP server settings. This is not a permanent fix: after an upgrade, the workaround may need to be applied again.
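The wait-and-retry idea can be sketched in isolation with the Python standard library. This is a minimal sketch, not the appliance's code: `run_with_retry`, its parameters, and its defaults are hypothetical. It treats two conditions as a timeout: the subprocess exceeding `timeout`, and the command exiting with curl's code 28.

```python
import subprocess
import sys
import time

CURL_TIMEOUT_RC = 28  # curl exit code for "operation timed out"


def run_with_retry(args, timeout=None, retries=1, delay=5.0):
    """Run args; on a timeout, sleep `delay` seconds and try again.

    Hypothetical helper. After `retries` extra attempts, the last result is
    returned, or subprocess.TimeoutExpired is re-raised.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            result = subprocess.run(args, stdout=subprocess.PIPE,
                                    stderr=subprocess.PIPE, timeout=timeout,
                                    universal_newlines=True)
        except subprocess.TimeoutExpired:
            if attempt == retries:
                raise  # out of attempts: let the caller see the timeout
        else:
            # Return immediately on anything other than a curl timeout,
            # or when this was the last allowed attempt.
            if result.returncode != CURL_TIMEOUT_RC or attempt == retries:
                return result
        time.sleep(delay)  # wait before the next attempt


# Simulate a command that exits with curl's timeout code on every attempt.
res = run_with_retry([sys.executable, "-c", "import sys; sys.exit(28)"], delay=0)
print(res.returncode)  # 28
```

Only exit code 28 is retried. Other failures, such as an authentication error, are returned at once because a retry is unlikely to help. The actual change to FtpStorageIOLib.py in the steps below follows the same pattern, with a single retry after 5 seconds.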
Before adding the retry, back up the following file so that you can restore it if the edits are made incorrectly and cause problems:
/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/plugins/FtpStorageIOLib.py
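A plain cp works for this step. The Python sketch below is equivalent. `backup_file` and the timestamped `.bak` naming are illustrative assumptions, not a VMware convention.

```python
import shutil
import time
from pathlib import Path

# File to be edited in the steps that follow.
PLUGIN = Path("/usr/lib/applmgmt/backup_restore/py/vmware/appliance/"
              "backup_restore/plugins/FtpStorageIOLib.py")


def backup_file(path, suffix=None):
    """Copy path to <name>.<suffix>.bak in the same directory (hypothetical helper).

    The suffix defaults to a timestamp. shutil.copy2 also preserves the
    permissions and modification time. An existing backup is never overwritten.
    """
    path = Path(path)
    suffix = suffix or time.strftime("%Y%m%d%H%M%S")
    dest = path.with_name(f"{path.name}.{suffix}.bak")
    if dest.exists():
        raise FileExistsError(str(dest))
    shutil.copy2(str(path), str(dest))
    return dest
```

On the appliance, run `backup_file(PLUGIN)` as root. To undo the edits later, copy the returned .bak file back over FtpStorageIOLib.py.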
Edit the file and make changes in two places: the imports near the top of the file and the _run_cmd method.
1. Find the following section near the top of the file:
import os
from urllib.parse import urlparse, unquote, urlunsplit
2. Insert a line that imports the time module, so that the section reads:
import os
import time
from urllib.parse import urlparse, unquote, urlunsplit
3. Add the retry to the _run_cmd method. Locate the following method in the file:
def _run_cmd(args, input=None, timeout=None):
    '''Wrapper over run_cmd'''
    try:
        cmd_res = run_cmd(args, input=input, stdout=PIPE, stderr=PIPE,
                          timeout=timeout, universal_newlines=True)
    except TimeoutExpired:
        raise PluginError(ErrCodes.timeout)
    return cmd_res
4. Update the method so that it looks as follows. Python uses indentation to define code blocks, so indent with spaces only (no tabs) and match the indentation already used in the file:
def _run_cmd(args, input=None, timeout=None):
    '''Wrapper over run_cmd'''
    try:
        cmd_res = run_cmd(args, input=input, stdout=PIPE, stderr=PIPE,
                          timeout=timeout, universal_newlines=True)
        # curl exits with code 28 when an operation times out
        if cmd_res.returncode == 28:
            raise PluginError(ErrCodes.timeout)
    except (TimeoutExpired, PluginError):
        # Wait 5 seconds, then try the command once more
        time.sleep(5)
        try:
            cmd_res = run_cmd(args, input=input, stdout=PIPE, stderr=PIPE,
                              timeout=timeout, universal_newlines=True)
        except TimeoutExpired:
            raise PluginError(ErrCodes.timeout)
    return cmd_res
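An indentation mistake in the edited file is likely to break the backup plugin when it is loaded. After saving the file and before restarting the service, a quick check along these lines can confirm that the file still parses. `check_syntax` is a hypothetical helper. It uses Python's built-in compile(), which writes nothing to disk. It only catches syntax and indentation errors. It does not catch problems such as a missing `import time`, which only fails when the retry path runs.

```python
def check_syntax(path):
    """Return None if the file parses as Python, else 'file:line: message'."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    try:
        compile(source, path, "exec")  # parse only; the code is not executed
    except SyntaxError as err:  # also covers IndentationError and TabError
        return f"{err.filename}:{err.lineno}: {err.msg}"
    return None
```

Call it with the full path of FtpStorageIOLib.py given earlier, and restart applmgmt only when it returns None.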
5. Save and close the file, then restart the appliance management service:
service-control --restart applmgmt
6. Test the backup.