VCSA backups to SFTP fail intermittently with timeout errors

Article ID: 316344

Products

VMware vCenter Server

Issue/Introduction

Symptoms:

VCSA backups fail with timeout errors.  You may also see errors indicating failure to create a directory.

You may see the following in the logs:

ERROR: sftp cmd failed. RC: 28, Err: curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received

Manual tests with curl connect to the SFTP server and list directories without error.


Environment

VMware vCenter Server 8.0.x
VMware vCenter Server 7.0.x

Cause

The SFTP server may stop responding when it receives several commands in quick succession, for example because of a rate limit or another firewall rule in the environment.
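One way to check whether successive commands are being throttled is to run the same command several times back to back and record which attempts time out. The following is a minimal sketch, not part of the appliance: the function name `probe` and its parameters are assumptions made for illustration, and the command to run is supplied by the caller.

```python
import subprocess
import time

CURL_TIMEOUT_RC = 28  # curl exit code for "operation timed out"


def probe(cmd, attempts=5, timeout=15.0):
    """Run cmd `attempts` times in a row with no pause between runs.

    Returns a list of (returncode, elapsed_seconds) tuples. A returncode
    of None means the local subprocess timeout expired before cmd exited.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            rc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            rc = None
        results.append((rc, time.monotonic() - start))
    return results


def count_timeouts(results):
    """Count runs that ended in a curl timeout or a local timeout."""
    return sum(1 for rc, _ in results if rc is None or rc == CURL_TIMEOUT_RC)
```

For example, `probe(["curl", "--max-time", "10", "-u", "user", "sftp://sftp.example.com/backups/"])` would list a placeholder directory five times; the host, path and user shown here are hypothetical. If the first run succeeds and later runs return 28, that pattern is consistent with a rate limit.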

Resolution

At this time there is no resolution.  You may attempt the workaround below.

Workaround:

To work around this issue, you can manually edit one of the backup scripts so that a command that times out is retried after a short wait.  This does not address the underlying cause, but it may allow backups to succeed while you investigate the firewall or SFTP server settings.  Note that this is not a permanent solution: after an upgrade, the workaround may need to be applied again.

Before adding the retry, take a backup of the following file so that it can be restored if the edits are done incorrectly and cause problems:

/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/plugins/FtpStorageIOLib.py
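The backup copy can be made with any tool. As a minimal sketch in Python, assuming a timestamped copy next to the original is acceptable (the helper name `backup_copy` and the `.bak-` suffix are illustrative choices, not an appliance convention):

```python
import shutil
import time


def backup_copy(path):
    """Copy path to path.bak-YYYYmmdd-HHMMSS and return the backup path."""
    backup_path = "%s.bak-%s" % (path, time.strftime("%Y%m%d-%H%M%S"))
    # copy2 copies the contents and also preserves timestamps and mode bits
    shutil.copy2(path, backup_path)
    return backup_path
```

Calling `backup_copy` with the path above returns the name of the copy, which can be moved back over the edited file to undo the change.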

You will need to edit the file and make changes to two sections: the imports near the top of the file and the _run_cmd method.

1.  Find the following section near the top of the file:

import os
from urllib.parse import urlparse, unquote, urlunsplit

2.  Add an import for the time module by inserting a line:

import os
import time
from urllib.parse import urlparse, unquote, urlunsplit

3.  Add the retry to the _run_cmd method.  Locate the following method in the file:

def _run_cmd(args, input=None, timeout=None):
    '''Wrapper over run_cmd'''
    try:
        cmd_res = run_cmd(args, input=input, stdout=PIPE, stderr=PIPE,
                          timeout=timeout, universal_newlines=True)
    except TimeoutExpired:
        raise PluginError(ErrCodes.timeout)
    return cmd_res

4.  Update the method so that it looks as follows (Note that Python uses indentation to define code blocks.  Indent with spaces, not tabs, and keep the indentation exactly as shown):

def _run_cmd(args, input=None, timeout=None):
    '''Wrapper over run_cmd'''
    try:
        cmd_res = run_cmd(args, input=input, stdout=PIPE, stderr=PIPE,
                          timeout=timeout, universal_newlines=True)
        # curl exits with code 28 when the operation times out
        if cmd_res.returncode == 28:
            raise PluginError(ErrCodes.timeout)
    except (TimeoutExpired, PluginError):
        # First attempt timed out: wait 5 seconds, then retry once
        time.sleep(5)
        try:
            cmd_res = run_cmd(args, input=input, stdout=PIPE, stderr=PIPE,
                              timeout=timeout, universal_newlines=True)
        except TimeoutExpired:
            raise PluginError(ErrCodes.timeout)
    return cmd_res

5.  Save and close the file, then restart the appliance management service:

service-control --restart applmgmt

6.  Test the backup.
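
The wait-and-retry pattern from step 4 can also be tried outside the appliance before editing the real file. The following is a standalone sketch rather than the appliance code: it calls subprocess.run directly instead of the appliance's run_cmd, and `CommandTimeout` is a hypothetical stand-in for PluginError(ErrCodes.timeout). This sketch treats curl exit code 28 as a timeout on both attempts.

```python
import subprocess
import time

CURL_TIMEOUT_RC = 28  # curl exit code for "operation timed out"


class CommandTimeout(Exception):
    """Hypothetical stand-in for PluginError(ErrCodes.timeout)."""


def run_with_retry(args, timeout=None, delay=5.0):
    """Run args; if it times out, wait `delay` seconds and try once more.

    Raises CommandTimeout if both attempts time out. Any other result,
    including a non-zero exit code other than 28, is returned as-is.
    """
    for attempt in (1, 2):
        try:
            res = subprocess.run(args, stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE, timeout=timeout,
                                 universal_newlines=True)
            timed_out = res.returncode == CURL_TIMEOUT_RC
        except subprocess.TimeoutExpired:
            timed_out = True
        if not timed_out:
            return res
        if attempt == 1:
            time.sleep(delay)  # give the server time to accept commands again
    raise CommandTimeout("command timed out twice: %r" % (args,))
```

A single retry after a fixed delay keeps the change small. If the SFTP server enforces a longer rate-limit window, a larger `delay` may be needed, at the cost of a slower backup.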