Health Monitoring for VCF fails with error "operation is in progress for Id '########-####-####-####-############' Wait for the operation to complete"

Article ID: 395902

Products

VMware SDDC Manager

Issue/Introduction

  • The HRM script fails with the following error:

[YYYY-MM-DD hh:mm:ss] INFO - [send-data-to-vrops.py:129] - Fetching data from SOS Utility on SDDC Manager...
[YYYY-MM-DD hh:mm:ss] INFO - [send-data-to-vrops.py:130] - This can take 15~90 min (or even more) depending on the size of your environment. Please wait....
[YYYY-MM-DD hh:mm:ss] INFO - [SosRest.py:144] - {'errorCode': 'OPERATION_IN_PROGRESS', 'message': 'Operation is in progress for Id ########-####-####-####-############. Wait for the operation to complete.'}
[YYYY-MM-DD hh:mm:ss] INFO - [SosRest.py:144] - status code 409

  • The /var/log/messages log file on SDDC Manager shows entries similar to:

    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] [Health-Summary-API] SDDC Health Summary operation UUID: ########-####-####-####-############
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Setting cache for Health-Check operation for SDDC
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Running on Python Version : 3.10.11
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Welcome to Supportability and Serviceability(SoS) utility!
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Run Initiated on Saturday, <Month> <Date>, <Year> <Time>
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] SOS Version: 5.1.1.0-23480821
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Initiating Task registration..
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Performing Health-Check operation for SDDC
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Insert Task ID ########-####-####-####-############ to DB
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Inserting record for table <class 'framework.dbinterface.database.Tasks'> with attributes {'id': '########-####-####-####-############', 'creationTimestamp': 'YYYY-MM-DDThh:mm:ss', 'description': 'Health-Check operation for SDDC', 'status': 'Pending', 'path': '/var/log/vmware/vcf/sddc-support', 'bundleType': 'HEALTH_CHECK', 'bundleName': 'healthcheck-<Time-stamp>-1946'}
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Committing session
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] [Health-Summary-API] Response:  {'id': '########-####-####-####-############', 'description': 'Health-Check operation for SDDC'}
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] {'username': '<Username@domain>', 'clientIP': '<IP Address>', ' remoteIP': '127.0.0.1', 'userAgent': 'python-requests/2.28.1', 'api': '/v1/system/health-summary', 'httpMethod': 'POST', 'operation': 'Initiates Health Summary checks.', 'httpStatus': 202, 'timestamp': 'YYYY-MM-DDThh:mm:ss'}
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [ERROR] Traceback (most recent call last):
    .
    .
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]:     return self._exec_single_context(
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]:   File "/opt/vmware/sddc-support/services/../framework/../dependency/sqlalchemy/engine/base.py", line 1987, in _exec_single_context
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]:     self._handle_dbapi_exception(
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]:   File "/opt/vmware/sddc-support/services/../framework/../dependency/sqlalchemy/engine/base.py", line 2344, in _handle_dbapi_exception
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]:     raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]:     cursor.execute(statement, parameters)
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) database disk image is malformed
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: [SQL: INSERT INTO tasks (id, description, "bundleType", "creationTimestamp", "completionTimestamp", "bundleName", path, status, "bundleAvailable") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)]
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: [parameters: ('########-####-####-####-############', 'Health-Check operation for SDDC', 'HEALTH_CHECK', 'YYYY-MM-DDThh:mm:ss.', None, 'healthcheck-<Time-stamp>-1946', '/var/log/vmware/vcf/sddc-support', 'Pending', None)]
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: (Background on this error at: https://sqlalche.me/e/20/4xp6)
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [ERROR] DB commit failed
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [INFO] Closing session
    YYYY-MM-DDThh:mm:ss <SDDC short-name> sosrest[1946]: YYYY-MM-DD hh:mm:ss [1946] [ERROR] DB operation failed with DatabaseError('(sqlite3.DatabaseError) database disk image is malformed')
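
To confirm that an environment matches this article, the log excerpts above can be searched for their distinguishing strings. The following is a minimal Python sketch, not part of the HRM tooling: the function name, the dictionary keys, and the choice of which strings to match are assumptions made for illustration. The strings themselves are copied from the excerpts above. OPERATION_IN_PROGRESS appears in the HRM script output, while the other two appear in /var/log/messages.

```python
# Illustrative only: names below are hypothetical, the matched strings come from the article.
SIGNATURES = {
    "operation_in_progress": "OPERATION_IN_PROGRESS",
    "sqlite_corruption": "database disk image is malformed",
    "db_commit_failed": "DB commit failed",
}


def find_signatures(lines):
    """Return {signature_name: [1-based line numbers]} for each signature found."""
    hits = {}
    for number, line in enumerate(lines, start=1):
        for name, needle in SIGNATURES.items():
            if needle in line:
                hits.setdefault(name, []).append(number)
    return hits


if __name__ == "__main__":
    # Shortened sample lines modelled on the excerpts in this article.
    sample = [
        "INFO - [SosRest.py:144] - {'errorCode': 'OPERATION_IN_PROGRESS'}",
        "sosrest[1946]: sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) database disk image is malformed",
        "sosrest[1946]: [ERROR] DB commit failed",
    ]
    print(find_signatures(sample))
```

To scan a real file, pass an open file object, for example `find_signatures(open("/var/log/messages", errors="replace"))`. An empty result means none of the strings were found.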

Environment

VMware Cloud Foundation 4.x
VMware Cloud Foundation 5.x

Cause

Potential causes:

  • A disk partition on the SDDC Manager appliance filled up, or a storage event occurred.
  • The SoS service (sosrest) was stopped or restarted while an SoS operation was running.
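
Since a full partition is one of the potential causes, it may be worth checking disk usage before applying the resolution. The sketch below uses Python's standard `shutil.disk_usage`. The function name, the 90% threshold, and the example path are assumptions for illustration, not values from this article.

```python
import shutil


def partitions_over_threshold(paths, threshold_pct=90.0, usage_fn=shutil.disk_usage):
    """Return {path: percent_used} for paths at or above threshold_pct.

    threshold_pct is an arbitrary illustrative default, not a documented limit.
    usage_fn is injectable so the logic can be exercised without real disks.
    """
    over = {}
    for path in paths:
        try:
            usage = usage_fn(path)
        except OSError:
            # Path missing or not readable on this system: skip it.
            continue
        if usage.total == 0:
            # Some pseudo file systems report zero size.
            continue
        pct = 100.0 * usage.used / usage.total
        if pct >= threshold_pct:
            over[path] = round(pct, 1)
    return over


if __name__ == "__main__":
    # "/" is only an example; add the mount points relevant to your appliance.
    print(partitions_over_threshold(["/"]))
```

An empty dictionary means none of the checked paths reached the threshold. This only covers the disk-space cause and cannot detect an earlier storage event.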

Resolution

  1. Take a snapshot of the SDDC Manager.
  2. SSH into the SDDC Manager appliance as the vcf user, then switch to root with su.
  3. Stop the sosrest service:
    # systemctl stop sosrest.service
  4. Move the corrupted files out of the sddc-support directory:
    # mv /opt/vmware/vcf/sddc-support/soservice.db /var/tmp
    # mv /opt/vmware/vcf/sddc-support/status.json /var/tmp
    # mv /opt/vmware/vcf/sddc-support/.status-tmp.json /var/tmp
  5. Connect to the platform database:
    # psql -h localhost -U postgres -d platform

    Note: For VCF 5.1.x, refer to KB 371622.

  6. Remove SoS service tasks from the task_metadata table:
    delete from task_metadata where task_url like '%sos%';
    delete from task_metadata where task_type='SDDCMANAGER_BACKUP';
  7. Exit psql and restart the sosrest service:
    # systemctl start sosrest.service
  8. Restart the SDDC Manager services:
    # /opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh
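
Optionally, the corruption can be confirmed on the copy of soservice.db that step 4 moved to /var/tmp. The following minimal Python sketch assumes the file is an ordinary SQLite database, as the sqlite3 traceback suggests. It runs SQLite's `PRAGMA integrity_check` using the standard `sqlite3` module. The function name is hypothetical, and the check is not part of the documented resolution.

```python
import sqlite3
from pathlib import Path


def sqlite_integrity(db_path):
    """Return integrity_check messages; ['ok'] means SQLite found no corruption."""
    # Open read-only through a file: URI so the check cannot modify the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        return [f"open failed: {exc}"]
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        return [row[0] for row in rows]
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can raise instead of returning rows.
        return [f"check failed: {exc}"]
    finally:
        conn.close()


if __name__ == "__main__":
    # Destination used in step 4 of this article.
    print(sqlite_integrity("/var/tmp/soservice.db"))
```

Any result other than `['ok']` is consistent with the "database disk image is malformed" error shown in the logs. An `['ok']` result on the moved file would suggest the problem lies elsewhere.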