vSAN performance service alarm 'Performance data collection' alert

Article ID: 379519

Updated On:

Products

VMware vSAN
VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • The vSAN performance service alarm 'Performance data collection' is raised on the vSAN cluster.
  • Restarting the vsanmgmt service clears the alarm only temporarily; it reappears afterwards.
  • Disabling and re-enabling the vSAN performance service also clears the alarm only temporarily; it reappears afterwards.
  • In /var/log/vsanmgmt.log, the following error is reported frequently:

    2024-10-03T12:41:04.132Z Er(11)[+] vsand[6406046]: Traceback (most recent call last):
    2024-10-03T12:41:04.132Z Er(11)[+] vsand[6406046]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7570, in Run
    2024-10-03T12:41:04.132Z Er(11)[+] vsand[6406046]: sqlite3.OperationalError: no such column: throughputDevRead
    2024-10-03T12:41:04.133Z In(14) vsand[7693705]: [opID=Thread-142994 (function_runner) statsdb::ExecuteSqlAndWait] Mode: normalMode Got ExecuteSqlAndWait result...
    2024-10-03T12:41:04.133Z Er(11) vsand[7693705]: [opID=Thread-142994 (function_runner) VsanHostHelper::function_runner] Meet exception when calling executeQueryPerf
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: Traceback (most recent call last):
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/VsanPerformanceManagerImpl.py", line 832, in QueryEntityStats
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 9648, in QueryStats
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 9327, in RunProviderThread
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7699, in ExecuteSqlAndWait
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7570, in Run
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: sqlite3.OperationalError: no such column: throughputDevRead
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]:
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: During handling of the above exception, another exception occurred:
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]:
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: Traceback (most recent call last):
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/VsanHostHelper.py", line 2230, in function_runner
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/VsanPerformanceManagerImpl.py", line 2183, in executeQueryPerf
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/VsanPerformanceManagerImpl.py", line 853, in QueryEntityStats
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 9648, in QueryStats
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 9327, in RunProviderThread
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7699, in ExecuteSqlAndWait
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7570, in Run
    2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: sqlite3.OperationalError: no such column: throughputDevRead

  • In /var/log/vmware/vmware-vsan-health, the following error can be seen:

    2024-10-03T17:16:35.289Z WARNING vsan-mgmt[19786] [VsanVcPerformanceManagerImpl::PerHostThreadMain opID=baeb7c94] esxnode: Node info: (vim.cluster.VsanPerfNodeInformation) [
      (vim.cluster.VsanPerfNodeInformation) {
        version = '8.0.203',
        isCmmdsMaster = true,
        isStatsMaster = true,
        vsanMasterUuid = '########-####-####-########f7a7',
        vsanNodeUuid = '########-####-####-########f7a7',
        masterInfo = (vim.cluster.VsanPerfMasterInformation) {
          secSinceLastStatsWrite = 80375,
          secSinceLastStatsCollect = 278,
          statsIntervalSec = 300,
          statsDirectoryPercentFree = 98,
          verboseMode = false,
          verboseModeLastUpdate = 2024-09-21T13:29:20Z
        },
        diagnosticMode = false
      }
    ]
    2024-10-03T17:16:35.289Z INFO vsan-mgmt[11069] [VsanVcPerformanceManagerImpl::_determineMasterAmongHosts opID=######94] Stats masters = ['esxnode'], candidates = ['esxnode']
    2024-10-03T17:16:35.289Z INFO vsan-mgmt[11069] [VsanVcPerformanceManagerImpl::__next__ opID=######94] Trying esxnode as primary
    2024-10-03T17:16:35.330Z INFO vsan-mgmt[258332] [VsanVcPerformanceManagerImpl::processException opID=######92] New candidates after exception: []
    2024-10-03T17:16:35.330Z WARNING vsan-mgmt[258332] [VsanVcPerformanceManagerImpl::processException opID=######92] Exception: (vmodl.fault.SystemError) {
      msg = "Received SOAP response fault from [<<io_obj p:0x00007f55b41e3218, h:31, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-387/vsan>]: queryVsanPerf\ndatabase disk image is malformed. DB error code: 11, please check log file for details",
      reason = "RuntimeError('database disk image is malformed. DB error code: 11, please check log file for details')"
    }
    Traceback (most recent call last):
      File "bora/vsan/perfsvc/vpxd/vpxdPyMo/VsanVcPerformanceManagerImpl.py", line 911, in _RunRegularQuery
      File "/usr/lib/vmware/site-packages/pyVmomi/VmomiSupport.py", line 618, in <lambda>
      File "/usr/lib/vmware/site-packages/pyVmomi/VmomiSupport.py", line 391, in _InvokeMethod
    PyCppVmomi.vmodl.fault.SystemError: (vmodl.fault.SystemError) {
      msg = "Received SOAP response fault from [<<io_obj p:0x00007f55b41e3218, h:31, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-387/vsan>]: queryVsanPerf\ndatabase disk image is malformed. DB error code: 11, please check log file for details",
      reason = "RuntimeError('database disk image is malformed. DB error code: 11, please check log file for details')"
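
To check how often these signatures occur, a copy of the log can be scanned with a short script. The following is a minimal sketch, not a supported tool: the names SIGNATURES, count_signatures and scan_file are hypothetical, and the two patterns are simply copied from the log excerpts above.

```python
import re
from collections import Counter

# Error signatures taken from the log excerpts in this article.
# The dictionary keys are illustrative labels, not VMware terminology.
SIGNATURES = {
    "missing_column": re.compile(r"sqlite3\.OperationalError: no such column"),
    "malformed_db": re.compile(r"database disk image is malformed"),
}


def count_signatures(lines):
    """Return a Counter of how many lines match each signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return count_signatures(handle)
```

For example, scan_file("/var/log/vsanmgmt.log") returns the per-signature line counts for that file. A count that keeps growing after a service restart is consistent with the recurring behavior described above.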

Environment

VMware vSAN all versions
Aria Operations before 8.16 HF2 and 8.17 HF1

Cause

A known issue in Aria Operations periodically triggers "sqlite3.OperationalError". The resulting flood of these errors can cause DB initialization failures under race conditions.

Resolution

The issue has been fixed in Aria Operations 8.18.3.