++The vSAN performance service alarm 'Performance data collection' is raised on the vSAN cluster.
++Restarting the vsanmgmt service clears the alarm only temporarily; it reappears afterwards.
++Disabling and re-enabling the vSAN performance service clears the alarm only temporarily; it reappears afterwards.
VMware vSAN (all versions)
VMware Aria Operations (vROps) versions earlier than 8.16 HF 2 and 8.17 HF 1
vsan-health-service.log reports "queryVsanPerf\ndatabase disk image is malformed. DB error code: 11":
2024-10-03T17:16:35.289Z WARNING vsan-mgmt[19786] [VsanVcPerformanceManagerImpl::PerHostThreadMain opID=baeb7c94] esxnode: Node info: (vim.cluster.VsanPerfNodeInformation) [
(vim.cluster.VsanPerfNodeInformation) {
version = '8.0.203',
isCmmdsMaster = true,
isStatsMaster = true,
vsanMasterUuid = '639a2a21-7a71-8626-f7a7-############',
vsanNodeUuid = '639a2a21-7a71-8626-f7a7-############',
masterInfo = (vim.cluster.VsanPerfMasterInformation) {
secSinceLastStatsWrite = 80375,
secSinceLastStatsCollect = 278,
statsIntervalSec = 300,
statsDirectoryPercentFree = 98,
verboseMode = false,
verboseModeLastUpdate = 2024-09-21T13:29:20Z
},
diagnosticMode = false
}
]
2024-10-03T17:16:35.289Z INFO vsan-mgmt[11069] [VsanVcPerformanceManagerImpl::_determineMasterAmongHosts opID=baeb7c94] Stats masters = ['esxnode'], candidates = ['esxnode']
2024-10-03T17:16:35.289Z INFO vsan-mgmt[11069] [VsanVcPerformanceManagerImpl::__next__ opID=baeb7c94] Trying esxnode as primary
2024-10-03T17:16:35.330Z INFO vsan-mgmt[258332] [VsanVcPerformanceManagerImpl::processException opID=baeb7c92] New candidates after exception: []
2024-10-03T17:16:35.330Z WARNING vsan-mgmt[258332] [VsanVcPerformanceManagerImpl::processException opID=baeb7c92] Exception: (vmodl.fault.SystemError) {
msg = "Received SOAP response fault from [<<io_obj p:0x00007f55b41e3218, h:31, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-387/vsan>]: queryVsanPerf\ndatabase disk image is malformed. DB error code: 11, please check log file for details",
reason = "RuntimeError('database disk image is malformed. DB error code: 11, please check log file for details')"
}
Traceback (most recent call last):
File "bora/vsan/perfsvc/vpxd/vpxdPyMo/VsanVcPerformanceManagerImpl.py", line 911, in _RunRegularQuery
File "/usr/lib/vmware/site-packages/pyVmomi/VmomiSupport.py", line 618, in <lambda>
File "/usr/lib/vmware/site-packages/pyVmomi/VmomiSupport.py", line 391, in _InvokeMethod
PyCppVmomi.vmodl.fault.SystemError: (vmodl.fault.SystemError) {
msg = "Received SOAP response fault from [<<io_obj p:0x00007f55b41e3218, h:31, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-387/vsan>]: queryVsanPerf\ndatabase disk image is malformed. DB error code: 11, please check log file for details",
reason = "RuntimeError('database disk image is malformed. DB error code: 11, please check log file for details')"
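One hint in the node information above is that secSinceLastStatsWrite (80375) is far larger than statsIntervalSec (300), which suggests that stats have not been written for many collection intervals. A minimal sketch of that comparison follows, assuming the log text has the same "name = value" layout as the excerpt. The function names and the threshold of three missed intervals are hypothetical and chosen only for illustration; they do not come from any VMware tool.

```python
import re


def stats_write_lag(log_text):
    """Return (secSinceLastStatsWrite, statsIntervalSec), or None if either is absent."""
    write = re.search(r"secSinceLastStatsWrite\s*=\s*(\d+)", log_text)
    interval = re.search(r"statsIntervalSec\s*=\s*(\d+)", log_text)
    if not write or not interval:
        return None
    return int(write.group(1)), int(interval.group(1))


def is_stale(log_text, max_missed_intervals=3):
    """True if the last stats write is older than the assumed number of intervals."""
    lag = stats_write_lag(log_text)
    if lag is None:
        return False
    since_write, interval = lag
    return since_write > max_missed_intervals * interval


# Values copied from the excerpt above.
sample = """
secSinceLastStatsWrite = 80375,
secSinceLastStatsCollect = 278,
statsIntervalSec = 300,
"""
```

With the values from the excerpt, is_stale(sample) returns True, because 80375 seconds is well beyond three intervals of 300 seconds.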
++The log is flooded with the error "no such column: throughputDevRead":
2024-10-03T12:41:04.132Z Er(11)[+] vsand[6406046]: Traceback (most recent call last):
2024-10-03T12:41:04.132Z Er(11)[+] vsand[6406046]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7570, in Run
2024-10-03T12:41:04.132Z Er(11)[+] vsand[6406046]: sqlite3.OperationalError: no such column: throughputDevRead
2024-10-03T12:41:04.133Z In(14) vsand[7693705]: [opID=Thread-142994 (function_runner) statsdb::ExecuteSqlAndWait] Mode: normalMode Got ExecuteSqlAndWait result...
2024-10-03T12:41:04.133Z Er(11) vsand[7693705]: [opID=Thread-142994 (function_runner) VsanHostHelper::function_runner] Meet exception when calling executeQueryPerf
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: Traceback (most recent call last):
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/VsanPerformanceManagerImpl.py", line 832, in QueryEntityStats
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 9648, in QueryStats
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 9327, in RunProviderThread
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7699, in ExecuteSqlAndWait
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7570, in Run
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: sqlite3.OperationalError: no such column: throughputDevRead
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]:
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: During handling of the above exception, another exception occurred:
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]:
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: Traceback (most recent call last):
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/VsanHostHelper.py", line 2230, in function_runner
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/VsanPerformanceManagerImpl.py", line 2183, in executeQueryPerf
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/VsanPerformanceManagerImpl.py", line 853, in QueryEntityStats
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 9648, in QueryStats
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 9327, in RunProviderThread
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7699, in ExecuteSqlAndWait
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: File "/usr/lib/vmware/vsan/perfsvc/statsdb.py", line 7570, in Run
2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: sqlite3.OperationalError: no such column: throughputDevRead
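To gauge how heavily a log is flooded, the two error strings quoted in this article can be counted in a saved copy of the log. This is a rough sketch, not a supported diagnostic: the function name count_signatures is hypothetical, and the lines to scan are supplied by the caller.

```python
from collections import Counter

# Error strings taken verbatim from the log excerpts in this article.
SIGNATURES = (
    "no such column: throughputDevRead",
    "database disk image is malformed",
)


def count_signatures(lines):
    """Count how many lines contain each known error string."""
    counts = Counter()
    for line in lines:
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts


# Two lines copied from the excerpts above, plus one unrelated line.
sample_lines = [
    "2024-10-03T12:41:04.132Z Er(11)[+] vsand[6406046]: sqlite3.OperationalError: no such column: throughputDevRead",
    "2024-10-03T12:41:04.133Z Er(11)[+] vsand[7693705]: sqlite3.OperationalError: no such column: throughputDevRead",
    "2024-10-03T17:16:35.289Z INFO vsan-mgmt[11069] Trying esxnode as primary",
]
result = count_signatures(sample_lines)
```

An open file object can be passed in place of sample_lines, since the function only iterates over lines.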
++The "no such column: throughputDevRead" error indicates that vROps (Aria Operations) is configured to monitor the cluster. A known issue in vROps triggers "sqlite3.OperationalError" periodically, and the resulting flood of errors can cause DB initialization to fail under race conditions.
The issue was identified in the 8.16 build of vROps and fixed in the 8.16 HF 2 and 8.17 HF 1 builds.
Details of the fixed builds:
VMware Aria Operations 8.17 Hot Fix 1
Build Number: 23708527
VMware Aria Operations 8.16 Hot Fix 2
Upgrade Aria Operations to the applicable hot fix build (8.16 HF 2 or 8.17 HF 1) to resolve the issue.