None of the graphs on the Monitor Edge screen displayed data. Links, Applications, QoE and the other graphs on that screen were all affected.
VMware SD-WAN Orchestrator
CPU usage was very high (around 100%). As a result, the stats files were not being processed and were queuing up. Because all stats data is stored in the ClickHouse database, the graphs on the Monitor Edge screen had no data to display.
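To confirm the CPU saturation from the VCO shell, overall CPU utilization can be sampled from /proc/stat. This is a minimal sketch, assuming a Linux host and bash; the function name cpu_busy_percent is hypothetical. It reads the aggregate cpu line twice and reports the non-idle share of the time that elapsed in between:

```shell
# Hypothetical helper: prints overall CPU busy percentage (0-100), sampled
# over an interval in seconds (default 1) from /proc/stat (Linux only).
cpu_busy_percent() {
  local interval="${1:-1}"
  local -a a b
  # Aggregate line: cpu user nice system idle iowait irq softirq steal guest guest_nice
  read -r -a a < <(grep '^cpu ' /proc/stat)
  sleep "$interval"
  read -r -a b < <(grep '^cpu ' /proc/stat)

  # Idle time = idle + iowait (fields 4 and 5).
  local idle_a=$(( a[4] + a[5] )) idle_b=$(( b[4] + b[5] ))
  # Total = user..steal (fields 1-8); guest time is already included in user.
  local total_a=0 total_b=0 i
  for i in 1 2 3 4 5 6 7 8; do
    total_a=$(( total_a + a[i] ))
    total_b=$(( total_b + b[i] ))
  done

  local dt=$(( total_b - total_a )) di=$(( idle_b - idle_a ))
  if (( dt <= 0 )); then
    echo 0
    return
  fi
  # Busy share of the elapsed time, as an integer percentage.
  echo $(( 100 * (dt - di) / dt ))
}
```

While the problem described here is occurring, cpu_busy_percent 5 would be expected to print a value close to 100; top can then show which process is consuming the CPU.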
mysql> select type,state,count(*) from VELOCLOUD_FILE_PROCESSING_QUEUE group by type, state;
+-----------------------+----------+----------+
| type | state | count(*) |
+-----------------------+----------+----------+
| FLOWSTATS | ENQUEUED | 109252 |
| GATEWAYINTERFACESTATS | ENQUEUED | 6 |
| HEALTHSTATS | ENQUEUED | 80 |
| LINKQOE | ENQUEUED | 164 |
| LINKSTATS | ENQUEUED | 71 |
| PATHSTATS | ENQUEUED | 2 |
| ROUTING_EVENTS | ENQUEUED | 5 |
| CLICKHOUSE_INSERTS | STARTED | 24 |
| FLOWSTATS | STARTED | 886 |
| GATEWAYINTERFACESTATS | STARTED | 19 |
| HEALTHSTATS | STARTED | 1975 |
| LINKQOE | STARTED | 1536 |
| LINKSTATS | STARTED | 2381 |
| PATHSTATS | STARTED | 444 |
| ROUTING_EVENTS | STARTED | 21 |
+-----------------------+----------+----------+
15 rows in set (0.44 sec)
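Total number of entries in the queue (the type column is arbitrary here because the query has no GROUP BY, so only the count is meaningful):
mysql> select count(*),type from VELOCLOUD_FILE_PROCESSING_QUEUE;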
+----------+--------------------+
| count(*) | type |
+----------+--------------------+
| 116890 | CLICKHOUSE_INSERTS |
+----------+--------------------+
1 row in set (0.04 sec)
mysql>
The file store on disk holds a similar number of files:
root@lkt-vlo-vco1:/var/lib/velocloud/file_store# ls -l | wc -l
117162
root@lkt-vlo-vco1:/var/lib/velocloud/file_store#
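Note that ls -l also prints a "total" header line, so the count above is one higher than the number of entries. A minimal sketch that counts only regular files, assuming bash and that the queued stats files sit directly in /var/lib/velocloud/file_store (the path from the transcript above); count_pending_files is a hypothetical name:

```shell
# Hypothetical helper: counts regular files directly under a directory
# (default: the VCO file store path seen in the transcript above).
# Subdirectories and the "total" line that `ls -l` prints are not counted.
count_pending_files() {
  local dir="${1:-/var/lib/velocloud/file_store}"
  find "$dir" -maxdepth 1 -type f | wc -l
}
```

Running it periodically shows whether the backlog is still growing or starting to drain.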
The ClickHouse process was stuck because the ClickHouse limit on simultaneous queries had been reached. The ClickHouse server log showed errors such as:
2024.04.26 02:54:31.036033 [ 27705 ] {*************} <Error> DynamicQueryHandler: Code: 202. DB::Exception: Too many simultaneous queries. Maximum: 100. (TOO_MANY_SIMULTANEOUS_QUERIES), Stack trace (when copying this message, always include the lines below):
2024.04.26 02:54:31.049187 [ 27703 ] {***********} <Error> DynamicQueryHandler: Code: 202. DB::Exception: Too many simultaneous queries. Maximum: 100. (TOO_MANY_SIMULTANEOUS_QUERIES)
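To see how often the limit was hit, these errors can be counted per minute. A minimal sketch, assuming bash and the stock ClickHouse error log path /var/log/clickhouse-server/clickhouse-server.err.log (adjust if the VCO logs elsewhere); count_too_many_queries is a hypothetical name. gzip -cdf lets it read both plain and rotated .gz logs:

```shell
# Hypothetical helper: counts TOO_MANY_SIMULTANEOUS_QUERIES errors per minute.
# Accepts one or more log files (plain or gzip-compressed); defaults to the
# stock ClickHouse error log path, which is an assumption for the VCO.
count_too_many_queries() {
  (( $# > 0 )) || set -- /var/log/clickhouse-server/clickhouse-server.err.log
  gzip -cdf -- "$@" \
    | grep 'TOO_MANY_SIMULTANEOUS_QUERIES' \
    | awk '{ print $1, substr($2, 1, 5) }' \
    | sort | uniq -c
}
```

Each output line is a count followed by the date and the HH:MM minute taken from the log timestamp, which makes bursts easy to spot.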
The following command counts how often each username appears in the portal log, which helps identify which users are generating the most requests and therefore the simultaneous queries:
@vco129-usvi1:/var/log/portal$ zgrep '"username"' velocloud.log-20240723-1721715421 | awk -F'"username":"' '{print $2}' | awk -F'"' '{print $1}' | sort | uniq -c | sort -nr | head -10
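The same count can be run across several rotated portal logs at once. A minimal sketch, assuming bash; top_portal_users is a hypothetical name. Unlike the awk pipeline above, grep -o counts every "username" field on a line rather than only the first one:

```shell
# Hypothetical helper: top 10 usernames across one or more portal logs
# (plain or gzip-compressed), counting every "username":"..." occurrence.
top_portal_users() {
  gzip -cdf -- "$@" \
    | grep -o '"username":"[^"]*"' \
    | cut -d'"' -f4 \
    | sort | uniq -c | sort -nr | head -n 10
}
```

For example, top_portal_users /var/log/portal/velocloud.log* would cover the current log and its rotations.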
Workaround: manually raise the number of concurrent queries ClickHouse allows. Log in as root and modify the following line in /etc/clickhouse-server/config.xml:
Change: <max_concurrent_queries>100</max_concurrent_queries>
To: <max_concurrent_queries>300</max_concurrent_queries>
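The same edit can be scripted. A minimal sketch, assuming bash and GNU sed; set_max_concurrent_queries is a hypothetical helper name. It saves a copy of the original file as config.xml.bak (overwriting any earlier copy with that name) before changing the value, then prints the resulting line for review:

```shell
# Hypothetical helper: sets <max_concurrent_queries> in a ClickHouse config file.
# Defaults: /etc/clickhouse-server/config.xml and a new value of 300.
set_max_concurrent_queries() {
  local config="${1:-/etc/clickhouse-server/config.xml}" new_value="${2:-300}"
  if [[ ! "$new_value" =~ ^[1-9][0-9]*$ ]]; then
    echo "new value must be a positive integer" >&2
    return 1
  fi
  # Keep a backup so the change can be reverted.
  cp -p -- "$config" "$config.bak" || return 1
  # Replace whatever numeric value is currently set (GNU sed in-place edit).
  sed -i -E "s|<max_concurrent_queries>[0-9]+</max_concurrent_queries>|<max_concurrent_queries>${new_value}</max_concurrent_queries>|" "$config"
  grep -n '<max_concurrent_queries>' "$config"
}
```

The new value only takes effect once the ClickHouse service is restarted.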
After making the change, restart the ClickHouse service:
root@vco:~# systemctl restart clickhouse-server.service
To view the status of the service:
root@vco:~# systemctl status clickhouse-server.service
The changes above must be made on the VCO CLI as the root user.