The clamd process consumes significant CPU resources on some Cloud_controller VM(s)

Article ID: 434028

Products

VMware Tanzu Platform - Cloud Foundry

Issue/Introduction

In Anti-Virus for VMware Tanzu / Anti-Virus Mirror for VMware Tanzu versions prior to v2.4.4, the clamd process might consume significant CPU resources on Cloud_controller VM(s).

The issue shows the following patterns:

  • Running "monit stop clamonacc" brings CPU utilization back to a normal level.
  • Restarting the affected VM does not bring CPU utilization back down.
  • Not all Cloud_controller VMs are necessarily affected. The issue might occur on only some of them.
  • clamdtop shows files in the 'tmp/prometheus' folder being read constantly.
    • e.g., /var/vcap/data/cloud_controller_ng/tmp/prometheus/metric_cc_acquired_db_connections_total.bin

Cause

The elevated CPU usage is caused by ClamAV scanning files under the following directory:

  • /var/vcap/data/cloud_controller_ng/tmp/prometheus 

This directory is used by the Cloud Controller Prometheus exporter and is continuously updated with metric data. Each file modification triggers an on-access scan.

Due to the frequent updates, the same files are scanned repeatedly, which results in increased CPU utilization.

This behavior is expected when on-access scanning is applied to directories with frequent file updates and is consistent with how real-time scanning operates.
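
To get a rough sense of how often the exporter rewrites its files, a minimal sketch such as the following could be run on an affected VM. The function name count_recently_modified is hypothetical and not part of any product tooling. It relies only on the standard find, wc, and tr utilities.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Antivirus tile or ClamAV):
# count regular files under a directory that were modified within
# the last N minutes (default: 1).
count_recently_modified() {
  local dir="$1" minutes="${2:-1}"
  # -mmin -N matches files modified less than N minutes ago.
  # tr strips any padding that some wc implementations add.
  find "$dir" -type f -mmin "-${minutes}" | wc -l | tr -d '[:space:]'
  echo
}
```

For example, count_recently_modified /var/vcap/data/cloud_controller_ng/tmp/prometheus prints the number of files changed in the last minute. A count that stays close to the total number of files in the directory would be consistent with the repeated scanning described above.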

Resolution

Temporary workaround:

The issue can be worked around by excluding the Prometheus temporary directory from on-access scanning:

  • OnAccessExcludePath /var/vcap/data/cloud_controller_ng/tmp/prometheus

To persist the exclusion in clamd.conf, add this path to the exclusion list in the Antivirus tile and redeploy. This keeps the change effective across restarts and future deployments.

To apply the exclusion directly on an affected VM:

  1. SSH into the affected Cloud_controller VM.
  2. Edit /var/vcap/jobs/antivirus/clamd.conf and add the following line:
    OnAccessExcludePath /var/vcap/data/cloud_controller_ng/tmp/prometheus
  3. Run `monit restart clamd`
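
The manual edit in step 2 can be sketched as a small shell function that appends the directive only when it is not already present, so that running it twice does not duplicate the line. The function name add_onaccess_exclude is hypothetical. The sketch assumes it is run as a user with write access to the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an OnAccessExcludePath directive to a
# clamd.conf-style file unless that exact line already exists.
add_onaccess_exclude() {
  local conf="$1" path="$2"
  local line="OnAccessExcludePath ${path}"

  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  if grep -qxF -- "$line" "$conf"; then
    echo "already present"
    return 0
  fi

  # If the file does not end with a newline, add one first so the new
  # directive starts on its own line.
  if [ -n "$(tail -c1 "$conf")" ]; then
    echo >> "$conf"
  fi

  printf '%s\n' "$line" >> "$conf"
  echo "added"
}
```

On the VM this might be called as add_onaccess_exclude /var/vcap/jobs/antivirus/clamd.conf /var/vcap/data/cloud_controller_ng/tmp/prometheus, followed by monit restart clamd as in step 3. The tile setting described above remains the way to keep the exclusion across future deployments.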

Permanent fix:

This issue is expected to be fixed in a future release, v2.4.4. Contact the Tanzu support team if you need further information or help.