High CPU utilization and transient file path errors on Diego cells during Scheduled ClamAV scans

Article ID: 441120

Products

Operations Manager

Issue/Introduction

A critical alert (CPU load >90% for 10+ minutes) was triggered on a single Diego Cell.

The clamscan.log file recorded hundreds of filesystem errors similar to the following:

/var/vcap/data/grootfs/store/unprivileged/images/.../diff/home/vcap/app/BOOT-INF/lib/...: File path check failure: No such file or directory. ERROR
/var/vcap/data/grootfs/store/unprivileged/images/.../diff/home/vcap/app/BOOT-INF/lib/...: Can't open file or directory ERROR
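To gauge how many of these errors come from the GrootFS store rather than elsewhere on the cell, the log can be summarized with a short script. The following is a minimal sketch, not a supported tool: the function name summarize_scan_errors is hypothetical, and the patterns assume every error line ends exactly like the two samples above.

```python
import re
from collections import Counter

# Message endings copied from the sample log lines above (assumed to be stable).
ERROR_PATTERNS = {
    "path_check_failure": re.compile(
        r": File path check failure: No such file or directory\. ERROR\s*$"
    ),
    "cannot_open": re.compile(r": Can't open file or directory ERROR\s*$"),
}

# GrootFS image store prefix as it appears in the sample log lines.
GROOTFS_PREFIX = "/var/vcap/data/grootfs/store/unprivileged/images/"


def summarize_scan_errors(lines):
    """Count path errors per kind, split by whether the path is under GrootFS."""
    counts = Counter()
    for line in lines:
        for kind, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                scope = "grootfs" if line.startswith(GROOTFS_PREFIX) else "other"
                counts[(kind, scope)] += 1
                break
    return counts


# Usage with the two (elided) sample lines from this article.
sample_lines = [
    "/var/vcap/data/grootfs/store/unprivileged/images/.../diff/home/vcap/app/"
    "BOOT-INF/lib/...: File path check failure: No such file or directory. ERROR",
    "/var/vcap/data/grootfs/store/unprivileged/images/.../diff/home/vcap/app/"
    "BOOT-INF/lib/...: Can't open file or directory ERROR",
]
for (kind, scope), total in sorted(summarize_scan_errors(sample_lines).items()):
    print(kind, scope, total)
```

To run it against a real log, pass an open file handle for clamscan.log in place of sample_lines. If nearly all errors fall in the grootfs scope, the exclusion described in the Resolution section is likely to address them.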

Environment

  • Product: Tanzu Application Service (TAS) / Operations Manager

  • Impacted Service: Container Filesystem (GrootFS)

  • Security Software: Anti-Virus for VMware Tanzu (ClamAV)

Cause

A race condition occurs between the static, scheduled ClamAV scan and the dynamic lifecycle of application containers managed by GrootFS.

When ClamAV runs a scheduled scan, it indexes and attempts to inspect files inside the ephemeral container filesystem layers located under /var/vcap/data/grootfs/store/unprivileged/images/. Because container instances on Diego cells are frequently started, rescaled, or garbage-collected by the platform, these filesystem layers can be torn down while ClamAV is still reading them. ClamAV then repeatedly fails to open files that existed moments earlier. The result is a stream of non-fatal path errors in the log, additional filesystem crawling overhead, and a spike in CPU utilization that can last long enough to trigger a high-CPU alert.
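The sequence can be reproduced in miniature. The sketch below is an illustration only and does not reflect ClamAV's internal implementation: it indexes the files in a temporary directory (standing in for an image layer), removes some of them before they are read, and then records which reads fail. The function name and file names are hypothetical.

```python
import os
import tempfile


def scan_after_teardown(file_count=5, removed=3):
    """Index files, remove some (simulating layer teardown), then read each one."""
    errors = []
    with tempfile.TemporaryDirectory() as layer_dir:
        # Populate a stand-in for a container image layer.
        for i in range(file_count):
            with open(os.path.join(layer_dir, f"lib{i}.jar"), "w") as handle:
                handle.write("payload")

        # Step 1: the scanner builds its list of paths to inspect.
        indexed = sorted(
            os.path.join(layer_dir, name) for name in os.listdir(layer_dir)
        )

        # Step 2: part of the layer is torn down before the scanner reads it.
        for path in indexed[:removed]:
            os.remove(path)

        # Step 3: the scanner opens each indexed path; removed ones now fail.
        for path in indexed:
            try:
                with open(path, "rb") as handle:
                    handle.read()
            except FileNotFoundError:
                errors.append(os.path.basename(path))
    return errors


print(scan_after_teardown())
```

Each failed read in step 3 corresponds to one "No such file or directory" entry in the scan log. On a busy cell, with many layers being removed during a long scan, these failures can accumulate into the hundreds.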

Resolution

To eliminate the CPU performance bottleneck, configure ClamAV to exclude the dynamic GrootFS store directory. Container images should instead be scanned upstream during the pipeline build phase or via secure base buildpacks.

Permanent fix via Operations Manager UI

  1. Log in to the Ops Manager Installation Dashboard.

  2. Click the Anti-Virus for VMware Tanzu (Compliance Scanner) tile.

  3. Navigate to the Scan Configuration section.

  4. Locate the Exclusions / Paths to Exclude field (Ignore directories setting).

  5. Add the following path to the exclusion list: /var/vcap/data/grootfs/

  6. Click Save.

  7. Return to the Ops Manager Dashboard, click Review Pending Changes, ensure the Anti-Virus tile is selected, and click Apply Changes to roll out the updated configuration across all Diego Cells.
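Before applying the change, it can help to check which of the paths reported in clamscan.log the new entry would cover. The sketch below assumes the exclusion is treated as a directory prefix, which this article does not confirm; check the tile documentation for the exact matching rules. The function name is_under_excluded_dir and the example-image path segment are hypothetical.

```python
from pathlib import PurePosixPath


def is_under_excluded_dir(path, excluded_dirs):
    """Return True if path equals or sits below any excluded directory."""
    candidate = PurePosixPath(path)
    for excluded in excluded_dirs:
        root = PurePosixPath(excluded)  # a trailing slash is normalized away
        if candidate == root or root in candidate.parents:
            return True
    return False


# Exclusion entry from step 5 above.
exclusions = ["/var/vcap/data/grootfs/"]

# "example-image" is a made-up placeholder, not a real image ID.
print(is_under_excluded_dir(
    "/var/vcap/data/grootfs/store/unprivileged/images/example-image/diff/home/vcap/app",
    exclusions,
))
print(is_under_excluded_dir("/var/vcap/data/grootfs-other/file", exclusions))
```

Comparing whole path components rather than raw string prefixes avoids accidentally matching sibling directories whose names merely start with the same characters.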

Additional Information

This issue manifests as short-lived or sustained CPU peaks that typically resolve on their own once the filesystem traversal completes or the scan finishes. A bosh recreate or manual process remediation is therefore rarely required and is usually only needed if the cell becomes completely unresponsive.