Disk full error occurring when updating ERT/EAR in the upload-tile-and-stemcell step

Article ID: 434485

Products

VMware Tanzu Application Service

Issue/Introduction

When attempting to update the Elastic Application Runtime tile, the following error occurs in the upload-tile-and-stemcell step:

{"errors":{"product":["Zip file is not valid: Archive:  /var/tempest/tmp/0000000099\n extracting: /tmp/ops_manager/d20260331-999-xxxyyy/metadata/metadata.yml  \n/tmp/ops_manager/d20260331-999-xxxyyy/metadata/metadata.yml:  write error (disk full?).  Continue? (y/n/^C) \nwarning:  /tmp/ops_manager/d20260331-999-xxxyyy/metadata/metadata.yml is probably truncated\n"]}}
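For pipelines that capture this response body, the disk-full case can be told apart from other invalid-archive errors by looking for the "disk full" text inside the "product" error list. The following is a minimal sketch, assuming the response has the same JSON shape as the message above; the function name is hypothetical and not part of any Ops Manager tooling.

```python
import json


def is_disk_full_error(body: str) -> bool:
    """Return True if an error body of the shape shown above mentions a full disk.

    Assumption: the body is JSON like {"errors": {"product": ["...message..."]}}.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    errors = payload.get("errors", {})
    if not isinstance(errors, dict):
        return False
    messages = errors.get("product", [])
    if not isinstance(messages, list):
        return False
    # unzip reports "write error (disk full?)" when it cannot write a file
    return any(isinstance(m, str) and "disk full" in m for m in messages)
```

A pipeline step could call is_disk_full_error(response_text) and, when it returns True, point the operator at the workarounds in the Resolution section instead of retrying the upload.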

Environment

VMware Tanzu Elastic Application Runtime

Cause

The Ops Manager VM did not have enough free disk space to extract the compressed tile file during the upgrade process.

Resolution

The following are a few options to work around this issue.  See the Additional Information section for more context behind these options.

  • In the Ops Manager UI, click the "Delete all unused products" button.
  • Perform a reboot of the Ops Manager VM, so that the /tmp directory is cleaned up upon restart.
  • If necessary, expand the disk size of the Ops Manager VM by following the KB "How to safely increase the disk size of the Ops Manager VM".

Additional Information

The Ops Manager VM has a single disk, which defaults to 160GB.  Files accumulate on that disk over time, and the resulting disk pressure can cause operations such as tile upgrades to fail.
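To see where the space is going, it can help to compare overall disk usage with the size of the directories mentioned in this article. The sketch below is read-only and uses only the Python standard library. The function names are hypothetical, and whether Python 3 is available on a given Ops Manager VM is an assumption.

```python
import os
import shutil


def usage_percent(path):
    """Percentage of the filesystem holding `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def dir_size_bytes(path):
    """Total size of regular files under `path`; 0 if the path does not exist."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; skip it
                continue
    return total


def report(paths):
    """Map each path to its total size in bytes."""
    return {p: dir_size_bytes(p) for p in paths}
```

For example, report(["/tmp", "/var/tempest/releases", "/var/tempest/stemcells"]) together with usage_percent("/") gives a quick picture of which directory dominates. Some directories may need elevated permissions to read in full.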

Clean up processes

  • Post-deployment stemcell cleanup: After every successful Apply Changes, Ops Manager cleans up stemcell database records that are no longer on disk and not referenced by any product. However, this does not clean up /var/tempest/releases.
  • Historical installation data pruning: A recurring job prunes old installation records from the database, but this is metadata-only and does not affect disk files.
  • There is no automatic cleanup of /var/tempest/releases after Apply Changes. Release files are only cleaned when "Delete all unused products" is explicitly invoked.
  • Ops Manager runs a scheduled job named "clean_tmp_job" every hour.  This job only targets the "/tmp/ops_manager" directory and deletes any file that is more than 24 hours old.
  • Ops Manager stores the VM's or instances' log files (collected & downloaded from the Ops Manager UI) in the "/tmp" directory.  These files collect over time and do not self-delete.  Restarting the Ops Manager VM will delete these files, and any temporary files, in the "/tmp" directory.  Also see the "Tanzu Operations Manager VM disk space" section in the "Monitoring VM stats in Tanzu Operations Manager" doc.
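Because the hourly job only covers "/tmp/ops_manager", older files elsewhere under "/tmp" stay in place until a reboot. The sketch below lists, without deleting anything, the files under a directory that are older than a given age. It is not the Ops Manager job itself: the function name is hypothetical, and using the modification time as the measure of age is an assumption.

```python
import os
import time


def files_older_than(root, max_age_hours=24.0, now=None):
    """Return sorted paths of files under `root` last modified before the cutoff.

    Read-only: this only lists candidates, it never deletes.
    Assumption: file age is judged by modification time (mtime).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.lstat(full).st_mtime < cutoff:
                    stale.append(full)
            except OSError:
                # File vanished between listing and stat; skip it
                continue
    return sorted(stale)
```

Calling files_older_than("/tmp") shows what has been accumulating outside the directory that the scheduled job covers, such as downloaded log bundles.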

 

How does Ops Manager clean up unnecessary files when the "Delete all unused products" function is used?

The cleanup performs four steps:

  1. Determines used vs. unused product templates – classifies templates as "used" if they are associated with any desired or actual product, plus the BOSH director template and any Kubernetes distribution templates.
  2. Deletes unused product templates – removes the metadata YAML file and product data migrations for each unused template.
  3. Deletes unused stemcells – compares all stemcells on disk against those currently assigned to or matching criteria for desired/actual products, and removes any that aren't needed.
  4. Deletes unused releases – iterates over all files in /var/tempest/releases, keeps only those whose basename matches a file entry in any used product template's releases list, and deletes the rest.

Important: this is purely a metadata-driven operation. It does NOT inspect release tarballs to determine stemcell compatibility. It relies solely on whether the release filename appears in a used product template's metadata.
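The release step can be pictured as a set-membership test on filenames. The following is a minimal sketch of that logic as described above, not the Ops Manager implementation. The dictionary shape for a product template and the function name are assumptions made for illustration, and the function only computes a plan without touching any files.

```python
import os


def plan_release_cleanup(files_on_disk, used_templates):
    """Split release files into (keep, delete) lists by basename match.

    Assumption: each template is a dict such as
    {"name": "...", "releases": [{"file": "some-release.tgz"}, ...]}.
    """
    referenced = {
        os.path.basename(release["file"])
        for template in used_templates
        for release in template.get("releases", [])
    }
    keep = [f for f in files_on_disk if os.path.basename(f) in referenced]
    delete = [f for f in files_on_disk if os.path.basename(f) not in referenced]
    return keep, delete
```

Note that nothing in this decision looks inside the release file. A file is kept purely because its name is listed by a used template, which is why the stemcell version a release was compiled against plays no role.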

 

Why are there "old" files in /var/tempest/releases that are being kept even after "Delete all unused products" was used?

The "Delete all unused products" operation only deletes release files that are not referenced by any "used" product template. It works by collecting the file attribute from the releases section of every used product template's metadata, then deleting any file on disk in /var/tempest/releases whose basename is NOT in that list.

A product template is considered "used" if it belongs to either the desired (staged) or actual (deployed) installation.

Release files compiled against older stemcell versions (for example, old jammy stemcell releases) therefore persist if:

  1. The product template metadata for a currently staged or deployed tile still lists those release filenames in its releases array (e.g., the tile's metadata was authored to bundle releases compiled against older stemcell versions like 1.999, 1.065, etc.).
  2. Multiple tiles share the same release file – even if one tile is deleted, the release persists because another used tile still references it.

The most likely explanation: the deployed EAR tile's metadata references release files compiled for older stemcell versions. These are part of the tile's release bundle and are considered "in use" because the tile itself is still staged/deployed.
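To reason about why a particular file is kept, it helps to invert the relationship and ask which used templates reference each release file. The sketch below assumes the same hypothetical template shape, a dict with "name" and a "releases" list of entries with a "file" attribute. It is an illustration of the reasoning and does not read Ops Manager's actual data.

```python
from collections import defaultdict


def references_by_release(used_templates):
    """Map each release filename to the names of used templates that list it.

    Assumption: each template looks like
    {"name": "...", "releases": [{"file": "some-release.tgz"}, ...]}.
    """
    refs = defaultdict(list)
    for template in used_templates:
        for release in template.get("releases", []):
            refs[release["file"]].append(template["name"])
    return dict(refs)
```

A file that maps to two template names stays on disk after one of those tiles is deleted, because the other tile still references it. A file that maps to a single deployed tile stays for as long as that tile remains staged or deployed.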

 

Expanding the disk size of the Ops Manager VM

Expanding the disk is a valid and supported solution. The existing KB article "How to safely increase the disk size of the Ops Manager VM" covers this procedure.

The OVF template ships with a disk capacity of ~160GB. For foundations with large tiles like EAR that bundle many releases, 160GB can be insufficient – especially during upgrades when both the old and new versions of a tile coexist on disk temporarily.

Expanding the disk addresses the root capacity constraint. However, the underlying issue is that release files accumulate and aren't proactively cleaned.

 

Further Recommendations

  1. Run "Delete all unused products" before uploading new tile versions – this frees space consumed by previously uploaded but unstaged tile versions.
  2. Do NOT manually delete files from /var/tempest/releases or /var/tempest/stemcells – Ops Manager maintains database records that reference these files. Manual deletion can lead to inconsistencies between the filesystem and the database, potentially causing unexpected behavior during deployments.
  3. There is no built-in mechanism to prune old stemcell-compiled releases that are still referenced by currently deployed tiles. These release files persist as long as the product template metadata references them.
  4. Consider the upgrade workflow – During a tile upgrade, the old and new tile versions temporarily coexist on disk. Complete Apply Changes, then run "Delete all unused products" to remove the old version before uploading the next tile. Batching multiple tile upgrades without intermediate cleanup compounds the disk pressure.
  5. Proactively increase the disk to 200GB or more for environments with large tiles – The default disk is only 160GB. In foundations with large tiles or deployments where disk pressure is present or has occurred before, expanding to 200-250GB provides adequate headroom for the upgrade lifecycle. The KB article referenced above covers the safe procedure.
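A simple pre-upload check can catch low disk space before the upload step fails. The sketch below compares free space against the size of the tile file multiplied by a safety factor. The factor of 2.0 is an assumption chosen only to allow for the uploaded archive and its extracted contents existing at the same time; it is not a documented Ops Manager requirement, and the function names are hypothetical.

```python
import shutil


def free_bytes(path="/"):
    """Free space, in bytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def has_headroom(available_bytes, tile_bytes, multiplier=2.0):
    """Return True if available space covers the tile size times a safety factor.

    Assumption: `multiplier` is a rough allowance, not a documented figure.
    """
    return available_bytes >= tile_bytes * multiplier
```

For example, has_headroom(free_bytes("/"), tile_size_in_bytes) could gate the upload, with cleanup or a disk expansion performed first when it returns False.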