ERROR: "artifact not found" returned from Harbor when pull, copy or delete operations are performed against an image

Products

VMware Tanzu Kubernetes Grid Integrated Edition Ops Manager

Issue/Introduction

This error presents on some but not all artifacts in the Harbor project.
The artifacts appear in the Harbor GUI but cannot be modified. Operations against them fail with errors such as "artifact <PROJECT_NAME>/<IMAGE_NAME>@sha256:<HASH_ID> not found"
When attempting to run imgpkg copy commands to update the image, similar errors are reported: "NOT_FOUND: artifact <PROJECT_NAME>/<IMAGE_NAME>@sha256:<HASH_ID> not found"
When pushing images to Harbor with the imgpkg command, the client will see an error: "MANIFEST_UNKNOWN"
Harbor core.log will show messages like:

Sep 5 04:56:44 ###.###.###.### core[165337]: 2024-09-05T04:56:44Z [ERROR] [/lib/http/error.go:57]: {"errors":[{"code":"UNKNOWN","message":"unknown: ERROR: could not access status of transaction ########## (SQLSTATE 58P01)"}]}
Harbor registry.log will show messages like:

Sep 5 04:52:50 172.20.1.1 registry[165337]: time="2024-09-05T04:52:50.720720124Z" level=error msg="response completed with error" auth.user.name="harbor_registry_user" err.code="manifest unknown" err.detail="unknown manifest
Harbor postgresql.log will show messages like:

Sep 5 04:52:50 172.20.1.1 postgresql[1965337]: 2024-09-05 04:52:50.854 UTC [47] ERROR: duplicate key value violates unique constraint "unique_artifact"
Sep 5 04:52:50 172.20.1.1 postgresql[1965337]: 2024-09-05 04:52:50.854 UTC [47] DETAIL: Key (repository_id, digest)=(####, sha256:################################################################) already exists.

EXAMPLE:

Sep 5 04:52:50 172.20.1.1 postgresql[1965337]: 2024-09-05 04:52:50.854 UTC [47] DETAIL: Key (repository_id, digest)=(5134, sha256:0f8b424aa0b96c1c388a5fd4d90735604459256336853082afb61733438872b5) already exists.
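
When many of these errors are logged, the affected keys can be collected in one pass. The following is a minimal sketch, assuming a saved copy of the PostgreSQL log is readable from a shell. The function name extract_duplicate_keys and the example file path are hypothetical, and the pattern matches only the single-line DETAIL format shown in the log excerpt above.

```shell
# Print each unique "repository_id digest" pair found in the
# "duplicate key" DETAIL lines. Reads the file(s) given as arguments,
# or stdin when no file is given. The function name is illustrative
# and not part of Harbor.
extract_duplicate_keys() {
  sed -nE 's/.*Key \(repository_id, digest\)=\(([0-9]+), (sha256:[0-9a-f]{64})\) already exists\..*/\1 \2/p' "$@" | sort -u
}

# Example (hypothetical path to a copy of the log):
# extract_duplicate_keys /tmp/postgresql.log
```

Each output line carries one repository_id and one digest, which are the two values needed for the database lookup later in this article.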

Cause

This failure condition is caused by corruption of the Harbor database files. Failures in underlying storage components, or an incorrect shutdown sequence of the Harbor VMs, can lead to this state.
The corrupted PostgreSQL database is then unable to insert records into the artifact table. This may surface as a unique constraint violation with "repository_id, digest invalid" warnings reported against the artifact table.
The corruption may also leave duplicated records in the artifact table, which cause push, copy, and delete operations to fail for the impacted records.

Resolution

Recommendations for Harbor:

Harbor is an important system, and its database should be backed up regularly, potentially every day depending on the rate of change. The Harbor database files are located in the /data/database directory on the Harbor tile VM (when installed via Ops Manager). This is the directory that should be backed up.
If backup is not an option, consider using Harbor Replication to replicate images to a second Harbor instance for redundancy.
By default, the database data files and the image storage reside on the same disk. The image registry consumes disk space more quickly than the database; if the disk runs out of space, database recovery will fail and Harbor will be unable to start.
It is recommended to install Harbor with a separate Container Registry storage configuration. This keeps the database data files on the VM's disk and the image files in NFS/S3 storage. This is safer and makes recovery easier.
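
The backup recommendation above could be scripted along the following lines. This is a sketch only: the function name backup_harbor_db and the destination directory are hypothetical, and it simply archives the database directory with tar, as done later in this article. A file-level archive taken while PostgreSQL is writing is not guaranteed to be consistent, so it is best taken while the database is stopped.

```shell
# Archive the "database" directory found under $1 into a timestamped
# tarball in $2, then print the archive path.
# The function and file names are illustrative, not part of Harbor.
backup_harbor_db() {
  parent_dir="$1"
  dest_dir="$2"
  archive="${dest_dir}/harbor-database-$(date +%Y%m%d-%H%M%S).tgz"
  mkdir -p "$dest_dir" || return 1
  # -C keeps the paths inside the archive relative to the parent directory.
  tar -czf "$archive" -C "$parent_dir" database || return 1
  # Listing the archive confirms that it can be read back.
  tar -tzf "$archive" > /dev/null || return 1
  printf '%s\n' "$archive"
}

# Example (hypothetical destination; /data is the parent of /data/database):
# backup_harbor_db /data /mnt/backups
```

Writing the archive to a disk other than the one holding the database avoids adding to the disk-space pressure described above.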

Steps to identify the problem artifact on an Ops Manager/BOSH-managed Harbor tile:

Use Bosh to SSH into the Harbor VM:

# bosh ssh into the harbor vm

$ bosh ssh -d harbor-container-registry-xxxxxx harbor-app/0
Switch to the root user, then alias the docker command so that the following commands use the BOSH-packaged binary and socket:

$ sudo -i

# alias docker="/var/vcap/packages/docker/bin/docker -H unix:///var/vcap/sys/run/docker/dockerd.sock"
Prior to any database modifications, back up the Harbor database directory:

# cd /var/vcap/store/harbor

# tar zcvf database_backup.tar.tgz database
Open a shell in the Harbor database container, then connect to the registry database with psql:

# docker exec -it harbor-db bash
$ psql -U postgres -d registry
Query the database for the entry reported in the error messaging (this example command uses the repository_id, digest reported in the errors above):

Example: 5134, sha256:0f8b424aa0b96c1c388a5fd4d90735604459256336853082afb61733438872b5

select id, digest from artifact where repository_id=5134 and digest = 'sha256:0f8b424aa0b96c1c388a5fd4d90735604459256336853082afb61733438872b5';
Use the below SELECT query to identify duplicated rows:

SELECT repository_id, digest, COUNT(*)
FROM artifact
GROUP BY repository_id, digest
HAVING COUNT(*) > 1;
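
The grouped query reports which (repository_id, digest) pairs are duplicated but not the individual row ids. As a hedged sketch, the read-only query below joins that result back to the artifact table to list every row in each duplicated group. The function name duplicate_rows_sql is hypothetical, and only the columns already named in this article (id, repository_id, digest) are assumed to exist.

```shell
# Emit a read-only query that lists every row belonging to a duplicated
# (repository_id, digest) group. The function name is illustrative.
duplicate_rows_sql() {
  cat <<'SQL'
SELECT a.id, a.repository_id, a.digest
FROM artifact a
JOIN (
  SELECT repository_id, digest
  FROM artifact
  GROUP BY repository_id, digest
  HAVING COUNT(*) > 1
) d ON a.repository_id = d.repository_id AND a.digest = d.digest
ORDER BY a.repository_id, a.digest, a.id;
SQL
}

# Example, from a shell inside the harbor-db container:
# duplicate_rows_sql | psql -U postgres -d registry
```

The query only reads data. Saving its output alongside the database backup gives a record of the affected row ids before any rows are changed.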

The steps above identify the duplicated rows. After a backup has been taken, these rows can be deleted to correct the failing condition. Please contact VMware by Broadcom support for assistance with database modification if required.