Aria Orchestrator patch 5 upgrade failing

Products

VCF Automation

Issue/Introduction

Upgrading Aria Automation Orchestrator from version 8.18.1 (build 24977824) to 8.18.1 (build 25193536) fails.

Aria Automation Orchestrator appliance journal logs show:

```
Mar 18 06:09:58 hostname[1903]: panic: freepages: failed to get all reachable pages (page 636: multiple references (stack: [672 669 636]))
Mar 18 06:09:58 hostname[1903]: goroutine 15 [running]:
Mar 18 06:09:58 hostname[1903]: go.etcd.io/bbolt.(*DB).freepages.func2()
Mar 18 06:09:58 hostname[1903]: /go/pkg/mod/go.etcd.io/bbolt@<version>/db.go:1202 +0x8d
Mar 18 06:09:58 hostname[1903]: created by go.etcd.io/bbolt.(*DB).freepages in goroutine 55
Mar 18 06:09:58 hostname[1903]: /go/pkg/mod/go.etcd.io/bbolt@<version>/db.go:1200 +0x1c5
Mar 18 06:09:58 hostname systemd[1]: containerd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
```

Review the appliance journal with the command journalctl -xe. Historical journal files are stored in /services-logs/journal/systemd.journal-YYYYMMDD.
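
To confirm this specific failure without reading the whole journal, the output can be filtered for the panic signature. This is a minimal sketch: has_bbolt_panic is a hypothetical helper name, and it assumes containerd's output is logged under the containerd unit, as the systemd line above suggests.

```shell
#!/bin/sh
# Sketch: detect the bbolt freepages panic in journal text.
# has_bbolt_panic is a hypothetical helper, not an appliance tool.
# Reads journal text on stdin; returns 0 if the panic signature is present.
has_bbolt_panic() {
  grep -q 'panic: freepages: failed to get all reachable pages'
}

# Example usage on the appliance (commented out so the block is safe to source):
# journalctl -u containerd --no-pager | has_bbolt_panic && echo "bbolt corruption suspected"
# journalctl --file /services-logs/journal/systemd.journal-YYYYMMDD -u containerd --no-pager | has_bbolt_panic
```

The helper only reads stdin, so the same check works on the live journal and on historical journal files.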

Environment

Aria Automation Orchestrator 8.18.1

Cause

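During startup, containerd opens its bbolt metadata database. The panic is raised in bbolt's freepages function, which rebuilds the freelist at open time (when no persisted freelist is available) by walking every page reachable from the root and recording each one; every page must be reached exactly once. Here the walk reached page 636 a second time. The stack in the message, [672 669 636], is the path taken to reach it (672, then 669, then 636), and page 636 had already been reached through another path. A healthy B+ tree never holds two references to the same page. Because the page reference graph is inconsistent, bbolt panics rather than risk further data corruption, and containerd exits.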
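
The single-reference rule can be illustrated with a toy shell sketch. It does not read meta.db and is not how bbolt is invoked (bbolt performs its check internally in Go); it assumes a hypothetical input of "parent child" page-ID pairs and reports any page referenced by more than one parent, which in a tree is the same condition as a page being reached twice.

```shell
#!/bin/sh
# Toy illustration only: bbolt performs the real check internally in Go.
# Input: one "parent child" page reference per line (a hypothetical edge list).
# Output: every page ID referenced by more than one parent.
find_multi_referenced_pages() {
  awk 'NF == 2 { refs[$2]++ } END { for (p in refs) if (refs[p] > 1) print p }'
}
```

For example, printf '672 669\n669 636\n640 636\n' | find_multi_referenced_pages prints 636; page 640 is invented for the illustration and stands in for the other path through which page 636 was already reachable.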

Common causes of this kind of corruption include:

- Unclean shutdown or power loss: if the system lost power or was hard-reset while containerd was writing to the database, the write may not have completed. bbolt reads the database through mmap and persists pages with fdatasync, so it depends on the storage stack actually writing data when asked; an interruption at the wrong moment, or storage that acknowledges writes it has not persisted, can corrupt the freelist or internal page pointers.
- Disk I/O errors: bad sectors, failing storage, or flaky storage controllers can silently corrupt pages on disk.
- Filesystem corruption: if the underlying filesystem (ext4, xfs, etc.) has inconsistencies, the database file can be damaged.
- Known bugs in bbolt v1.3.x freelist handling: the 1.3.x line had several bugs in freelist management, tracked in issues such as etcd-io/bbolt#707 and related PRs. Some of these bugs could cause page double-frees or multiple references even without external I/O failures.
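
To check whether the storage-related causes above apply, the kernel log can be filtered for storage and filesystem errors. This is a minimal sketch: find_storage_errors is a hypothetical helper, and its patterns are common kernel message fragments, not an exhaustive list.

```shell
#!/bin/sh
# Sketch: print kernel log lines that look like block-device or filesystem errors.
# find_storage_errors is a hypothetical helper; the patterns are examples only.
find_storage_errors() {
  grep -iE 'I/O error|EXT4-fs error|XFS .*corrupt'
}

# Example usage (current boot, then previous boot):
# journalctl -k --no-pager | find_storage_errors
# journalctl -k -b -1 --no-pager | find_storage_errors
```

No output does not rule out storage problems; it only means none of these common messages were logged.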

Resolution

Perform the following recovery steps on the source appliance while it is still at the pre-upgrade 8.18.1 build:

Stop containerd (it may already have crashed):
```
systemctl stop containerd
```
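
Before touching the database files, it helps to confirm that the stop took effect. This is a minimal sketch: stop_containerd_checked is a hypothetical helper that relies only on systemctl stop and systemctl is-active --quiet, which exits 0 only while the unit is active.

```shell
#!/bin/sh
# Sketch: stop containerd and confirm it is no longer active.
# stop_containerd_checked is a hypothetical helper, not part of the appliance.
stop_containerd_checked() {
  systemctl stop containerd
  # "systemctl is-active --quiet" exits 0 only while the unit is active;
  # a crashed (failed) unit returns non-zero, which is what we want here.
  if systemctl is-active --quiet containerd; then
    echo "containerd is still running; do not continue" >&2
    return 1
  fi
  echo "containerd is stopped"
}
```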

Locate the corrupted database. It is typically at:

/var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db
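
A quick existence check for that path can be sketched as follows. CONTAINERD_ROOT, META_DB and check_meta_db are hypothetical names introduced here; the default matches the path above, but a customized containerd root setting would change it.

```shell
#!/bin/sh
# Sketch: confirm the containerd metadata database is at the expected path.
# CONTAINERD_ROOT is a hypothetical override; it defaults to the standard root.
CONTAINERD_ROOT="${CONTAINERD_ROOT:-/var/lib/containerd}"
META_DB="$CONTAINERD_ROOT/io.containerd.metadata.v1.bolt/meta.db"

check_meta_db() {
  if [ -f "$META_DB" ]; then
    ls -l "$META_DB"
  else
    echo "meta.db not found at $META_DB; check the containerd root setting" >&2
    return 1
  fi
}
```

If a copy of meta.db is taken off the appliance, the upstream bbolt command-line tool can run a comparable consistency check offline (installed with go install go.etcd.io/bbolt/cmd/bbolt@latest, then run as bbolt check meta.db). That tool is not assumed to be present on the appliance.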

Back up the folder that contains the corrupted file by renaming it:

```
mv /var/lib/containerd/io.containerd.metadata.v1.bolt /var/lib/containerd/io.containerd.metadata.v1.bolt.corrupted
```
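
The rename can be guarded so it only runs when containerd is stopped and an earlier backup would not be overwritten. This is a sketch: backup_meta_dir is a hypothetical helper; its default path and the .corrupted suffix come from the command above.

```shell
#!/bin/sh
# Sketch: rename the metadata folder aside, refusing to run if containerd is
# active or if a backup with the same name already exists.
# backup_meta_dir is a hypothetical helper; the default path matches this article.
backup_meta_dir() {
  src="${1:-/var/lib/containerd/io.containerd.metadata.v1.bolt}"
  dst="$src.corrupted"
  if systemctl is-active --quiet containerd; then
    echo "stop containerd first" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "$dst already exists; move it elsewhere first" >&2
    return 1
  fi
  mv "$src" "$dst" && echo "moved $src to $dst"
}
```

The existence check matters because, if the .corrupted folder already exists, mv moves the source folder inside it instead of renaming it.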

Start containerd:
```
systemctl start containerd
```
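
After the start, the following sketch confirms that containerd is active and has created a fresh meta.db. It assumes containerd recreates the metadata folder and database on startup when they are missing, which is its normal behavior; verify_containerd_started is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: confirm containerd is active and a new metadata database exists.
# verify_containerd_started is a hypothetical helper; the default path matches this article.
verify_containerd_started() {
  meta_db="${1:-/var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db}"
  if ! systemctl is-active --quiet containerd; then
    echo "containerd is not active; review journalctl -u containerd" >&2
    return 1
  fi
  if [ ! -f "$meta_db" ]; then
    echo "containerd is active but $meta_db was not recreated" >&2
    return 1
  fi
  echo "containerd is running with a fresh metadata database"
}
```
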

Restart the services with the following command:
```
/opt/scripts/deploy.sh
```
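
Once deploy.sh completes, pods that are not yet ready can be listed before starting the upgrade. This sketch rests on assumptions not stated in this article: that kubectl is usable on the appliance and that the services run in the prelude namespace (override NAMESPACE if not). list_unready_pods is a hypothetical helper that parses the default kubectl get pods columns NAME, READY and STATUS.

```shell
#!/bin/sh
# Sketch: list pods that are not Completed and are either not Running or not
# fully ready (READY x/y with x != y).
# Assumption: services run in the "prelude" namespace; override NAMESPACE if needed.
NAMESPACE="${NAMESPACE:-prelude}"

list_unready_pods() {
  kubectl get pods -n "$NAMESPACE" --no-headers |
    awk '{ split($2, r, "/") }
         $3 != "Completed" && ($3 != "Running" || r[1] != r[2]) { print $1, $2, $3 }'
}
```

An empty result suggests all pods are up; any listed pod is worth investigating before proceeding.
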

Proceed with the upgrade by following the instructions in article