Restore fails on usage service database with an out of memory error in Tanzu Application Service for VMs

Article ID: 293555

Updated On: 09-05-2023

Products

Operations Manager

Issue/Introduction

Symptoms:

During step 14 of the TAS restore, restoring bbr-usage-servicedb fails with the following error:

[bbr] 2019/07/02 16:50:14 INFO - Restoring bbr-usage-servicedb on backup_restore/4408a85f-15a1-4d31-bee4-6ae4298c2e85...
[bbr] 2019/07/02 16:53:02 ERROR - Error restoring bbr-usage-servicedb on backup_restore/4408a85f-15a1-4d31-bee4-6ae4298c2e85.

Once the script completes, the following output is shown on the screen:

[bbr] 2019/07/02 17:02:36 INFO - Finished running post-restore-unlock scripts.
1 error occurred:
error 1:
Failed to restore: 1 error occurred:
error 1:
Error attempting to run restore for job bbr-usage-servicedb on backup_restore/4408a85f-15a1-4d31-bee4-6ae4298c2e85: 2019/07/02 15:50:15 MYSQL server version 5.7.25-28-31.35-log
fatal error: runtime: out of memory

The error can also occur in the BBR CLI itself, with the following output:

fatal error: runtime: out of memory

runtime stack:
runtime.throw({0x901b65, 0xf0400000})
	/usr/local/go/src/runtime/panic.go:1198 +0x71
runtime.sysMap(0xc0ff400000, 0x429600, 0xc000051e90)
	/usr/local/go/src/runtime/mem_linux.go:169 +0x96
runtime.(*mheap).grow(0xcd6700, 0x780dc)
	/usr/local/go/src/runtime/mheap.go:1393 +0x225
runtime.(*mheap).allocSpan(0xcd6700, 0x780dc, 0x0, 0x1)
	/usr/local/go/src/runtime/mheap.go:1179 +0x165
runtime.(*mheap).alloc.func1()
	/usr/local/go/src/runtime/mheap.go:913 +0x69
runtime.systemstack()
	/usr/local/go/src/runtime/asm_amd64.s:383 +0x49

Environment

Cause

There are two possible causes for this issue:

1. The service usage restore process consumes all available memory on the backup_restore VM while it runs, and the process is eventually aborted because the VM has no memory left.

Service usage restore process example:

vcap     15965 15964  3 15:50 ?        00:00:03 /var/vcap/packages/database-backup-restorer/bin/database-backup-restore --restore --config /var/vcap/jobs/bbr-usage-servicedb/config/bbr-usage-servicedb.json --artifact-file /var/vcap/store/bbr-backup/bbr-usage-servicedb//usage-service-backup

Memory consumption on the backup_restore VM while the process is running (only 340K available):

backup_restore/4408a85f-15a1-4d31-bee4-6ae4298c2e85:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           985M        878M         57M         28K         50M        340K

2. The `bbr` CLI receives the stdout of the restore process and stores it in memory. Every row in the database is passed through via SSH stdout, and this stream of data is handled as one contiguous chunk instead of being buffered. As a result, the `bbr` CLI itself can exhaust the memory of the local jump box.

Resolution

To fix the first problem, where the backup_restore VM is low on memory, vertically scale the RAM for this VM. Switching from the default micro VM (1 GB RAM) to the small VM (2 GB RAM) or a larger one resolves the issue.

To fix the second problem, where the BBR CLI runs out of memory, upgrade the BBR CLI to version 9.1.25 or later. The fix is referenced in the release notes as follows:

Make BBR more robust to large stdout streams. (#436)
