During TAS Restore Step 14, restoring bbr-usage-servicedb failed with the
following error:
[bbr] 2019/07/02 16:50:14 INFO - Restoring bbr-usage-servicedb on backup_restore/4408a85f-15a1-4d31-bee4-6ae4298c2e85...
[bbr] 2019/07/02 16:53:02 ERROR - Error restoring bbr-usage-servicedb on backup_restore/4408a85f-15a1-4d31-bee4-6ae4298c2e85.
Once the script completed, the following output was displayed:
[bbr] 2019/07/02 17:02:36 INFO - Finished running post-restore-unlock scripts.
1 error occurred:
error 1:
Failed to restore: 1 error occurred:
error 1:
Error attempting to run restore for job bbr-usage-servicedb on backup_restore/4408a85f-15a1-4d31-bee4-6ae4298c2e85: 2019/07/02 15:50:15 MYSQL server version 5.7.25-28-31.35-log
fatal error: runtime: out of memory
The same out-of-memory error can also occur in the BBR CLI itself, with the following output:
fatal error: runtime: out of memory

runtime stack:
runtime.throw({0x901b65, 0xf0400000})
    /usr/local/go/src/runtime/panic.go:1198 +0x71
runtime.sysMap(0xc0ff400000, 0x429600, 0xc000051e90)
    /usr/local/go/src/runtime/mem_linux.go:169 +0x96
runtime.(*mheap).grow(0xcd6700, 0x780dc)
    /usr/local/go/src/runtime/mheap.go:1393 +0x225
runtime.(*mheap).allocSpan(0xcd6700, 0x780dc, 0x0, 0x1)
    /usr/local/go/src/runtime/mheap.go:1179 +0x165
runtime.(*mheap).alloc.func1()
    /usr/local/go/src/runtime/mheap.go:913 +0x69
runtime.systemstack()
    /usr/local/go/src/runtime/asm_amd64.s:383 +0x49
There are two possible causes for this issue:
1. While the service usage restore process is running, it consumes all available memory on the backup_restore VM, and the process is eventually aborted because the VM has no memory left.
Service usage restore process example:
vcap 15965 15964 3 15:50 ? 00:00:03 /var/vcap/packages/database-backup-restorer/bin/database-backup-restore --restore --config /var/vcap/jobs/bbr-usage-servicedb/config/bbr-usage-servicedb.json --artifact-file /var/vcap/store/bbr-backup/bbr-usage-servicedb//usage-service-backup
Memory consumption on the backup_restore VM while the process is running (only 340K available):
backup_restore/4408a85f-15a1-4d31-bee4-6ae4298c2e85:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           985M        878M         57M         28K         50M        340K
2. The `bbr` CLI receives the stdout of the restore process and stores it in memory. Every row in the database is passed through SSH stdout, and this stream is handled as one contiguous chunk instead of being buffered. As a result, the `bbr` CLI itself can exhaust the memory of the local jump box.