Why Does Truncate Take So Long and Show Numerous unlink() and open() Operations in strace?

search cancel

Why Does Truncate Take So Long and Show Numerous unlink() and open() Operations in strace?

book

Article ID: 383787

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

Customers may observe a significant number of open() and unlink() operations through the strace command when performing truncate operations on the database. Additionally, most of these system calls may return ENOENT errors ("No such file or directory"). This behavior might lead customers to question why the database exhibits such behavior and whether it would be more efficient to directly delete the files instead of generating numerous "No such file or directory" errors via strace.

strace: Process 26605 attached with 2 threads
[pid 26610] 07:38:16.589647 restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 26605] 07:38:16.595774 unlink("base/16384/xxxxxxx.19") = -1 ENOENT (No such file or directory) <0.000876>
[pid 26605] 07:38:16.606202 unlink("base/16384/xxxxxxx.20") = -1 ENOENT (No such file or directory) <0.000733>
[pid 26605] 07:38:16.607012 unlink("base/16384/xxxxxx.21") = -1 ENOENT (No such file or directory) <0.000923>
[pid 26605] 07:38:16.607997 unlink("base/16384/xxxxxxx.22") = -1 ENOENT (No such file or directory) <0.000821>
[pid 26605] 07:38:16.608887 unlink("base/16384/xxxxxxx.23") = -1 ENOENT (No such file or directory) <0.000562>
[pid 26605] 07:38:16.609509 unlink("base/16384/xxxxxxx.24") = -1 ENOENT (No such file or directory) <0.000580>
[pid 26605] 07:38:16.610145 unlink("base/16384/xxxxxxxx.25") = -1 ENOENT (No such file or directory) <0.000622>

Also, customers might see a lot of audit-related processes taking a lot of CPU

Cause

At the lowest storage layer (such as the md layer for heap storage), we don't know which segment files actually exist, even for append-optimized tables.
For heap tables, both PostgreSQL (even version 18) and Greenplum Database attempt to unlink segment files until the first missing segment file is found. Thus, if a table has 16384.1, 16384.2, 16384.3, the heap code will issue 4 unlink operations before finishing.
We similarly rely on unlink operations to handle AO (Append-Optimized) tables.
However, unlike heaps, for AO tables, we are forced to call unlink operations up to 127 times because there may be gaps in the segment file number sequence (e.g., a table might have 16384.1, 16384.2, 16384.4). Therefore, the first file found to be missing in the sequence (like 16384.3) is not necessarily the last one. As a result, we can't exit early and must exhaust all possibilities.
Unlike heaps, we could refer to directories (aoseg and aocsseg) to know which files exist on disk, stat them, and then unlink them. However, it's unclear whether starting before unlinking would be faster - it seems unlikely as it involves two system calls instead of one. Moreover, there is no system call that can efficiently list the contents of the current tablespace (which contains thousands of relation files from other tables) and efficiently search for the segment files we want at the moment. So, it is normal that when we perform the truncate operation, it shows a lot of unlink() and open() calls.

Resolution

If we observe that numerous audit system calls in the process list are significantly impacting the database performance, we can temporarily disable the audit process to assess whether it improves performance. Here's a refined version:

sudo auditctl -e 0

To disable it permanently :

sudo systemctl disable auditd
sudo systemctl stop auditd

Feedback

thumb_up Yes

thumb_down No