Error "ERRORDATA_STACK_SIZE exceeded" causes the segment to fail over

Article ID: 295610

Products

VMware Tanzu Greenplum, Pivotal Data Suite Non Production Edition, VMware Tanzu Data Suite, VMware Tanzu Data Suite Greenplum

Issue/Introduction

A segment crashes with a PANIC whose message contains the string "ERRORDATA_STACK_SIZE exceeded".

Segment Log:

2024-12-03 17:43:22.985215 CET,"user01","db01",p3205640,th-350088576,"192.168.10.1","20650",2024-12-03 16:37:09 CET,0,con153826,cmd1992,seg23,slice10,,,sx1,"PANIC","XX000","ERRORDATA_STACK_SIZE exceeded (elog.c:1679)",,,,,,,0,,"elog.c",1679,"Stack trace:
1    0xbdee4f postgres errstart (elog.c:567)
2    0xbe19b9 postgres elog_start (elog.c:1679)
3    0x759474 postgres AbortCurrentTransaction (xact.c:3870)
4    0xa94d4c postgres PostgresMain (postgres.c:5061)
5    0xa1e0ed postgres <symbol not found> (postmaster.c:2861)
6    0xa1f095 postgres PostmasterMain (discriminator 5)
7    0x6e063b postgres main (main.c:178)
8    0x7fbfe7a60d85 libc.so.6 __libc_start_main + 0xe5
9    0x6ec41e postgres _start + 0x2e
"

The segment may continue to log messages after the panic and not enter recovery mode.
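
To check whether a segment log contains this PANIC, the log can be scanned for the message string. The following is a minimal sketch, assuming the log is in the CSV layout of the sample line above. The function name find_stack_size_panics is hypothetical, and the script matches on field contents rather than fixed column positions because only the one sample line is available here.

```python
import csv
import io

# Message text taken from the sample log line in this article.
PANIC_TEXT = "ERRORDATA_STACK_SIZE exceeded"


def find_stack_size_panics(log_text):
    """Return (timestamp, segment, message) for each matching PANIC row.

    Assumes CSV-formatted log text in which the first field is the
    timestamp and the segment appears as a field such as "seg23".
    """
    hits = []
    # The csv module handles quoted fields that span several lines,
    # such as the stack trace field in the sample.
    reader = csv.reader(io.StringIO(log_text, newline=""))
    for row in reader:
        if "PANIC" not in row:
            continue
        message = next((f for f in row if PANIC_TEXT in f), None)
        if message is None:
            continue
        segment = next(
            (f for f in row if f.startswith("seg") and f[3:].isdigit()), ""
        )
        hits.append((row[0], segment, message))
    return hits
```

Pass the full text of a segment log file to find_stack_size_panics(); an empty list means no matching PANIC was found in that file.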

Cause

For icproxy/ictcp, the TeardownTCPInterconnect() function is called both when a query finishes executing normally and when the query is aborting.

If another ERROR is raised inside TeardownTCPInterconnect() while the query is aborting, TeardownTCPInterconnect() is called again. If the ERROR recurs on every call, this loop repeats until ERRORDATA_STACK_SIZE is reached, at which point a PANIC is thrown.
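
The loop can be illustrated with a toy model. This is a simplified sketch, not Greenplum code: the Backend class, its method names, and the stack limit of 5 are assumptions made for illustration. It only shows how an abort path that raises a new ERROR on every attempt fills a bounded error stack until a PANIC results.

```python
ERRORDATA_STACK_SIZE = 5  # assumed limit, for illustration only


class QueryError(Exception):
    """Stands in for an ERROR raised inside the backend."""


class Panic(Exception):
    """Stands in for the PANIC raised when the error stack is full."""


class Backend:
    """Hypothetical, heavily simplified model of a query backend."""

    def __init__(self, teardown_always_fails):
        self.error_stack = []
        self.teardown_calls = 0
        self.teardown_always_fails = teardown_always_fails

    def raise_error(self, message):
        # Each ERROR takes a slot; a full stack turns into a PANIC.
        if len(self.error_stack) >= ERRORDATA_STACK_SIZE:
            raise Panic("ERRORDATA_STACK_SIZE exceeded")
        self.error_stack.append(message)
        raise QueryError(message)

    def teardown_interconnect(self):
        # Models a teardown step that can itself raise an ERROR.
        self.teardown_calls += 1
        if self.teardown_always_fails:
            self.raise_error("teardown failed")

    def abort_transaction(self):
        self.teardown_interconnect()

    def run_failing_query(self):
        try:
            self.raise_error("query failed")
        except QueryError:
            pass
        # Error recovery: retry the abort until it completes. The error
        # stack is cleared only after a successful abort.
        while True:
            try:
                self.abort_transaction()
                self.error_stack.clear()
                return
            except QueryError:
                continue
```

In this model, Backend(teardown_always_fails=False).run_failing_query() returns after a single teardown call, while Backend(teardown_always_fails=True).run_failing_query() raises Panic once the stack is full.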

Resolution

A code fix has been developed and is pending release as of Dec 2024.

The fix is expected to be released in 6.28.1 and/or 6.29.0. Check the release notes for issue 33612.