Many queries are failing with:
"interconnect encountered a network error, please check your network (segXX slice1 xxx.xxx.xxx.xxx:3000 pid=xxxx)","Failed to send packet (seq 11) to xxx.xxx.xxx.xxx:3000 (pid xxxxx cid 41) after 3566 retries in 3600 seconds"
OS: RHEL 8
Set gp_interconnect_queue_depth to a number higher than the failed "seq" number in the error message. Set it only for the affected queries (for example, at the session level), because it can affect other queries.
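As a rough illustration of the workaround, the sketch below extracts the failed "seq" number from the error text and builds a session-level SET statement with a value one higher. The helper name suggest_queue_depth_sql, the margin of 1, and the 1 to 2048 range check are assumptions made for this example; confirm the valid range against the documentation for your Greenplum version.

```python
import re

# Assumed valid range for gp_interconnect_queue_depth; verify for your version.
MIN_DEPTH, MAX_DEPTH = 1, 2048


def suggest_queue_depth_sql(error_text: str, margin: int = 1) -> str:
    """Build a session-level SET statement from the failed 'seq' in the error text.

    Hypothetical helper for illustration only; it is not part of Greenplum.
    """
    match = re.search(r"Failed to send packet \(seq (\d+)\)", error_text)
    if match is None:
        raise ValueError("no 'Failed to send packet (seq N)' found in error text")

    # The workaround asks for a value higher than the failed seq number.
    depth = max(int(match.group(1)) + margin, MIN_DEPTH)
    if depth > MAX_DEPTH:
        raise ValueError(f"required depth {depth} exceeds assumed maximum {MAX_DEPTH}")

    return f"SET gp_interconnect_queue_depth = {depth};"


if __name__ == "__main__":
    sample = (
        "Failed to send packet (seq 11) to xxx.xxx.xxx.xxx:3000 "
        "(pid xxxxx cid 41) after 3566 retries in 3600 seconds"
    )
    print(suggest_queue_depth_sql(sample))  # SET gp_interconnect_queue_depth = 12;
```

Run the resulting statement in the same session as the affected query so that other sessions keep the existing setting.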
Expected to be fixed in Greenplum 6.29.0 and above.
Check Release Notes for issue 33627.
In GPDB interconnect communication with gp_interconnect_type=udpifc, a receiver needs to acknowledge a sender's data packet by sending back an acknowledgement packet.
Before GPDB 6.29.0, the receiver caches the sender's address/port only when it receives a data packet with a sequence number (seq) smaller than gp_interconnect_queue_depth. The issue occurs when the sender later uses a different port to send packets: the receiver may have already cached the old port and continues to send acknowledgement packets to that port. As a result, the sender never receives the acknowledgement, and the query hangs.
To avoid this issue, a fix is being prepared (will be released in GPDB 6.29.0) to let the receiver always use the most recent sender address/port to send acknowledgement packets.
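The caching behaviour described above can be sketched with a toy model. This is not Greenplum's implementation: the Receiver class, the port numbers 5000 and 5001, and the always_refresh flag are all hypothetical, chosen only to show why a stale port causes lost acknowledgements, why a larger gp_interconnect_queue_depth works around it, and what the fix changes.

```python
class Receiver:
    """Toy model of a udpifc receiver; not Greenplum's actual code."""

    def __init__(self, queue_depth: int, always_refresh: bool):
        self.queue_depth = queue_depth
        self.always_refresh = always_refresh  # True models the described fix
        self.cached_port = None

    def on_data_packet(self, seq: int, sender_port: int) -> int:
        """Update the cached sender port if allowed; return the port the ack is sent to."""
        # Pre-fix behaviour: only packets with seq below the queue depth refresh the cache.
        if self.always_refresh or seq < self.queue_depth:
            self.cached_port = sender_port
        return self.cached_port


def ack_port_for(queue_depth: int, always_refresh: bool) -> int:
    """Send seq 1-3 from port 5000, then seq 11 from port 5001; return the ack target."""
    receiver = Receiver(queue_depth, always_refresh)
    for seq in (1, 2, 3):
        receiver.on_data_packet(seq, sender_port=5000)
    return receiver.on_data_packet(11, sender_port=5001)


if __name__ == "__main__":
    print(ack_port_for(queue_depth=4, always_refresh=False))   # 5000: ack goes to the stale port
    print(ack_port_for(queue_depth=12, always_refresh=False))  # 5001: workaround, depth > failed seq
    print(ack_port_for(queue_depth=4, always_refresh=True))    # 5001: fix, always use latest port
```

In the first case the acknowledgement for seq 11 is sent to a port the sender no longer listens on, which corresponds to the retries reported in the error message.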