Interconnect encountered a network error after OS upgrade to RHEL8

Article ID: 384999

Products

VMware Tanzu Data Suite Pivotal Data Suite Non Production Edition VMware Tanzu Data Suite Greenplum VMware Tanzu Greenplum

Issue/Introduction

After the OS upgrade to RHEL 8, many queries fail with the following error:

"interconnect encountered a network error, please check your network (segXX slice1 xxx.xxx.xxx.xxx:3000 pid=xxxx)","Failed to send packet (seq 11) to xxx.xxx.xxx.xxx:3000 (pid xxxxx cid 41) after 3566 retries in 3600 seconds"

Environment

OS RHEL8

Resolution

Possible workaround

Set gp_interconnect_queue_depth to a number higher than the "seq" number reported in the "Failed to send packet" error (11 in the example error in this article). Apply the setting only to the affected queries, because it can have an impact on other queries.
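As a rough illustration of the workaround, the sketch below extracts the failed "seq" number from an error message and builds session-level statements that raise gp_interconnect_queue_depth just above it. The helper names (suggest_queue_depth, session_statements) and the MAX_QUEUE_DEPTH constant are hypothetical and not part of Greenplum. The upper bound of 2048 is an assumption that should be checked against the documentation for your version.

```python
import re

# Assumed upper bound for gp_interconnect_queue_depth; verify for your version.
MAX_QUEUE_DEPTH = 2048

# Matches the "Failed to send packet (seq N)" fragment of the interconnect error.
SEQ_PATTERN = re.compile(r"Failed to send packet \(seq (\d+)\)")


def suggest_queue_depth(error_text: str) -> int:
    """Return a queue depth one higher than the largest failed seq found."""
    seqs = [int(s) for s in SEQ_PATTERN.findall(error_text)]
    if not seqs:
        raise ValueError("no 'Failed to send packet (seq N)' fragment found")
    depth = max(seqs) + 1
    if depth > MAX_QUEUE_DEPTH:
        raise ValueError(f"required depth {depth} exceeds {MAX_QUEUE_DEPTH}")
    return depth


def session_statements(depth: int) -> list[str]:
    """Build statements that scope the setting to a single session."""
    return [
        f"SET gp_interconnect_queue_depth = {depth};",
        "-- run the affected query here",
        "RESET gp_interconnect_queue_depth;",
    ]


if __name__ == "__main__":
    sample = (
        "Failed to send packet (seq 11) to xxx.xxx.xxx.xxx:3000 "
        "(pid xxxxx cid 41) after 3566 retries in 3600 seconds"
    )
    for line in session_statements(suggest_queue_depth(sample)):
        print(line)
```

Using SET and RESET inside the session that runs the failing query keeps the change local, which matches the advice to avoid applying the setting to other queries.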

Fix

Expected to be fixed in Greenplum 6.29.0 and later.

Check Release Notes for issue 33627.

Additional Information

In GPDB interconnect communication with gp_interconnect_type=udpifc, a receiver needs to acknowledge a sender's data packet by sending back an acknowledgement packet.

Before GPDB 6.29.0, the receiver caches the sender's address/port when it receives a data packet with a sequence number (seq) smaller than gp_interconnect_queue_depth. The issue occurs when the sender later uses a different port to send packets: the receiver may already have cached the old port and continues to send acknowledgement packets to it. As a result, the sender never receives the acknowledgement and the query hangs.

To avoid this issue, a fix is being prepared for release in GPDB 6.29.0. With the fix, the receiver always uses the most recent sender address/port when sending acknowledgement packets.
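The caching behavior described in this section can be illustrated with a toy model. This is a simplified sketch, not Greenplum's actual interconnect code: the Receiver class, its fields, and the always_refresh flag are hypothetical names introduced only to contrast the behavior before and after the fix.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Addr = Tuple[str, int]  # (ip, port)


@dataclass
class Receiver:
    queue_depth: int
    always_refresh: bool = False  # True models the behavior after the fix
    cached: Optional[Addr] = None

    def on_data(self, seq: int, src: Addr) -> Addr:
        """Handle a data packet and return the address the ACK is sent to."""
        # Before the fix, the cache is refreshed only for seq < queue_depth.
        # The "cached is None" case is a modeling convenience for the first packet.
        if self.always_refresh or seq < self.queue_depth or self.cached is None:
            self.cached = src
        return self.cached


def ack_target_after_port_change(receiver: Receiver, failed_seq: int) -> Addr:
    """Send seq 1 from the old port, then failed_seq from a new port."""
    receiver.on_data(1, ("10.0.0.1", 40000))
    return receiver.on_data(failed_seq, ("10.0.0.1", 40001))
```

In this model, a receiver with queue_depth=4 keeps acknowledging to port 40000 after the sender moves to port 40001 at seq 11, so the ACK is lost. Raising queue_depth above 11 (the workaround) or setting always_refresh=True (the fix) makes the ACK go to port 40001.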
