Queries that use sequences (i.e., the serial data type) retrieve the next value from the sequence server (seqserver) process on the master. This is an example of the seqserver process on the master:
master/5f4832eb-c553-45a0-9bf2-0cd472864291:~$ ps -ef | grep seqserver | grep -v grep
gpadmin  20735  20728  0 Jan22 ?  00:00:17 postgres:  5432, seqserver process
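For illustration, an insert into a table with a serial column consumes one sequence value per inserted row, so the executing segments have to fetch values from the seqserver. The table definition and column names below are only examples (the database name dedw is taken from the error message further down):

$ psql dedw -c "CREATE TABLE test2 (clmn1 serial, clmn2 timestamp);"
$ psql dedw -c "INSERT INTO test2 (clmn2) SELECT now() FROM generate_series(1, 1000);"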
When the nextval() function is called on a segment, the segment requests the value from the master's seqserver process. In some cases, after the GPDB database has been up for some time, any query that fetches sequence numbers will start to hang and eventually fail with:
2020-01-10 14:07:20.305583 EST,"gpadmin","dedw",p116568,th1565489088,"[local]",,2020-01-10 14:05:33 EST,0,con994228,cmd2,seg-1,,dx3440738,,sx1,"ERROR","XX000","Error: Could not read() message from seqserver.(error:104). Closing connection. (seqserver.c:290) (entry db xxx.xx.xx.xx:5432 pid=116577) (cdbdisp.c:254)",,,,,,"Insert Into test2 (clmn2) Select Now() From generate_series(1, 1000);",0,,"cdbdisp.c",254,"Stack trace:
1    0x9620eb postgres errstart (elog.c:521)
2    0x9eaee0 postgres cdbdisp_finishCommand (cdbdisp.c:252)
3    0x6b1d08 postgres mppExecutorFinishup (execUtils.c:2095)
4    0x698412 postgres ExecutorEnd (discriminator 2)
5    0x8402e5 postgres <symbol not found> (pquery.c:311)
6    0x8406e8 postgres <symbol not found> (pquery.c:1479)
7    0x84197c postgres PortalRun (pquery.c:1033)
8    0x83c366 postgres <symbol not found> (postgres.c:1783)
9    0x83e571 postgres PostgresMain (postgres.c:4962)
10   0x7ddc95 postgres <symbol not found> (postmaster.c:6406)
11   0x7e046a postgres PostmasterMain (postmaster.c:1533)
12   0x4cb797 postgres main (main.c:206)
13   0x7f025869e495 libc.so.6 __libc_start_main + 0xf5
14   0x4cbd4c postgres <symbol not found> + 0x4cbd4c
A previously suggested workaround indicated that high load can cause this issue, so increasing the sequence's CACHE value would limit the number of requests sent to the master: https://knowledge.broadcom.com/external/article?articleId=296163
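That workaround amounts to raising the CACHE setting on the affected sequence, for example (the sequence name and cache size here are only illustrative; the sequence name shown is the implicit one created for a serial column):

$ psql dedw -c "ALTER SEQUENCE test2_clmn1_seq CACHE 1000;"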
However, even with the cache value set very high, you may still see the same error, even after restarting the database. The other workaround is to restart the database every time this happens, but the issue can return after just a few days of use, especially on a heavily loaded system.
When running a tcpdump between the master and the segment for a hung query (an example capture command is shown after the trace), you see the following:
1  12:33:19.015125 IP (tos 0x0, ttl 64, id 35753, offset 0, flags [DF], proto TCP (6), length 60)
   172.28.60.19.12087 > 172.28.60.19.27360: Flags [S], cksum 0xd08d (incorrect -> 0x0c99), seq 653718187, win 43690, options [mss 65495,sackOK,TS val 2575137026 ecr 0,nop,wscale 7], length 0
2  12:33:19.015138 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
   172.28.60.19.27360 > 172.28.60.19.12087: Flags [S.], cksum 0xd08d (incorrect -> 0x88fe), seq 3705181232, ack 653718188, win 43690, options [mss 65495,sackOK,TS val 2575137026 ecr 2575137026,nop,wscale 7], length 0
3  12:33:19.015147 IP (tos 0x0, ttl 64, id 35754, offset 0, flags [DF], proto TCP (6), length 52)
   172.28.60.19.12087 > 172.28.60.19.27360: Flags [.], cksum 0xd085 (incorrect -> 0x5b43), ack 1, win 342, options [nop,nop,TS val 2575137026 ecr 2575137026], length 0
4  12:33:19.015182 IP (tos 0x0, ttl 64, id 35755, offset 0, flags [DF], proto TCP (6), length 80)
In packets 1 - 4, the segment initiates a connection with the master, which is acknowledged. After the handshake is complete, the segment starts its stream and sends its first 28 bytes to the seqserver.
5  12:33:19.216029 IP (tos 0x0, ttl 64, id 35756, offset 0, flags [DF], proto TCP (6), length 80)
   172.28.60.19.12087 > 172.28.60.19.27360: Flags [P.], cksum 0xd0a1 (incorrect -> 0x1867), seq 1:29, ack 1, win 342, options [nop,nop,TS val 2575137227 ecr 2575137026], length 28
6  12:33:19.421023 IP (tos 0x0, ttl 64, id 35757, offset 0, flags [DF], proto TCP (6), length 80)
   172.28.60.19.12087 > 172.28.60.19.27360: Flags [P.], cksum 0xd0a1 (incorrect -> 0x179a), seq 1:29, ack 1, win 342, options [nop,nop,TS val 2575137432 ecr 2575137026], length 28
7  12:33:19.832024 IP (tos 0x0, ttl 64, id 35758, offset 0, flags [DF], proto TCP (6), length 80)
   172.28.60.19.12087 > 172.28.60.19.27360: Flags [P.], cksum 0xd0a1 (incorrect -> 0x15ff), seq 1:29, ack 1, win 342, options [nop,nop,TS val 2575137843 ecr 2575137026], length 28
8  12:33:20.017031 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
   172.28.60.19.27360 > 172.28.60.19.12087: Flags [S.], cksum 0xd08d (incorrect -> 0x81e3), seq 3705181232, ack 653718188, win 43690, options [mss 65495,sackOK,TS val 2575138028 ecr 2575137843,nop,wscale 7], length 0
9  12:33:20.017042 IP (tos 0x0, ttl 64, id 35759, offset 0, flags [DF], proto TCP (6), length 52)
   172.28.60.19.12087 > 172.28.60.19.27360: Flags [.], cksum 0xd085 (incorrect -> 0x573d), ack 1, win 342, options [nop,nop,TS val 2575138028 ecr 2575137026], length 0
However, packets 5 - 7 show the segment retransmitting the same 28 bytes with still no response from the seqserver. This pattern continues until the connection eventually gives up and a reset packet is sent:
25 12:35:04.229047 IP (tos 0x0, ttl 64, id 60168, offset 0, flags [DF], proto TCP (6), length 40)
   172.28.60.19.27360 > 172.28.60.19.12087: Flags [R], cksum 0xd45f (correct), seq 3705181233, win 0, length 0
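For reference, a capture like the one above can be collected with a command along these lines. The port (27360) is the seqserver listener seen in this particular trace, and the interface and host filter are only examples:

$ tcpdump -i any -nn -v 'host 172.28.60.19 and port 27360'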
In this case, it appears the OS is not handling the connection correctly. Running netstat -s while the query is hung also shows the following:
2283 times the listen queue of a socket overflowed
2359 SYNs to LISTEN sockets dropped
Then, after the query gives up, the same counters have increased:
2331 times the listen queue of a socket overflowed
2407 SYNs to LISTEN sockets dropped
This supports the view that these ACK packets are being silently dropped because the TCP listen queue is filling up.
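A simple way to watch these counters while a query is hung is to filter the netstat -s output; the exact wording of the counters can vary slightly between OS versions:

$ netstat -s | egrep -i 'listen queue|SYNs to LISTEN'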
The size of this listen queue is controlled by the following kernel parameter (https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt):
somaxconn - INTEGER
    Limit of socket listen() backlog, known in userspace as SOMAXCONN.
    Defaults to 4096. (Was 128 before linux-5.4)
    See also tcp_max_syn_backlog for additional tuning for TCP sockets.
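The current value can be checked and raised with sysctl; the value 4096 below is only an example, and any change should also be added to /etc/sysctl.conf (or a file under /etc/sysctl.d/) so it survives a reboot:

$ sysctl net.core.somaxconn
$ sudo sysctl -w net.core.somaxconn=4096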
However, in our code for the seqserver, we define another backlog value in addition to this kernel parameter, and the kernel applies the lower of the two values. This can cap the TCP listen queue, especially on high load systems, and causes the seqserver to hang.
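On recent kernels, the backlog that is actually in effect for a listening socket (i.e., the lower of the two values) can be seen in the Send-Q column of ss; the port used here is the seqserver port from the trace above and is only an example:

$ ss -ltn sport = :27360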
The value has been adjusted and the fix will be included in Greenplum 5.25.0. If you have a high load system and are seeing this issue from time to time, upgrade to Greenplum 5.25.0 to prevent it from occurring again.
If you still see the issue after upgrading to Greenplum 5.25.0, you will have to adjust the following parameters:
gp_interconnect_tcp_listener_backlog (GPDB GUC)
net.core.somaxconn (kernel parameter)
Both parameters would need to be increased from their defaults. To guarantee a high enough value, set them to at least: (gp_num_contents_in_cluster) * (max_connections).
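As a hypothetical worked example, a cluster with 64 primary segments (gp_num_contents_in_cluster = 64) and max_connections = 250 would need a backlog of at least 64 * 250 = 16000, which could be applied roughly as follows (the exact values are assumptions for illustration):

$ gpconfig -c gp_interconnect_tcp_listener_backlog -v 16000
$ sudo sysctl -w net.core.somaxconn=16000
$ gpstop -ar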