Excessive swap usage leading to CPU saturation, in turn causing a hung database or a stuck gpstop

Article ID: 371209

Products

VMware Tanzu Greenplum

Issue/Introduction

The following symptoms indicate this issue:

+ Queries do not progress and cannot even be cancelled, from the end user's perspective.

+ New database connections cannot be established.

+ gpstop does not progress at all.

+ gpssh to some or all segment hosts does not respond.

+ Load is excessively high on all segment hosts.

Environment

All Greenplum versions.

Cause

The congestion and slowness are a side effect of CPU time being consumed by excessive swap activity. The following evidence supports this conclusion:

 

+ Stuck spinlocks shown in logs:

2024-06-13 21:30:35.569294 UTC,"read_only","coredw",p28261,th-1391740800,"xx.xx.0.11","40302",2024-06-13 20:30:54 UTC,0,con368574,cmd44,seg224,slice239,,,sx1,"PANIC","XX000","stuck spinlock (0x7f55981a300c) detected at instrument.c:398 (s_lock.c:42)",,,,,,,0,,"s_lock.c",42,"Stack trace:
1    0xc015e7 postgres errstart (elog.c:557)
2    0xc0447e postgres elog_finish (elog.c:1728)
3    0xa8787e postgres <symbol not found> (s_lock.c:41)
4    0x8dd972 postgres <symbol not found> (discriminator 1)
5    0xc420e2 postgres <symbol not found> (discriminator 3)
6    0xc41ff0 postgres <symbol not found> (discriminator 3)
7    0xc427be postgres ResourceOwnerRelease (discriminator 2)
8    0x732b8c postgres <symbol not found> (xact.c:3365)
9    0x735455 postgres AbortCurrentTransaction (xact.c:3982)
10   0xa993b0 postgres PostgresMain (postgres.c:5069)
11   0x6b3553 postgres <symbol not found> (postmaster.c:4492)
12   0xa1ecb6 postgres PostmasterMain (postmaster.c:1517)
13   0x6b7431 postgres main (main.c:205)
14   0x7f55a9a67555 libc.so.6 __libc_start_main + 0xf5
15   0x6c32ac postgres <symbol not found> + 0x6c32ac

 

 
+ CPU saturation, with time spent almost entirely in system (%sys) and I/O wait (%iowait) rather than user (%usr):

                CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
09:30:40 PM     all      0.34      0.00     82.75     16.20      0.00      0.00      0.69      0.00      0.00      0.02
09:31:26 PM     all      0.37      0.00      0.90     97.82      0.00      0.00      0.91      0.00      0.00      0.00
 

+ Excessive swap usage during congestion (roughly 10-12% of swap in use):

            kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
09:30:40 PM 250979304  34575112     12.11    276084      0.80
09:31:26 PM 253616356  31938060     11.18    392280      1.23
09:32:26 PM 254341568  31212848     10.93    579800      1.86
09:33:26 PM 255950616  29603800     10.37    638020      2.16

+ Swap usage during normal processing, for comparison (under 4% in use):

            kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
12:01:01 AM 275034356  10520060      3.68     75528      0.72
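The %swpused figures above follow directly from the kbswpfree and kbswpused columns, which `sar -S` reports on hosts with the sysstat package installed. The sketch below reproduces that arithmetic. The function name `swap_used_pct` is hypothetical and is not part of Greenplum or sysstat.

```shell
#!/bin/sh
# swap_used_pct is a hypothetical helper, not a Greenplum or sysstat tool.
# It derives %swpused from kbswpfree ($1) and kbswpused ($2), both in kB.
swap_used_pct() {
    awk -v free="$1" -v used="$2" 'BEGIN {
        total = free + used
        # Avoid dividing by zero on hosts with no swap configured.
        if (total == 0) { print "0.00"; exit }
        printf "%.2f\n", used * 100 / total
    }'
}

# Values taken from the samples above:
swap_used_pct 250979304 34575112   # congestion sample at 09:30:40 PM
swap_used_pct 275034356 10520060   # normal-processing sample at 12:01:01 AM
```

The two calls print 12.11 and 3.68, matching the %swpused column, so swap in use roughly tripled between the normal and congested samples.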
 

+ Swap paging metrics show heavy swap demand, especially pages swapped out per second (pswpout/s):

             pswpin/s pswpout/s
08:32:02 PM    702.19  13874.26
......
09:30:40 PM     19.25   3329.45
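To find such intervals in a longer report, the samples can be filtered by swap-out rate. The following is a minimal sketch that assumes input in the layout shown above (the format `sar -W` prints), with pswpout/s as the last column. The function name `high_swap_out` and the threshold of 5000 pages/s are illustrative assumptions, not recommended values.

```shell
#!/bin/sh
# high_swap_out is a hypothetical filter, not part of sysstat or Greenplum.
# It reads swap paging samples on stdin and prints those whose last
# column (pswpout/s) exceeds the threshold given as $1.
high_swap_out() {
    awk -v limit="$1" '
        # Skip headers, elided rows, and summary lines; keep numeric samples only.
        $1 != "Average:" && $NF ~ /^[0-9]+(\.[0-9]+)?$/ && ($NF + 0) > (limit + 0) { print }
    '
}

# Example with the two samples above and an assumed threshold of 5000 pages/s:
printf '%s\n' \
    '             pswpin/s pswpout/s' \
    '08:32:02 PM    702.19  13874.26' \
    '09:30:40 PM     19.25   3329.45' | high_swap_out 5000
```

On a live host the same filter could be fed from `sar -W`, for example `sar -W | high_swap_out 5000`.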

Resolution

1.] Reduce Swap Usage

+ Change the kernel parameter vm.swappiness from 10 to 1. This reduces the overhead of swap usage across the cluster and frees the CPU cycles spent on swap processing for user workloads.

+ Create a backup copy of the existing /etc/sysctl.conf file, then set vm.swappiness = 1 in the current file on the Coordinator and all segment hosts in the cluster. A host restart is not required to apply the change.
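The backup-and-edit step can be sketched as a small shell function. This is an illustration under stated assumptions rather than an official procedure: the function name `set_swappiness` and the `.bak` backup suffix are hypothetical, and the function must be run as root on each host when pointed at /etc/sysctl.conf.

```shell
#!/bin/sh
# set_swappiness is a hypothetical helper, not a Greenplum utility.
# $1 = sysctl configuration file, $2 = desired vm.swappiness value.
set_swappiness() {
    conf_file="$1"
    value="$2"

    # Keep a copy of the existing file before modifying it.
    cp -p "$conf_file" "$conf_file.bak" || return 1

    if grep -Eq '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf_file"; then
        # A setting already exists: rewrite it, reading from the backup copy.
        sed -E "s/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness = $value/" \
            "$conf_file.bak" > "$conf_file"
    else
        # No setting yet: append one.
        printf 'vm.swappiness = %s\n' "$value" >> "$conf_file"
    fi
}

# Usage (as root, on the Coordinator and every segment host):
#   set_swappiness /etc/sysctl.conf 1
#   sysctl -p                      # reload /etc/sysctl.conf without a restart
#   cat /proc/sys/vm/swappiness    # verify the running value
```

Running `sysctl -p` is what applies the new value to the running kernel, which is why no host restart is needed.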

For more details, refer to "Overview of memory tuning best practices for Greenplum Database".

2.] Optimize Memory Usage

For more details, refer to "Resource Queues and Memory Management".