Checking Data Skew (Alternative Method)
search cancel

Checking Data Skew (Alternative Method)

book

Article ID: 295215

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

When checking for datab skew using the view gp_toolkit.gp_skew_coefficients, it can be quite slow due to the fact that the view checks based on the row count of the table, as described in the article linked here. As a result, the larger the size or number of relations in the database, the longer it will take to complete the check.

 

Resolution

For GPDB Version 4.x or 5.x

The approach outlined in this method utilizes a view that examines the file sizes for each table, segment by segment. The output will include only those tables that have at least one segment with more than 20% more bytes than expected.

Two variants of the same script have been provided for your convenience. The first variant gives a quick summary of the OS file size distribution, while the second variant provides more detailed information.

Script 1 

1. Execute the following script in your database to create the user defined function (UDF) called "fn_create_db_files()":

CREATE OR REPLACE FUNCTION fn_create_db_files() RETURNS void AS
$$
DECLARE 
        v_function_name text := 'fn_create_db_files';
        v_location int;
        v_sql text;
        v_db_oid text;
        v_num_segments numeric;
        v_skew_amount numeric;
        v_version text;
BEGIN
        v_location := 1000;
        SELECT oid INTO v_db_oid 
        FROM pg_database 
        WHERE datname = current_database();

        v_location := 2000;
        v_sql := 'DROP VIEW IF EXISTS vw_file_skew';

        v_location := 2100;
        EXECUTE v_sql;
        
        v_location := 2200;
        v_sql := 'DROP EXTERNAL TABLE IF EXISTS db_files';

        v_location := 2300;
        EXECUTE v_sql;

        v_location := 3000;
        v_sql := 'CREATE EXTERNAL WEB TABLE db_files ' ||
                '(segment_id int, relfilenode text, filename text, ' ||
                'size numeric) ' ||
                'execute E''ls -l $GP_SEG_DATADIR/base/' || v_db_oid || 
                ' | ' ||
                'grep gpadmin | ' ||
                E'awk {''''print ENVIRON["GP_SEGMENT_ID"] "\\t" $9 "\\t" ' ||
                'ENVIRON["GP_SEG_DATADIR"] "/' || v_db_oid || 
                E'/" $9 "\\t" $5''''}'' on all ' || 'format ''text''';

        v_location := 3100;
        EXECUTE v_sql;

        v_location := 4000;
        SELECT count(*) INTO v_num_segments 
        FROM gp_segment_configuration 
        WHERE preferred_role = 'p' 
        AND content >= 0;

        v_location := 4100;
        v_skew_amount := 1.2*(1/v_num_segments);
        
        v_location := 4200;
        SELECT CASE WHEN POSITION('Greenplum Database 5' in v) > 0 THEN 'GPDB5'
                 WHEN POSITION('Greenplum Database 4' in v) > 0 THEN 'GPDB4'
                 ELSE 'other' END INTO v_version
        FROM version() AS v;

        v_sql := 'CREATE OR REPLACE VIEW vw_file_skew AS ' ||
                 'SELECT schema_name, ' ||
                 'table_name, ' ||
                 'max(size)/sum(size) as largest_segment_percentage, ' ||
                 'sum(size) as total_size ' ||
                 'FROM        ( ' ||
                 'SELECT n.nspname AS schema_name, ' ||
                 '      c.relname AS table_name, ' ||
                 '      sum(db.size) as size ' ||
                 '      FROM db_files db ';

        IF v_version = 'GPDB4' THEN
                 v_sql := v_sql || '      JOIN pg_class c ON ';
        ELSE
                 v_sql := v_sql || '      JOIN gp_dist_random(''pg_class'') c ON ';
        END IF;

        v_sql := v_sql || '      (split_part(db.relfilenode, ''.'', 1))::text = c.relfilenode::text ' ||
                 '      JOIN pg_namespace n ON c.relnamespace = n.oid ' ||
                 '      WHERE c.relkind = ''r'' and c.relstorage not in (''x'', ''v'', ''f'')' ||
                 '      GROUP BY n.nspname, c.relname, db.segment_id ' ||
                 ') as sub ' ||
                 'GROUP BY schema_name, table_name ' ||
                 'HAVING sum(size) > 0 and max(size)/sum(size) > ' || 
                 v_skew_amount::text || ' ' || 
                 'ORDER BY largest_segment_percentage DESC, schema_name, ' ||
                 'table_name';

        v_location := 4300;
        EXECUTE v_sql; 

EXCEPTION
        WHEN OTHERS THEN
                RAISE EXCEPTION '(%:%:%)', v_function_name, v_location, sqlerrm;
END;
$$
language plpgsql;

2. Execute the function so it will create the necessary objects in your database:

select * from fn_create_db_files();

3. Eelect from the view to see the tables that are skewed:

SELECT * FROM vw_file_skew ORDER BY 3 DESC;


Check the value of largest_segment_percentage in the view, closer to 1 indicates that one segment has more data compared to rest of the segments in the cluster.

For example, largest_segment_percentage = 0.83 means 83% of the table size is on one segment.


Script 2

This is the same script as script 1, but with more information on the table distribution (i.e based on table's OS file size).

1. Create the function:

CREATE OR REPLACE FUNCTION public.fn_get_skew(out schema_name      varchar,
                                              out table_name       varchar,
                                              out pTableName       varchar,
                                              out total_size_GB    numeric(15,2),
                                              out seg_min_size_GB  numeric(15,2),
                                              out seg_max_size_GB  numeric(15,2),
                                              out seg_avg_size_GB  numeric(15,2),
                                              out seg_gap_min_max_percent numeric(6,2),
                                              out seg_gap_min_max_GB      numeric(15,2),
                                              out nb_empty_seg     int) RETURNS SETOF record AS
$$
DECLARE
    v_function_name text := 'fn_get_skew';
    v_location int;
    v_sql text;
    v_db_oid text;
    v_num_segments numeric;
    v_skew_amount numeric;
    v_res record;
BEGIN
    v_location := 1000;
    SELECT oid INTO v_db_oid
    FROM pg_database
    WHERE datname = current_database();

    v_location := 2200;
    v_sql := 'DROP EXTERNAL TABLE IF EXISTS public.db_files_ext';

    v_location := 2300;
    EXECUTE v_sql;

    v_location := 3000;
    v_sql := 'CREATE EXTERNAL WEB TABLE public.db_files_ext ' ||
            '(segment_id int, relfilenode text, filename text, ' ||
            'size numeric) ' ||
            'execute E''ls -l $GP_SEG_DATADIR/base/' || v_db_oid ||
            ' | ' ||
            'grep gpadmin | ' ||
            E'awk {''''print ENVIRON["GP_SEGMENT_ID"] "\\t" $9 "\\t" ' ||
            'ENVIRON["GP_SEG_DATADIR"] "/' || v_db_oid ||
            E'/" $9 "\\t" $5''''}'' on all ' || 'format ''text''';

    v_location := 3100;
    EXECUTE v_sql;

    v_location := 4000;
    for v_res in (
                select  sub.vschema_name,
                        sub.vtable_name,
                        (sum(sub.size)/(1024^3))::numeric(15,2) AS vtotal_size_GB,
                        --Size on segments
                        (min(sub.size)/(1024^3))::numeric(15,2) as vseg_min_size_GB,
                        (max(sub.size)/(1024^3))::numeric(15,2) as vseg_max_size_GB,
                        (avg(sub.size)/(1024^3))::numeric(15,2) as vseg_avg_size_GB,
                        --Percentage of gap between smaller segment and bigger segment
                        (100*(max(sub.size) - min(sub.size))/greatest(max(sub.size),1))::numeric(6,2) as vseg_gap_min_max_percent,
                        ((max(sub.size) - min(sub.size))/(1024^3))::numeric(15,2) as vseg_gap_min_max_GB,
                        count(sub.size) filter (where sub.size = 0) as vnb_empty_seg
                    from (
                        SELECT  n.nspname AS vschema_name,
                                c.relname AS vtable_name,
                                db.segment_id,
                                sum(db.size) AS size
                            FROM ONLY public.db_files_ext db
                                JOIN pg_class c ON split_part(db.relfilenode, '.'::text, 1) = c.relfilenode::text
                                JOIN pg_namespace n ON c.relnamespace = n.oid
                            WHERE c.relkind = 'r'::"char"
                                and n.nspname not in ('pg_catalog','information_schema','gp_toolkit')
                                and not n.nspname like 'pg_temp%'
                            GROUP BY n.nspname, c.relname, db.segment_id
                        ) sub
                    group by 1,2
                    --Extract only table bigger than 1 GB
                    --   and with a skew greater than 20%
                    /*having sum(sub.size)/(1024^3) > 1
                        and (100*(max(sub.size) - min(sub.size))/greatest(max(sub.size),1))::numeric(6,2) > 20
                    order by 1,2,3
                    limit 100*/ ) loop
        schema_name         = v_res.vschema_name;
        table_name          = v_res.vtable_name;
        total_size_GB       = v_res.vtotal_size_GB;
        seg_min_size_GB     = v_res.vseg_min_size_GB;
        seg_max_size_GB     = v_res.vseg_max_size_GB;
        seg_avg_size_GB     = v_res.vseg_avg_size_GB;
        seg_gap_min_max_percent = v_res.vseg_gap_min_max_percent;
        seg_gap_min_max_GB  = v_res.vseg_gap_min_max_GB;
        nb_empty_seg        = v_res.vnb_empty_seg;
        return next;
    end loop;

    v_location := 4100;
    v_sql := 'DROP EXTERNAL TABLE IF EXISTS public.db_files_ext';

    v_location := 4200;
    EXECUTE v_sql;

    return;
EXCEPTION
        WHEN OTHERS THEN
                RAISE EXCEPTION '(%:%:%)', v_function_name, v_location, sqlerrm;
END;
$$
language plpgsql;

2. Use the below SQL to get the table distribution (based on table's OS file size):

select * from public.fn_get_skew();

For GPDB Version 6.x 

Starting with GPDB 6, the relfilenode no longer remains the same across segments. As a result, the script mentioned above may produce errors. If you are utilizing GPDB 6 or above, we recommend using the alternative script provided below to extract the skew via the OS filesystem level.

1. Create the function 

DROP FUNCTION IF EXISTS public.greenplum_check_skew();
CREATE FUNCTION public.greenplum_check_skew()
    RETURNS TABLE (
    	relation text,
    	vtotal_size_gb numeric,
    	vseg_min_size_gb numeric,
    	vseg_max_size_gb numeric,
    	vseg_avg_size_gb numeric,
    	vseg_gap_min_max_percent numeric,
    	vseg_gap_min_max_gb numeric,
    	vnb_empty_seg bigint)
    LANGUAGE plpgsql
    AS $$
DECLARE
	v_function_name text := 'greenplum_check_skew';
    v_location int;
    v_sql text;
    v_db_oid text;
BEGIN
    /* 
		The function checks the skew on greenplum table via OS file size
		* The table "greenplum_get_refilenodes" collects all the relfilenodes found on the database from all segments
		* The external "greenplum_get_db_file_ext" tables collects all the files that are on the base directory
		* The view "greenplum_get_file_statistics" combines the relfilenodes and displays the size of the relation by segments
		* The view "greenplum_get_skew_report" provide high level overview of the skew
    */
    -- Set the client min messages to just warning
    v_location := 1000;
    SET client_min_messages TO WARNING;

    -- Get the database oid
    v_location := 2000;
    SELECT d.oid INTO v_db_oid
    FROM pg_database d
    WHERE datname = current_database();

    -- Drop the temp table if it exists
    v_location := 3000;
    v_sql := 'DROP TABLE IF EXISTS public.greenplum_get_refilenodes CASCADE';
    v_location := 3100;
    EXECUTE v_sql;

    -- Temp table to temporary store the relfile records
    v_location := 4000;
    v_sql := 'CREATE TABLE public.greenplum_get_refilenodes ('
    '    segment_id int,'
    '    o oid,'
    '    relname name,'
    '    relnamespace oid,'
    '    relkind char,'
    '    relfilenode bigint'
    ')';
    v_location := 4100;
    EXECUTE v_sql;

    -- Store all the data related to the relfilenodes from all
    -- the segments into the temp table
    v_location := 5000;
    v_sql := 'INSERT INTO public.greenplum_get_refilenodes SELECT '
	'  s.gp_segment_id segment_id, '
	'  s.oid o, '
	'  s.relname, '
	'  s.relnamespace,'
	'  s.relkind,'
	'  s.relfilenode '
	'FROM '
	'  gp_dist_random(''pg_class'') s ' -- all segment
	'UNION '
	'  SELECT '
	'  m.gp_segment_id segment_id, '
	'  m.oid o, '
	'  m.relname, '
	'  m.relnamespace,'
	'  m.relkind,'
	'  m.relfilenode '
	'FROM '
	'  pg_class m ';  -- relfiles from master
	v_location := 5100;
    EXECUTE v_sql;

	-- Drop the external table if it exists
    v_location := 6000;
    v_sql := 'DROP EXTERNAL WEB TABLE IF EXISTS public.greenplum_get_db_file_ext';
    v_location := 6100;
    EXECUTE v_sql;

	-- Create a external that runs a shell script to extract all the files 
	-- on the base directory
	v_location := 7000;
    v_sql := 'CREATE EXTERNAL WEB TABLE public.greenplum_get_db_file_ext ' ||
            '(segment_id int, relfilenode text, filename text, ' ||
            'size numeric) ' ||
            'execute E''ls -l $GP_SEG_DATADIR/base/' || v_db_oid ||
            ' | ' ||
            'grep gpadmin | ' ||
            E'awk {''''print ENVIRON["GP_SEGMENT_ID"] "\\t" $9 "\\t" ' ||
            'ENVIRON["GP_SEG_DATADIR"] "/base/' || v_db_oid ||
            E'/" $9 "\\t" $5''''}'' on all ' || 'format ''text''';

    v_location := 7100;
    EXECUTE v_sql;


    -- Drop the datafile statistics view if exists
	v_location := 8000;
	v_sql := 'DROP VIEW IF EXISTS public.greenplum_get_file_statistics';
	v_location := 8100;
    EXECUTE v_sql;

    -- Create a view to get all the datafile statistics
    v_location := 9000;
	v_sql :='CREATE VIEW public.greenplum_get_file_statistics AS '
			'SELECT '
			'  n.nspname || ''.'' || c.relname relation, '
			'  osf.segment_id, '
			'  split_part(osf.relfilenode, ''.'' :: text, 1) relfilenode, '
			'  c.relkind, '
			'  sum(osf.size) size '
			'FROM '
			'  public.greenplum_get_db_file_ext osf '
			'  JOIN public.greenplum_get_refilenodes c ON ('
			'    c.segment_id = osf.segment_id '
			'    AND split_part(osf.relfilenode, ''.'' :: text, 1) = c.relfilenode :: text'
			'  ) '
			'  JOIN pg_namespace n ON c.relnamespace = n.oid '
			'WHERE '
			'  osf.relfilenode ~ ''(\d+(?:\.\d+)?)'' '
			-- '  AND c.relkind = ''r'' :: char '
			'  AND n.nspname not in ('
			'    ''pg_catalog'', '
			'    ''information_schema'', '
			'    ''gp_toolkit'' '
			'  ) '
			'  AND not n.nspname like ''pg_temp%'' '
			'  GROUP BY 1,2,3,4';
	v_location := 9100;
    EXECUTE v_sql;

     -- Drop the skew report view view if exists
	v_location := 10000;
	v_sql := 'DROP VIEW IF EXISTS public.greenplum_get_skew_report';
	v_location := 10100;
    EXECUTE v_sql;

    -- Create a view to get all the table skew statistics
    v_location := 11100;
	v_sql :='CREATE VIEW public.greenplum_get_skew_report AS '
			'SELECT '
			'	sub.relation relation,'
			'	(sum(sub.size)/(1024^3))::numeric(15,2) AS vtotal_size_GB,'  --Size on segments
			'    (min(sub.size)/(1024^3))::numeric(15,2) AS vseg_min_size_GB,'
			'    (max(sub.size)/(1024^3))::numeric(15,2) AS vseg_max_size_GB,'
			'    (avg(sub.size)/(1024^3))::numeric(15,2) AS vseg_avg_size_GB,' --Percentage of gap between smaller segment and bigger segment
			'    (100*(max(sub.size) - min(sub.size))/greatest(max(sub.size),1))::numeric(6,2) AS vseg_gap_min_max_percent,'
			'    ((max(sub.size) - min(sub.size))/(1024^3))::numeric(15,2) AS vseg_gap_min_max_GB,'
			'    count(sub.size) filter (where sub.size = 0) AS vnb_empty_seg '
			'FROM '
			'public.greenplum_get_file_statistics sub'
			'  GROUP BY 1';
	v_location := 11100;
    EXECUTE v_sql;

    -- Return the data back
    RETURN query (
        SELECT
            *
        FROM public.greenplum_get_skew_report a);

    -- Throw the exception whereever it encounters one
    EXCEPTION
        WHEN OTHERS THEN
                RAISE EXCEPTION '(%:%:%)', v_function_name, v_location, sqlerrm;
END;
$$;

2. Execute the function to generate the skew report

SELECT * FROM public.greenplum_check_skew();

3. For details on the table size by segments, you can make use of this view

SELECT * FROM public.greenplum_get_file_statistics;

 

Additional Information

How to Read the output?

  • Let say you received the o/p as below
    gpadmin=# SELECT * FROM public.greenplum_check_skew();
                   relation                | vtotal_size_gb | vseg_min_size_gb | vseg_max_size_gb | vseg_avg_size_gb | vseg_gap_min_max_percent | vseg_gap_min_max_gb | vnb_empty_seg
    ---------------------------------------+----------------+------------------+------------------+------------------+--------------------------+---------------------+---------------
    ....
    ....
    public.skewed_table                   |           2.56 |             0.00 |             2.56 |             1.28 |                   100.00 |                2.56 |             0
    ....
    ....
  • When analyzing the data skew, we recommend paying particular attention to the columns "vseg_gap_min_max_percent" and "vseg_gap_min_max_gb." These columns provide valuable information on the distribution of data across segments. The column "vseg_gap_min_max_percent" describes the percentage of gap between the segment with the minimum data and the segment with the maximum data. For example, a value of 100% indicates that one segment contains all of the data while the other segments are empty. The column "vseg_gap_min_max_gb" indicates the difference in file size between the segments with the minimum and maximum data. In this case, the difference is 2.56GB, which is approximately the size of the entire table (vtotal_size_gb).
  • By utilizing the information provided in the above columns and feeding it into the "public.greenplum_get_file_statistics" view, a breakdown of the file size on each segment can be obtained.
    gpadmin=# select * FROM public.greenplum_get_file_statistics
    where relation='public.skewed_table';
          relation       | segment_id | relfilenode | relkind |    size
    ---------------------+------------+-------------+---------+------------
     public.skewed_table |          1 | 214664      | r       | 2745303040
     public.skewed_table |          0 | 214664      | r       |      98304
    (2 rows)

NOTE:

While the method described above (calculating data skew using OS file size) can be useful, it does have some limitations. The most significant limitation is the potential for "table bloat."

When DML operations are performed on a table, the space is not always returned to the OS. As a result, the calculation of skew using OS file size will include the space occupied by table bloat.

To understand this issue in more detail, we recommend reviewing the information on "Bloat in HEAP Tables" and the various options for removing bloat from a table.