How is the Value "skccoeff" Under "gp_toolkit.gp_skew_coefficients" Calculated
search cancel

How is the Value "skccoeff" Under "gp_toolkit.gp_skew_coefficients" Calculated

book

Article ID: 295523

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

The view gp_toolkit.gp_skew_coefficients is used to show data distribution skew and this article describes how Greenplum calculates the value of the column "skccoeff" on gp_toolkit.gp_skew_coefficients view.

 


Environment


Resolution

According to the Greenplum Administrator Guide, the column "skccoeff" is "The coefficient of variation (CV), which is calculated as the standard deviation divided by the average." In addition, the "standard deviation" and "average" are based on (select gp_segment_id,count(*) from group by gp_segment_id).

For example, The "skccoeff" on gp_toolkit.gp_skew_coefficients for testtable is:

gpadmin=# Select * from gp_toolkit.gp_skew_coefficients where skcrelname='testtable'. 

skcoid | skcnamespace | skcrelname |       skccoeff
--------+--------------+------------+-----------------------
 252064 | public       | testtable  | 624.49979983984029000

And, that SQL that is used to calculate the column "skccoeff" value is:

gpadmin=# Select STDDEV(cnt)/avg(cnt)*100
from
   (
     select gp_segment_id segid,count(*) cnt from testtable group by gp_segment_id
     union all
     (
       select tmp2.content segid,0 cnt from
       gp_segment_configuration tmp2
       where tmp2.content not in (select gp_segment_id segid from testtable group by gp_segment_id)
       and tmp2.role='p'
       and tmp2.content<>-1
      )
    ) tmpc;

       ?column?
----------------------
624.4997998398402900


Additional Information

+ Environment:

Pivotal Greenplum Database (GPDB) all versions