Auto select on PI

Database
Enthusiast

Auto select on PI

HI,

I'm facing the situation where analyst creating (skewed) tables in a scratch table environment by selecting a bad PI.

I gave up on education and instead, I would like to run a script that automatically detects the best PI for a given table and changes the PI.

Does anyone had already the same problem - I would much apriciate any insights.

Cheers,

Gordon

4 REPLIES
Enthusiast

Re: Auto select on PI

The PI once created on a table can't be changed simply. You will have to create a copy of the table with a different PI, insert the data in the second table, delete the base table and rename the second table to the base table. And especially if the table is huge then this might not be a simple operation.

I think you might know the below query as how to analyze the PI choices for a table for better distribution

SELECT HASHAMP(HASHBUCKET(HASHROW(<PI_COLUMN_LIST>))) AS "AMP#",COUNT(*)
FROM <TABLE_NAME>
GROUP BY 1
ORDER BY 2 DESC;
Enthusiast

Re: Auto select on PI

is there a way to determine the best PI before creating the table and how to determine what is a metric and what is a dimension?

Enthusiast

Re: Auto select on PI

The above query should give you idea as how many rows will land on a certain AMP, so you can add/remove the column list to choose the best suitable PI for the table...

Generally the numeric/calculated values are the metrices and the lookup values could be the dimension candidates... but again you need to look into the data before tagging the facts and dimensions...

Enthusiast

Re: Auto select on PI

Sorry,

I made myself not very clear - I would like to know how to find what clumn is a metric or dimension in any given table.

Once I know that I could run the following SQL to compare pre-post and could setup a batch that runs every night and is screening others table schemas.

thoughts?

SELECT

                 SUM(SKEWCALC .NUMBEROFROWS) AS TOTALROWS

                , MIN(SKEWCALC .NUMBEROFROWS) AS MINROWSONAMP

                , MAX(SKEWCALC .NUMBEROFROWS) AS MAXROWSONAMP

                , AVG(SKEWCALC .NUMBEROFROWS) AS AVGROWSONAMP

                , 100 - ( AVG (SKEWCALC. NUMBEROFROWS)/ MAX (SKEWCALC. NUMBEROFROWS) * 100 ) AS SKEWFACTOR

FROM

                (

                                 SELECT

                                                 HASHAMP (HASHBUCKET (HASHROW(

                                                         [your_PI&your_new_PI]

                                                     ))) AS AMPNUMBER ,

                                                 CAST(COUNT(*) AS DECIMAL(18,0)) AS NUMBEROFROWS

                                 FROM

                                                 [your_table_name]

                                 GROUP BY 1

                ) AS SKEWCALC ;