I'm facing the situation where analyst creating (skewed) tables in a scratch table environment by selecting a bad PI.
I gave up on education and instead, I would like to run a script that automatically detects the best PI for a given table and changes the PI.
Does anyone had already the same problem - I would much apriciate any insights.
The PI once created on a table can't be changed simply. You will have to create a copy of the table with a different PI, insert the data in the second table, delete the base table and rename the second table to the base table. And especially if the table is huge then this might not be a simple operation.
I think you might know the below query as how to analyze the PI choices for a table for better distribution
SELECT HASHAMP(HASHBUCKET(HASHROW(<PI_COLUMN_LIST>))) AS "AMP#",COUNT(*)
GROUP BY 1
ORDER BY 2 DESC;
The above query should give you idea as how many rows will land on a certain AMP, so you can add/remove the column list to choose the best suitable PI for the table...
Generally the numeric/calculated values are the metrices and the lookup values could be the dimension candidates... but again you need to look into the data before tagging the facts and dimensions...
I made myself not very clear - I would like to know how to find what clumn is a metric or dimension in any given table.
Once I know that I could run the following SQL to compare pre-post and could setup a batch that runs every night and is screening others table schemas.
SUM(SKEWCALC .NUMBEROFROWS) AS TOTALROWS
, MIN(SKEWCALC .NUMBEROFROWS) AS MINROWSONAMP
, MAX(SKEWCALC .NUMBEROFROWS) AS MAXROWSONAMP
, AVG(SKEWCALC .NUMBEROFROWS) AS AVGROWSONAMP
, 100 - ( AVG (SKEWCALC. NUMBEROFROWS)/ MAX (SKEWCALC. NUMBEROFROWS) * 100 ) AS SKEWFACTOR
HASHAMP (HASHBUCKET (HASHROW(
))) AS AMPNUMBER ,
CAST(COUNT(*) AS DECIMAL(18,0)) AS NUMBEROFROWS
GROUP BY 1
) AS SKEWCALC ;