How to calulate Skew on Join Index

Database
Enthusiast

How to calulate Skew on Join Index

I can use this code to calculate Skew on a table, but if I try to use a JoinIndex then I get an error. How do you calculate skew on a Join Index?

SELECT HASHAMP(HASHBUCKET(HASHROW(ColumnName1,ColumnName2))) AS AMPNumber,
COUNT(*) AS NumberOfRows
FROM DatabaseName.JoinIndex GROUP BY 1

Peter Schwennesen

Tags (1)
14 REPLIES
Senior Supporter

Re: How to calulate Skew on Join Index

The join index is created by an SQL therfore using this SQL as a derived table should give you the right info.

Something like

SELECT HASHAMP(HASHBUCKET(HASHROW(ColumnName1,ColumnName2))) AS AMPNumber,

       COUNT(*) AS NumberOfRows

  FROM (SQL which defines the JOIN INDEX) as t

GROUP BY 1

 

Ulrich


Junior Contributor

Re: How to calulate Skew on Join Index

Hi Peter,

if you need this for a "what if this was my new PI" you have to use the HASHAMP calculation, although skew on JIs is of less concern as they might prefer performance over distribution.

Btw, for real big tables doing it on a SAMPLE might be faster and still accurate enough.

But if this is about actual perm space skew, then instead of trying to get the actual number of rows per AMP it's better to use the actual perm space per AMP from dbc.TableSizeV. You don't have to think about table or join index, there's no additional overhead calculating it and this skew calculation will provide almost the same result for large tables.

In one case where they might show a difference: if there are Secondary Indexes, HASHAMP will show the base tabe skew, but dbc.TableSizeV includes the indexes, too.

The fastest possibe way to get an approximate size/skew for a base table plus each index is COLLECT DEMOGRAPHICS.

Teradata Employee

Re: How to calulate Skew on Join Index

Hello Dieter,

I was aware that the TableSize of a table includes the SI Subtable's size also. Hence, it makes sense that when we calculate the Skew Factor of a table by mean of TableSize using the formula, then the SI's subtable also affects the overall table's skew factor. I had overlooked this fact till now. Thanks for pointing it.

For example, we might have a NUSI column on "X" which is skewed (And, assuming it is still a NUSI for some reason). Even though the overall data distribution might be excellent owing to PI (Say "Y"), the presence of Skewed NUSI's subtable will indicate overall Table Skewness.

Having said that, if I don't have the authority for COLLECT DEMOGRAPHICS, is there any way to find the ACTUAL TABLE SKEWNESS excluding subtable, without the need to create a zero-SI similar table ?

Thanks in advance.




Junior Contributor

Re: How to calulate Skew on Join Index

Hi Peter,

you don't need any special access right (beside SELECT on the base table)  to submit a COLLECT DEMPGRAPHICS.

But the result is inserted into the DataDemographics table, which is part of the Query Capture Database (QCD) used by Visual Explain. And this might not exist on your system or you don't have rights on it.

Now the good news is, you only need this specific table and none else. Thus you can create it anywhere, it even works when it's created as a Volatile table (just tested on a TD14.10 VM). This definition is from the "SQL Request and Transaction Processing" manual (I just modified the PI, but it's still working):

CREATE TABLE DataDemographics (
MachineName VARCHAR(30) CHARACTER SET UNICODE UPPERCASE NOT CASESPECIFIC NOT NULL,
TABLENAME VARCHAR(128) CHARACTER SET UNICODE UPPERCASE NOT CASESPECIFIC NOT NULL,
DatabaseName VARCHAR(128) CHARACTER SET UNICODE UPPERCASE NOT CASESPECIFIC NOT NULL,
DBSize INTEGER NOT NULL,
CollectedTime TIMESTAMP(6) NOT NULL,
AMPNumber INTEGER NOT NULL,
ClusterNumber INTEGER NOT NULL,
SubTableID SMALLINT NOT NULL,
SubTableType VARCHAR(120) CHARACTER SET LATIN,
RowCount DECIMAL(18,0) NOT NULL,
AvgRowSize INTEGER NOT NULL,
QueryID INTEGER,
IndexName VARCHAR(128) CHAR SET UNICODE UPPERCASE NOT CASESPECIFIC,
DemographicsID INTEGER)
PRIMARY INDEX (DatabaseName, TABLENAME);

Then it's

COLLECT DEMOGRAPHICS FOR tab INTO DB_where_CollectDemographics_exists;

The syntax in the manuals is a bit misleading, a comma-delimited list of tables is allowed, but no enclosing parentheses.

SELECT 
dd.DatabaseName
,dd.TABLENAME
,dd.SubTableId
,dd.SubTableType
,dd.IndexName
,CASE WHEN t.ProtectionType = 'F' THEN 2 ELSE 2 END * dd.SubTableSize AS SubTableSize
,dd.SubTablePercent
,CASE WHEN t.ProtectionType = 'F' THEN 2 ELSE 2 END * dd.CurrentPerm AS TableSize
,dd.SkewFactor
,dd.SkewFactorWinDDI
,t.ProtectionType
FROM
(
SELECT
DatabaseName
,TABLENAME
,SubTableId
,SubTableType
,MIN(IndexName) AS IndexName
,SUM(RowCount * (AvgRowSize + 0.5)) AS SubTableSize
,100 * SubTableSize/NULLIF(CurrentPerm,0) AS SubTablePercent
,SUM(SubTableSize) OVER (PARTITION BY DatabaseName, TABLENAME) AS CurrentPerm

-- skew factor, 1 = even distribution, 1.1 = max AMP needs 10% more space than the average AMP
,MAX(RowCount * (AvgRowSize + 0.5)) / NULLIF(AVG(RowCount * (AvgRowSize + 0.5)),0) (DEC(5,2)) AS SkewFactor
-- skew factor, between 0 and 99. Same calculation as WinDDI/ TD Administrator
,(100 - (AVG(RowCount * (AvgRowSize + 0.5)) / NULLIF(MAX(RowCount * (AvgRowSize + 0.5)),0) * 100)) (DEC(3,0)) AS SkewFactorWinDDI
FROM DataDemographics AS dd
GROUP BY 1,2,3,4
) AS dd
JOIN dbc.TablesV AS t
ON dd.DatabaseName = t.DatabaseName
AND dd.TABLENAME = t.TABLENAME
ORDER BY 1,2,3;

This might include the pseudo-indexes created by a Join Index on the base table (adding WHERE IndexName IS NOT NULL should remove them, but I don't know if IndexName is always populated for real indexes).

You might also join it to a query calculating the size from dbc.TablesSizeV for comparison.

Btw, this caclulation might/will be totally wrong for tables with BlockLevel Compression (due to different reasons, e.g. Fallback being compressed but not primary data).

The only bad thing is that you can't run COLLECT DEMOPGRAPHICS in a Stored Procedure or Macro :-(

Enthusiast

Re: How to calulate Skew on Join Index

Hi Dieter

I need some time to consume all this new information

Peter

Enthusiast

Re: How to calulate Skew on Join Index

If I do:

CREATE MULTISET TABLE DataDemographics (
MachineName VARCHAR(30) CHARACTER SET UNICODE UPPERCASE NOT CASESPECIFIC NOT NULL,
TABLENAME VARCHAR(128) CHARACTER SET UNICODE UPPERCASE NOT CASESPECIFIC NOT NULL,
DatabaseName VARCHAR(128) CHARACTER SET UNICODE UPPERCASE NOT CASESPECIFIC NOT NULL,
DBSize INTEGER NOT NULL,
CollectedTime TIMESTAMP(6) NOT NULL,
AMPNumber INTEGER NOT NULL,
ClusterNumber INTEGER NOT NULL,
SubTableID SMALLINT NOT NULL,
SubTableType VARCHAR(120) CHARACTER SET LATIN,
RowCount DECIMAL(18,0) NOT NULL,
AvgRowSize INTEGER NOT NULL,
QueryID INTEGER,
IndexName VARCHAR(128) CHAR SET UNICODE UPPERCASE NOT CASESPECIFIC,
DemographicsID INTEGER)
PRIMARY INDEX (DatabaseName, TABLENAME);

COLLECT DEMOGRAPHICS FOR "DatabaseName.TableName" INTO DataDemographics;

It says: [3802] Database "DataDemographics" does not exist, but it does, because:

SELECT * FROM DataDemographics;

returns an empty table.

I tried both 13.10 and 14.10.

Peter 

Junior Contributor

Re: How to calulate Skew on Join Index

Hi Peter,

you don't specify the table name (always DataDemographics), it's the databasename, where DataDemographics is created :-)

Enthusiast

Re: How to calulate Skew on Join Index

Sorry but I do not understand the 

COLLECT DEMOGRAPHICS ...

COLLECT DEMOGRAPHICS FOR [MyTable??] INTO [MyTable??];

Peter

Junior Contributor

Re: How to calulate Skew on Join Index

Hi Peter,

you're welcome, I know it's confusing :-)

COLLECT DEMPGRAPHICS FOR mytable INTO myDatabase;

where myDatabase is the name of the database where DataDemographics has been created.