Skew Factor

Database

Skew Factor

Hi iam Kishore.....here is my question.....suppose if have a skewfactor of 40%....then how to avoid the skewfactor...as the PI cannot be altered after table creation?

8 REPLIES
N/A

Re: Skew Factor

Yes, you need to create a new table with a different PI which has a better skew value.

You can check the impact of a different PI by using the  hashamp(hashbucket(hashrow(PI COL list))) functions.

something like 

select hashamp(hashbucket(hashrow(NEW PI COL list))),

          cast(count(*) as decimal(18,0))

from your table

group by 1

order by 1

would give you the row distribution of the new PI.

But never forget: A pi should serve at least TWO purposes - distribution and access!

Re: Skew Factor

Kishore,

Do check the below link for more info.....

http://forums.teradata.com/forum/database/data-distribution-skew

N/A

Re: Skew Factor

hi

i am new to teradata,  HOW REDUCE SKEW FACTOR IF IT IS >60 . IN TABLE I HAVE NUPI, SHALL I GO FOR COMBINATION OF COLUMN AS PI TO CHECK SKEW FACTOR . WHAT ARE OTHER APPROACH TO REDUCE SKEW FACTOR

Re: Skew Factor

Read the above post by ulrich.

In short yes, if you want to reduce the skeweness you need to check which columns or combination of columns provide better distribution.

N/A

Re: Skew Factor

hi

can we take timestamp column as PI. HERE is my table structure can you guide me col should/shouldn't be take as PI.

SVC_DSPCH_ID VARCHAR(50) CHARACTER SET UNICODE NOT CASESPECIFIC,

      SVC_BU_ID VARCHAR(15) CHARACTER SET LATIN NOT CASESPECIFIC,

      DW_SRC_SITE_ID SMALLINT,

      COUNTRY_CODE CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC,

      POSTAL_CODE VARCHAR(60) CHARACTER SET UNICODE NOT CASESPECIFIC,

      SOURCE_SYS_CREATE_BY VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,

      STAT_CD_SRC_DTS TIMESTAMP(6),

      SVC_STATUS_ID DECIMAL(18,0),

      SRC_TM_GMT_OFFSET DECIMAL(4,2),

      STAT_CD_GMT_DTS TIMESTAMP(6),

      DST SMALLINT,

      STAT_CD VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,

      STAT_DESC VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,

      DW_INS_UPD_DTS TIMESTAMP(6))

PRIMARY INDEX NUP_GFD_TS_AP_G1_STG2 ( DW_SRC_SITE_ID ,SOURCE_SYS_CREATE_BY )

CAN WE CHANGE THE ORDER OF PI THAT IS TAKEN LIKE IF 2ND AND 4TH COL IS PI I WANT TO TAKE NEW PI AS 2ND,4TH,1ST

Teradata Employee

Re: Skew Factor

The order of columns in defining PI doesnt matter.

No harm in adding Timestamp column, given it makes sense as per the table type and data it contains. What is the type of table? Is it staging table or DWH table?

N/A

Re: Skew Factor

thanks for reply

it is a stage table . is there any specific cases where we can't use timestamp as PI

Teradata Employee

Re: Skew Factor

PI is selected to give you bettter distribution and more importantly less or no re-distribution while using table. As its a stage table, you need to check transformation rules to know which colymns will be used for joining/transforming data. You should also know which columns are PK at the source side .... that will give you better distribution and most probably may also be used in tranformation logic.

HTH!