Issue with MLPPI

Database
Enthusiast

Issue with MLPPI

All,

 

I have created MLPPI and I see the below given issue for a character partitioned column. Kindly help on the same.

 

CREATE MULTISET TABLE EDW_XXX.TABLE1 ,NO FALLBACK ,

(CUST_ID VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL
MFR_ID VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
TIME_VAL VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
PRD_TYP VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
WK_NUM INTEGER)
PRIMARY INDEX ( CUST_ID ,MFR_ID ,TIME_VAL ,PRD_TYP ,WK_NUM )
PARTITION BY ( CASE_N(
PRD_TYP = 'SL',
PRD_TYP = 'LQ',
NO CASE OR UNKNOWN),CASE_N(
TIME_VAL = 'Product selling- Yearly or Half-yearly',
TIME_VAL = 'Product selling - Weekly',
NO CASE OR UNKNOWN),RANGE_N(WK_NUM BETWEEN 1 AND 52 EACH 1 ,
NO RANGE OR UNKNOWN) );

MLPPI - is done in the above table

Inserted a sample of 100 records .
TIME_VAL = 'Product selling- Yearly or Half-yearly' -- 80 records;
TIME_VAL = 'Product selling- Weekly' --20 records;

Before collect stats:
explain
sel * from EDW_XXX.TABLE1
where TIME_VAL = 'Product selling- Yearly or Half-yearly';

explain
sel * from EDW_XXX.TABLE1
where TIME_VAL = 'Product selling - Weekly';

Estimated row count is 80 and 20 respectively

After collect Stats

COLLECT STATISTICS COLUMN (TIME_VAL) ON EDW_XXX.TABLE1;
COLLECT STATISTICS COLUMN (PARTITION) ON EDW_XXX.TABLE1;

explain
sel * from EDW_XXX.TABLE1
where TIME_VAL = 'Product selling- Yearly or Half-yearly'; -- 1 row

explain
sel * from EDW_XXX.TABLE1
where TIME_VAL = 'Product selling - Weekly'; - 20 rows

The number of estimated rows is different after Collect stats for the TIEM_VAL 'Product selling- Yearly or Half-yearly'.

On researching further
sel * from EDW_XXX.TABLE1
where TIME_VAL like 'Product selling- Yearly o%'; -- 80 rows
sel * from EDW_XXX.TABLE1
where TIME_VAL like 'Product selling- Yearly or%'; -- 1 row

when the character limit crosses 25 the explain plan starts changing.

Is there any limitation in using character fields for MLPPI or am i missing something. Pls help

BTW, the performance is same for thie simple query And when we use this in the complex queries the Impact CPU is very high. 

 

Aprpeciate your support.

2 REPLIES
Senior Apprentice

Re: Issue with MLPPI

Hi,

 

This isn't a limitation of PPI, it is in the 'statistics' processing.

 

By default, Teradata only collects stats on the first 25 bytes of data, so the stats are only showing part of the data values.

 

I think you need to use the MaxValueLength option on the collect stats command. I'd set the value to maximum length of the TIME_VAL column.

 

HTH
Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Teradata Employee

Re: Issue with MLPPI

The problem is not directly related to PPI, it's due to having a character field where a relatively large number of leading characters are identical.

In this case, you need to specify a MAXVALUELENGTH for the statistics that is at least long enough to ensure different values will fall into different "buckets".