Table Partition

Database
Enthusiast

Table Partition

CREATE SET TABLE INFORM_USER.TABLEA ,NO FALLBACK ,

NO BEFORE JOURNAL,

NO AFTER JOURNAL,

CHECKSUM = DEFAULT,

DEFAULT MERGEBLOCKRATIO

(

PLCY_NBR INTEGER,

MO_ID INTEGER,

PLCY_STATUS VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC

)

PRIMARY INDEX ( PLCY_NBR ,MO_ID )

PARTITION BY RANGE_N(CAST(((CAST((MO_ID ) AS VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC))||'01') AS DATE FORMAT 'YYYYMMDD') BETWEEN DATE '2010-01-01' AND DATE '2090-12-01' EACH INTERVAL '1' MONTH);

 

Explain SEL * FROM INFORM_USER.TABLEA WHERE MO_ID IN (201201,201202)

1) First, we lock a distinct INFORM_USER."pseudo table" for read on a

RowHash to prevent global deadlock for INFORM_USER.TABLEA.

2) Next, we lock INFORM_USER.TABLEA for read.

3) We do an all-AMPs RETRIEVE step from INFORM_USER.TABLEA by way of

an all-rows scan with a condition of ("(INFORM_USER.TABLEA.MO_ID =

201202) OR (INFORM_USER.TABLEA.MO_ID = 201201)") into Spool 1

(group_amps), which is built locally on the AMPs. The size of

Spool 1 is estimated with no confidence to be 6 rows (222 bytes).

The estimated time for this step is 0.02 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved

in processing the request.

-> The contents of Spool 1 are sent back to the user as the result of

statement 1. The total estimated time is 0.02 seconds.

As per the explain plan the query doesn't look in to the 2 partitions but rather looking in to all rows. Why is it so?

9 REPLIES
Enthusiast

Re: Table Partition

all-rows scan with a condition of ("(INFORM_USER.TABLEA.MO_ID =

201202) OR (INFORM_USER.TABLEA.MO_ID = 201201)") 

Cheers,

Raja

Enthusiast

Re: Table Partition

Hi,

There are some points to notice about your query:

The first thing is you have a defined PI on (PLCY_NBR, MPD_ID) but you are using a partial value in your where clause using only MOD_ID. So when ever there is a partial index search, the optimizer will not be able to use PI. 

The second thing is you are using IN condition with two values, and it ("(INFORM_USER.TABLEA.MO_ID =

201202) OR (INFORM_USER.TABLEA.MO_ID = 201201)"), whenever there is a OR condition in a where clause it might lead to Full table scan.

So to avoid FTS, you can introduce SI on the columns being used in the where condition. define a USI, or NUSI on top of MOD_ID, and collect stats, hope it will help to avoid full table scan.

Khurram
Senior Apprentice

Re: Table Partition

Did you try to run you query on a table with data in it? According to your explain the table is empty.

Enthusiast

Re: Table Partition

Hi,

-- CASE 1

CREATE SET TABLE INFORM_USER.TABLEA ,NO FALLBACK ,

NO BEFORE JOURNAL,

NO AFTER JOURNAL,

CHECKSUM = DEFAULT,

DEFAULT MERGEBLOCKRATIO

(

PLCY_NBR INTEGER,

MO_ID INTEGER,

PLCY_STATUS VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC

)

PRIMARY INDEX ( PLCY_NBR,MO_ID )

PARTITION BY RANGE_N(CAST(((CAST((MO_ID ) AS VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC))||'01') AS DATE FORMAT 'YYYYMMDD') BETWEEN DATE '2010-01-01' AND DATE '2090-12-01' EACH INTERVAL '1' MONTH);

INSERT INTO INFORM_USER.TABLEA VALUES (123,201201,'INFORCE');

INSERT INTO INFORM_USER.TABLEA VALUES (456,201202,'CANCEL');

Explain SEL * FROM INFORM_USER.TABLEA WHERE MO_ID IN (201201,201202);

1) First, we lock a distinct INFORM_USER."pseudo table" for read on a

RowHash to prevent global deadlock for INFORM_USER.TABLEA.

2) Next, we lock INFORM_USER.TABLEA for read.

3) We do an all-AMPs RETRIEVE step from INFORM_USER.TABLEA by way of

an all-rows scan with a condition of ("(INFORM_USER.TABLEA.MO_ID =

201202) OR (INFORM_USER.TABLEA.MO_ID = 201201)") into Spool 1

(group_amps), which is built locally on the AMPs. The size of

Spool 1 is estimated with no confidence to be 6 rows (222 bytes).

The estimated time for this step is 0.02 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved

in processing the request.

-> The contents of Spool 1 are sent back to the user as the result of

statement 1. The total estimated time is 0.02 seconds.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-- CASE 2

CREATE SET TABLE INFORM_USER.TABLEB ,NO FALLBACK ,

NO BEFORE JOURNAL,

NO AFTER JOURNAL,

CHECKSUM = DEFAULT,

DEFAULT MERGEBLOCKRATIO

(

PLCY_NBR INTEGER,

MO_ID INTEGER,

PLCY_STATUS VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC

)

PRIMARY INDEX ( PLCY_NBR,MO_ID )

PARTITION BY RANGE_N(MO_ID BETWEEN 201001 AND 209012 EACH 1);

INSERT INTO INFORM_USER.TABLEB VALUES (123,201201,'INFORCE');

INSERT INTO INFORM_USER.TABLEB VALUES (456,201202,'CANCEL');

Explain SEL * FROM INFORM_USER.TABLEB WHERE MO_ID IN (201201,201202);

1) First, we lock a distinct INFORM_USER."pseudo table" for read on a

RowHash to prevent global deadlock for INFORM_USER.TABLEB.

2) Next, we lock INFORM_USER.TABLEB for read.

3) We do an all-AMPs RETRIEVE step from 2 partitions of

INFORM_USER.TABLEB with a condition of (

"(INFORM_USER.TABLEB.MO_ID = 201202) OR (INFORM_USER.TABLEB.MO_ID

= 201201)") into Spool 1 (group_amps), which is built locally on

the AMPs. The size of Spool 1 is estimated with no confidence to

be 6 rows (222 bytes). The estimated time for this step is 0.02

seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved

in processing the request.

-> The contents of Spool 1 are sent back to the user as the result of

statement 1. The total estimated time is 0.02 seconds.

My Question is the way partition is defined is different in two cases. Why does first case does a all rows scan and second case looks in to the 2 partitions alone?

@Raja K Thaw - I am not able to get what you are trying to say.

@khurram - Ok lets agree that it does full table scan when we use 'IN' clause. In my case 2 i used the same 'IN' clause but it doesnt do full table scan and looks in to 2 partitions only. Why is it so?

@dieter - The data has been inserted and still getting the same issue

Thanks,

Sharat Balaji

Dieter

Enthusiast

Re: Table Partition

Hi Sharat,

In my opinion, this is the problem of Case 1:

Here you create a range_n with Date values "AS DATE", but after find with integers "(201201,201202)"

Regards.

Enthusiast

Re: Table Partition

Hi Sharat,

Here is the answer to your question:

http://www.info.teradata.com/htmlpubs/DB_TTU_14_00/index.html#page/Database_Management/B035_1094_111...

See the lines:

You can optimize the effects of using RANGE_N in your partitioning expression by observing the following guidelines:

Reference only a single integer (BYTEINT, SMALLINT, INTEGER, BIGINT), character (CHARACTER, GRAPHIC, VARCHAR, VARGRAPHIC), DATE, or TIMESTAMP column, not expressions.

For example, specifying a simple expression such as  in place of a column name in the RANGE_N specification, even if the expression references only a single column, can hinder, or even prevent, row partition elimination.

Khurram
Enthusiast

Re: Table Partition

Hi Khurram,

Thank you very much for quick response and also the link. I have one more question. The below query gives  the result as shown

sel tablename,definedcombinedpartitions,maxcombinedpartitions from dbc.PartitioningConstraintsV WHERE databasename = 'Inform_User' and tablename in ('TABLEA', 'TABLEB')

TableName,DefinedCombinedPartitions,MaxCombinedPartitions

TABLEB, 8012, 8012

ABLEA, 972, 972

from the results my Understanding is that there are 8012 and 972 partitions already created. But the explain query still does all row scan. So is this also due to contents given in the above link?

Thanks,

Sharat Balaji

Enthusiast

Re: Table Partition

Hi Sharat,

Yes, it is due the fact the use of expression in RANGE_N can prohibit the optimizer to eliminate the partitions and it does a full table scan. So partitions are there, but the way partitions are created is not good for optimizer to elimnate them.

Khurram
Teradata Employee

Re: Table Partition

When the partitioning expression is very complicated, the optimizer cannot see through it to see that the partition access can be done. This is especially true for multiple casts.

If you change the partitioning expression to:

CREATE SET TABLE PLS.tableb ,NO FALLBACK ,

     NO BEFORE JOURNAL,

     NO AFTER JOURNAL,

     CHECKSUM = DEFAULT,

     DEFAULT MERGEBLOCKRATIO

     (

      PLCY_NBR INTEGER,

      MO_ID INTEGER,

      PLCY_STATUS VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC)

PRIMARY INDEX ( PLCY_NBR ,MO_ID )

PARTITION BY RANGE_N(CAST((((MO_ID - 190000 )* 10 )+ 1 ) AS DATE) BETWEEN

DATE '2010-01-01' AND DATE '2090-12-01' EACH INTERVAL '1' MONTH );

Then you can get partition access for single equality. It does not appear that it will work for an IN list though, you can union multiple selects together for that.

Have you considered doing the type conversion during ETL and storing it as a date rather than as a monthid? Then everything works cleanly.