Data redistribution

Database
Fan

Data redistribution

The PI for the table is tran_id.

Can someone please let me know why  the data is being redistributed for volatile table creation when the PI is the same for the volatile table as well.

Please see Explain plans below.

CREATE SET TABLE TRAN_STAT ,NO FALLBACK ,

     NO BEFORE JOURNAL,

     NO AFTER JOURNAL,

     CHECKSUM = DEFAULT,

     DEFAULT MERGEBLOCKRATIO

     (

      Tran_Id DECIMAL(18,0) NOT NULL,

      Tran_Line_Nbr INTEGER,

      Tran_Line_Stat_Start_Dt DATE FORMAT 'YYYY/MM/DD',

      Tran_Line_Stat_End_Dt DATE FORMAT 'YYYY/MM/DD',

      Stat_Cd VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,

      Ins_Batch_Id INTEGER NOT NULL,

      Updt_Batch_Id INTEGER)

PRIMARY INDEX ( Tran_Id );

Explain

sel *

 from TRAN_STAT

 qualify rank() over (partition by tran_id order by tran_line_stat_start_dt asc) = 1

  1) First, we lock a distinct "pseudo table" for read on a

     RowHash to prevent global deadlock for TRAN_STAT.

  2) Next, we lock TRAN_STAT for read.

  3) We do an all-AMPs STAT FUNCTION step from TRAN_STAT by

     way of an all-rows scan with no residual conditions into Spool 5

     (Last Use), which is built locally on the AMPs.  The result rows

     are put into Spool 1 (group_amps), which is built locally on the

     AMPs.  The size is estimated with low confidence to be 556,288,594

     rows (41,721,644,515 bytes).

  4) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.

EXPLAIN create volatile  table vt_ts

as

(sel *

 from TRAN_STAT

 qualify rank() over (partition by tran_id order by tran_line_stat_start_dt asc) = 1

 ) with data primary index (tran_id)  on commit preserve rows        ;

  1) First, we lock a distinct "pseudo table" for read on a

     RowHash to prevent global deadlock for TRAN_STAT.

  2) Next, we lock TRAN_STAT for read.

  3) We create the table header.

  4) We do an all-AMPs STAT FUNCTION step from TRAN_STAT by

     way of an all-rows scan with no residual conditions into Spool 5

     (Last Use), which is built locally on the AMPs.  The result rows

     are put into Spool 3 (all_amps), which is built locally on the

     AMPs.  The size is estimated with high confidence to be

     889,500,023 rows (70,270,501,817 bytes).

  5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of

     an all-rows scan with a condition of ("Field_9 = 1") into Spool 1

     (all_amps), which is redistributed by the hash code of (

     TRAN_STAT.Tran_Id) to all AMPs.  Then we do a SORT to

     order Spool 1 by row hash.  The result spool file will not be

     cached in memory.  The size of Spool 1 is estimated with low

     confidence to be 556,288,594 rows (32,821,027,018 bytes).  The

     estimated time for this step is 3 minutes and 7 seconds.

  6) We do an all-AMPs MERGE into vt_ts from Spool 1 (Last

     Use).  The size is estimated with low confidence to be 556,288,594

     rows.  The estimated time for this step is 1 second.

  7) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> No rows are returned to the user as the result of statement 1.

6 REPLIES
Enthusiast

Re: Data redistribution

Hi Sennet,

Check the difference in the type of operation your performing. Ur creating a SET table with just the schema for which u could see the EXPLAIN PLAIN.

Whereas when u create the volatile table the PI might be the same, but in the select ur doing a rank() over, the data is redistributed to all the AMPS for comparison to find the rank depending on the Tran_Id

Thanks

Jugal.

Fan

Re: Data redistribution

No, I am asking why there is data redistribution  in the second explain plan, the PI being tran_id why is it not amp local.

I am comparing these two explain plans.

Explain

sel *

 from TRAN_STAT

 qualify rank() over (partition by tran_id order by tran_line_stat_start_dt asc) = 1;

EXPLAIN create volatile  table vt_ts

as

(sel *

 from TRAN_STAT

 qualify rank() over (partition by tran_id order by tran_line_stat_start_dt asc) = 1

 ) with data primary index (tran_id)  on commit preserve rows        ;

Fan

Re: Data redistribution

Is my question unclear or silly. Please let me know.

Enthusiast

Re: Data redistribution

Your question is clear. I ran for some of my tables 

Explain

sel *

I can see there is redistribution by hashcode.

It looks like it is upto the optimizer and the data demography.

Teradata Employee

Re: Data redistribution

It appears that after the optimizer planned the qualify over the tran_Id it forgot that the spool was still on the amps of the hash of tran_id. It knew that for the rank/partition/qualify because it did both of those spools locally. 

Note that a step would be required there anyway to sort the spool by tran_id prior to the insert since the data has been ordered by the start date field in the spool. But a redistribution is not required for that sort step. 

It would be reasonable to submit an incident asking for an RFC to improve the optimizer for this case. 

Fan

Re: Data redistribution

Thanks. Will open an incident with Teradata.