Redistribution during insert-select

Database
Junior Supporter

Redistribution during insert-select

INSERT INTO MERGE_S 
SELECT	 MIN(A.EFF_DT),  COALESCE(R_CUST_ID,CUST_ID),
		 BRWSR_ID  
FROM  	 WEB_W A  LEFT OUTER JOIN   CUST_1 B 
	ON	A.CUST_ID=B.MERGE_CUST_ID 
GROUP	BY 2,3

In the above query PI are as follows :

 

  • web_w is cust_id
  • cust_1 is merge_cust_id
  • merge_s is cust_id and browsr_id

the join happens as PI-PI. but, when it goes to merging into merge_s, it does a redistribution on cust_id and browsr_id when the PI of my merge_s is on cust_id and browsr_id. Is there a way to avoid it as i am trying to tune this query and want to avoid redistribution here. Please refer to the explain below.

  4) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan , grouping by field1 ( (CASE WHEN (NOT
     (R_CUST_ID IS NULL )) THEN
     (R_CUST_ID) ELSE
     (A.CUST_ID) END) ,A.BRWSR_ID).  Aggregate
     Intermediate Results are computed globally, then placed in Spool 6.
     The aggregate spool file will not be cached in memory.  The size
     of Spool 6 is estimated with low confidence to be 599,797,031 rows
     (223,124,495,532 bytes).  The estimated time for this step is 12
     minutes and 51 seconds.
  5) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of
     an all-rows scan into Spool 1 (all_amps), which is redistributed
     by the hash code of ((CASE WHEN (NOT
     (R_CUST_ID IS NULL )) THEN
     (R_CUST_ID) ELSE
     (A.CUST_ID) END), A.BRWSR_ID) to all AMPs.
     Then we do a SORT to order Spool 1 by row hash.  The size of Spool
     1 is estimated with low confidence to be 599,797,031 rows (
     64,778,079,348 bytes).  The estimated time for this step is 2
     minutes and 22 seconds.
  6) We do an all-AMPs MERGE into idw_dba_dwa.WEB_BROWSER_CUST_MERGE_S
     from Spool 1 (Last Use).  The size is estimated with low
     confidence to be 599,797,031 rows.  The estimated time for this
     step is 1 hour and 54 minutes.

Thanks !

Samir Singh


Accepted Solutions
Teradata Employee

Re: Redistribution during insert-select

It appears that it is not different in more recent releases. The optimzer does not recognize that grouping by the result of the coalesce operation is already distributed as needed for the insert. So it performs the additional redistribution to prepare for the insert. 

 

It would not be possible to optimize this aspect of the query short of removing the coalesce operation from the query.

 

P.S. Has step level DBQL been reviewed for the execution of the query? In the Explain plan, the redistribution step is quite a small part of the total cost of the query. Perhaps the things to optimize in the query are other parts of the plan?

1 ACCEPTED SOLUTION
6 REPLIES
Teradata Employee

Re: Redistribution during insert-select

What teradata database release is being used?
Junior Supporter

Re: Redistribution during insert-select

We are on teradat 14.10 . Does Td version has to take something with this ?

Teradata Employee

Re: Redistribution during insert-select

Don't know, checking...  Always possible that the optimizer learns somethinng new in each new release.

Teradata Employee

Re: Redistribution during insert-select

It appears that it is not different in more recent releases. The optimzer does not recognize that grouping by the result of the coalesce operation is already distributed as needed for the insert. So it performs the additional redistribution to prepare for the insert. 

 

It would not be possible to optimize this aspect of the query short of removing the coalesce operation from the query.

 

P.S. Has step level DBQL been reviewed for the execution of the query? In the Explain plan, the redistribution step is quite a small part of the total cost of the query. Perhaps the things to optimize in the query are other parts of the plan?

Junior Supporter

Re: Redistribution during insert-select

Hi Todd,

I have never delved deeper into the step level in the dbql. could you give me some pointers as to how and what to see the stepinfo ? Is there something to read about this in details.

Thanks !

Samir

Teradata Employee

Re: Redistribution during insert-select

"View of the Query Step Information Log Table:QryLogSteps[V]" in the Database Administration manual.

 

Conceptually the step info is similar to the querylog except that it has one detail record for each step. 

 

Since it does generate one record for each step, it should be turned on judiciously since it will generate a lot of data. Enable it just for the user or application to be analysed in detail, and disable it when the analysis is complete. Enable and disable are done with the "WITH STEPINFO" clause on the "BEGIN LOGGING" statement.