between condition make bad execution plan

Database
Enthusiast

between condition make bad execution plan

Hi,

i have a query like this:

sel * from tbla a inner join tblb b on a.TheDay between b.SDay and b.EDay;

this is the execution plan:

2) Next, we do an all-AMPs RETRIEVE step from

     tbla by way of an all-rows scan with no condition into

     Spool 2 (all_amps), which is duplicated on all AMPs.  The size of

     Spool 2 is estimated with high confidence to be 2,051,400 rows (

     34,873,800 bytes).  The estimated time for this step is 0.08

     seconds. 

  3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an

     all-rows scan, which is joined to b by way of an

     all-rows scan with no residual conditions.  Spool 2 and

     b are joined using a product join, with a join

     condition of ("(a.TheDay >= b.SDay) AND

     (a.TheDay <= b.EDay)").  The input table

     b will not be cached in memory, but it is eligible for

     synchronized scanning.  The result goes into Spool 1 (group_amps),

     which is built locally on the AMPs.  The result spool file will

     not be cached in memory.  The size of Spool 1 is estimated with

     low confidence to be 625,636,195,428 rows (51,302,168,025,096

     bytes).  The estimated time for this step is 13 hours and 24

     minutes. 

there are all needed stats on PI and on join colum, i use the correct alias and the format date is 'yyy-mm-dd' on between the tables.

How can i do?

How can i cheat the parser to generate a plan without product join? (with or without the stats the exec plan is the same)

Thanks

6 REPLIES
Senior Apprentice

Re: between condition make bad execution plan

There's no way to avoid the product join because it's not an equi-join.

What are you trying to achieve?

Table A is probably a calendar and you'll get multiple rows for each row in Table B. 

If you really need that you better use EXPAND ON.

Enthusiast

Re: between condition make bad execution plan

Hi Dieter,

for every row in table b i have to check if the values of the field TheDay, this field have the dates continuous, is content between the fields SDay (Start Day) and EDay (End Day) for every row in the table a.

For example the field TheDay have the min value '2005-01-01' and the max value '2020-12-31',

the field SDay have the min value '2014-08-01' and the max value '2016-09-01',

the field EDay have the min value '2014-09-01' and the max value '2999-12-31'.


I tried to substitute the two fields SDay and EDay into one, to use the PERIOD() type to EXTRACT ON () but if i use it i can't insert some dates because the fileds b.SDay and b.EDay can have the same value. 

how can i do?

TNX

Teradata Employee

Re: between condition make bad execution plan

What described above requires the product join, every row of b compared to every row of a. This is going to be expensive for a medium table joined to a large table.  Are you sure there is no other join condition available - account, customerid,...?

One way to improve it some is to do a separate qualification of a.TheDay against the MIN of b.SDay since it appears that there rows in a.TheDay that won't qualify based solely on that criteria. That requires two selects unioned together, one to get the rows that are less than the max and one to join the rows greater than or equal to the max to table b.

Enthusiast

Re: between condition make bad execution plan

sel * from tbla a inner join tblb b on a.TheDay between b.SDay and b.EDay;

Hi Todd,

there are no outer conditions to apply on the tbla, 450000000 rows, or tblb, 6500 rows that are duplicated and is done a product join... 912600000000000 rows  :(

me too have thinked about a solution to filter date with min and max, with something like that, but the problem remain:

SELECT TheDay , ... , SDay , EDay

FROM ( 

SELECT ... , SDay , EDay FROM lista_saldi WHERE ( SELECT MAX ( TheDay ) FROM tbla ) >= SDay 

AND ( SELECT MIN ( TheDay ) FROM tbla ) <= EDay 

) a INNER JOIN tbla b ON b.TheDay BETWEEN a.SDay AND a.EDay 

the dates are: 

MIN ( TheDay )= '2002-12-30',   MAX ( TheDay )=  '2020-12-31',   

MIN (SDay)    =  '2014-08-01',  MAX(SDay)      =  '2016-09-01',

MIN (EDay)    =  '2014-08-31',  MAX(SDay)      =  '2999-12-31',

so we have true conditions for a lot of the dates and the rows number remain similar:

SELECT TheDay , ... , SDay , EDay

FROM (                                                               '2020-12-31'       >=    '2014-08-01' (TRUE)

SELECT ... , SDay , EDay FROM lista_saldi WHERE ( SELECT MAX ( TheDay ) FROM tbla ) >= SDay 

                        '2002-12-30'           <=   '2014-08-31'  (TRUE)

AND ( SELECT MIN ( TheDay ) FROM tbla ) <= EDay 

) a INNER JOIN tbla b ON b.TheDay BETWEEN a.SDay AND a.EDay 

thanks for the help, i was hoping there was a solution but after your and Dieter sentence,maybe i have to talk with the customer about the data logic, to obtain a decrease of rows of the table tbla, maybe regarding the old date "2002" or the open date "2999"....

however i continue to think about it and rest hopeful of some great idea! ;)

Thanks


Teradata Employee

Re: between condition make bad execution plan

What is the result of select count(*) from  tbla where TheDay < '2014-08-01'?  Trying to exclude the rows in tbla that cannot match from the product join.

Is the desired result really the product join result? Or is it the list of rows from tblb that don't have a date in tbla?  The product join result of all the combinations seems to not be very useful IMHO.

Enthusiast

Re: between condition make bad execution plan

the result is 5041, the exclusion is irrelevant.

for the outer questions, i don't know the reply, i have to talk with the development enviroment.

"The product join result of all the combinations seems to not be very useful "... it's true.

 

The optimization have a limit called logistic!  ;)

TNX