Slow Left Join query

Database
Enthusiast

Slow Left Join query

Following query takes 14 minutes. Table OUT_JOIN1 contains 200001 rows. Table 

spa_usage1 contains app 54 million rows.

Stats are collected for both tables.

Will appreciate if you can share thoughts on how  performance of this query can be imporoved?

   select

      out1.S_SK,

      out1.C_SK,

      out1.S_ID,

      out1.C_ID,

      usg.P_SK,

      usg.SE_SK,

      usg.E_SK,

      usg.USG_DT_SK,

      out1.S_DT

   from

      MISC.OUT_JOIN1 out1 left join

      MISC.spa_usage1 usg

         on

         (

        out1.S_SK = usg.S_SK and

usg.USG_DT_SK > 17901

)        

  4) We do an all-AMPs RETRIEVE step from 270 partitions of MISC.usg

     with a condition of ("MISC.usg.USG_DT_SK >= 17902.") into Spool 2

     (all_amps), which is redistributed by the hash code of (

     MISC.usg.S_SK) to all AMPs.  Then we do a SORT to order

     Spool 2 by row hash.  The result spool file will not be cached in

     memory.  The size of Spool 2 is estimated with low confidence to

     be 567,176,439 rows (18,716,822,487 bytes).  The estimated time

     for this step is 3 hours and 10 minutes. 

  5) We do an all-AMPs JOIN step from MISC.out1 by way of a RowHash

     match scan with no residual conditions, which is joined to Spool 2

     (Last Use) by way of a RowHash match scan.  MISC.out1 and Spool 2

     are left outer joined using a merge join, with condition(s) used

     for non-matching on left table ("NOT (MISC.out1.S_SK IS NULL)"),

     with a join condition of ("MISC.out1.S_SK = S_SK"). 

     The result goes into Spool 1 (all_amps), which is built locally on

     the AMPs.  The result spool file will not be cached in memory. 

     The size of Spool 1 is estimated with low confidence to be

     567,276,441 rows (49,920,326,808 bytes).  The estimated time for

     this step is 1 hour and 31 minutes. 

CREATE TABLE MISC.spa_usage1 (

      S_SK DECIMAL(8,0)  NOT NULL,

      P_SK DECIMAL(8,0) NOT NULL,

      SE_SK DECIMAL(8,0)NOT NULL,

      E_SK DECIMAL(8,0) NOT NULL,

      USG_DT_SK DECIMAL(8,0) NOT NULL,

      ...

      ) 

PRIMARY INDEX (S_SK,C_SK)

PARTITION BY RANGE_N(CAST((USG_DT_SK ) AS INTEGER) BETWEEN 17898 AND 18170  EACH 1, NO RANGE);

CREATE TABLE MISC.OUT_JOIN1     (

      S_SK DECIMAL(8,0),

      C_SK DECIMAL(8,0),

      C_ID VARCHAR(32) ,

      S_ID VARCHAR(32) ,

      S_DT TIMESTAMP(0))

PRIMARY INDEX ( S_SK );

Thanks,

Niteen

6 REPLIES
Supporter

Re: Slow Left Join query

some observations -

1. "spa_usage1 contains app 54 million rows" vs. "The size of Spool 2 is estimated with low confidence to be 567,176,439 rows" 

which is a 10x difference. Stats not up to date? or wrong expectations?

2. PPI "PARTITION BY RANGE_N(CAST((USG_DT_SK ) AS INTEGER) BETWEEN 17898 AND 18170  EACH 1, NO RANGE)" does not support at least this query well...

3. PI (S_SK,C_SK) indicate multiple rows for S_SK. Do you have highly / extreme skewed values?

Enthusiast

Re: Slow Left Join query

Hi Ulrich,

select count(*) from misc.SPA_USAGE1 shows rows as  575,641,759


DataBaseName TableName skew_factor

1 misc                           spa_usage1                     1.71

My apologies for providing incomplete/incorrect information about no of rows.

What can be done in order to improve partition on date?

Niteen

Supporter

Re: Slow Left Join query

The skew value of the table is irrelevant as it is reflecting the S_SK,C_SK combination.

try 

select top 100 hashrow(S_SK), count(*)
from spa_usage1
group by 1
order by 2 desc

PPI - for this specific table you might be better of without PPI...

Enthusiast

Re: Slow Left Join query

Hello Ulrich,

All rows have same count value.


HASHROW(S_SK)           Count(*)

-------------------  -----------

6A330209                   14144

9F8501AE                   14144

....

C8E817F5                   14144

CF0C31DB                   14144

92BE4FF9                   14144

It appears that this being equally distributed, there is no chance to improve on this front. 

I have learnt this method from you, as to how to know distribution of S_SK.

Niteen

Supporter

Re: Slow Left Join query

But in this case it mean that for each matching S_SK your answer set will generate up to 14144 result  rows.

Is that what you expect?

Ulrich

Enthusiast

Re: Slow Left Join query

sel day_of_year from sys_calendar.calendar where calendar_date=date

 

it displays no of days previous date means (2011/05/03)

how to write query