Query Performance - hash join

Database
Enthusiast

Query Performance - hash join

I am trying to improve the performance for the below query . Can anyone suggest better approach to optimize this.

PSTN_SNPS_TBL big table - will have 40million number of records per day. Have partition on Client ID and EDW_SNPS_DT.And we have 60Days of history data in this table. PI is ACCT_ID,SECR_ID.

TESTBOX.SECR_MSTR FINT  => this table have 4 million of data. PI is SECR_ID.

It seems it is joing  for hash join and both  bigger  and smaller table is broken into hash partitions in memory  and do Join.

When I run this query it takes approximately 25 - 30 secs .

SELECT *

FROM

PSTN_SNPS_TBL PSTN

INNER JOIN

TESTBOX.SECR_MSTR FINT

ON PSTN.CUSIP_ID = SECR.SECR_ID

AND SECR.CUR_RCD_IN='Y'

WHERE

PSTN.CLIENT_ID = '123'

AND EDW_SNPS_DT = '2012-04-05'

  1) First, we lock TESTBOX.SECR_MSTR in view

     TESTBOX.SECR_MSTR for access, and we lock

     TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL for access.

  2) Next, we do an all-AMPs RETRIEVE step from

     TESTBOX.SECR_MSTR in view TESTBOX.SECR_MSTR

     by way of an all-rows scan with a condition of (

     "TESTBOX.SECR_MSTR in view

     TESTBOX.SECR_MSTR.CUR_RCD_IN = 'Y'") into Spool 4

     (all_amps) (compressed columns allowed) fanned out into 9 hash

     join partitions, which is built locally on the AMPs.  The size of

     Spool 4 is estimated with high confidence to be 3,625,017 rows (

     1,953,884,163 bytes).  The estimated time for this step is 39.36

     seconds.

  3) We do an all-AMPs RETRIEVE step from a single partition of

     TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL with a condition of (

     "TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL.CLIENT_ID = '123 ' ,

     TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL.EDW_SNPS_DT = DATE '2012-04-05'")

     with a residual condition of ("(TESTBOX.PSTN_SNPS_TBL in

     view TESTBOX.PSTN_SNPS_TBL.EDW_SNPS_DT = DATE

     '2012-04-05') AND (TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL.CLIENT_ID = '123 ')") into Spool 5

     (all_amps) (compressed columns allowed) fanned out into 9 hash

     join partitions, which is redistributed by the hash code of (

     TESTBOX.PSTN_SNPS_TBL.SECR_ID) to all AMPs.  The size

     of Spool 5 is estimated with low confidence to be 27,176,952 rows

     (4,620,081,840 bytes).  The estimated time for this step is 12

     minutes and 16 seconds.

  4) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an

     all-rows scan, which is joined to Spool 5 (Last Use) by way of an

     all-rows scan.  Spool 4 and Spool 5 are joined using a hash join

     of 9 partitions, with a join condition of ("SECR_ID = SECR_ID").

     The result goes into Spool 3 (group_amps), which is built locally

     on the AMPs.  The size of Spool 3 is estimated with low confidence

     to be 17,590,717 rows (21,636,581,910 bytes).  The estimated time

     for this step is 1 minute and 36 seconds.

  5) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> The contents of Spool 3 are sent back to the user as the result of

     statement 1.  The total estimated time is 14 minutes and 31

     seconds.

Below is  the stats details

PSTN_SNPS_TBL

12/04/01 18:50:47           10,424,873       ACCT_ID

12/04/01 18:40:41            1,462,897       SECR_ID

12/04/01 17:19:27                1,481          CLIENT_ID

12/04/01 17:34:15                    2             EDW_CUR_RCD_IN

12/04/01 17:56:04                  166            EDW_SNPS_DT

12/04/01 17:15:45          105,974,582       ACCT_ID,SECR_ID

12/04/01 18:33:15        5,929,458,598     ACCT_ID,SECR_ID,EDW_SNPS_DT

12/04/01 19:05:11                9,259         PARTITION

TESTBOX.SECR_MSTR FINT

12/04/01 19:00:21            6,843,684   SECR_ID

12/04/01 17:40:29                  181      MNR_PROD_CD

12/04/01 17:06:45                    2       EDW_CUR_RCD_IN

12/04/01 17:40:03                   15     PROD_TY_CD

12/04/01 17:02:06                  184    ASST_TY_CD,ASST_STYP_CD,ASST_SUB_STY

12/04/01 17:43:42                    1    PARTITION

Can we do something here   for better performance?

4 REPLIES
Enthusiast

Re: Query Performance - hash join

Removed * from query.Specified columns.Now there is improvement in plan.But still Joining takes some time.

  1) First, we lock TESTBOX.SECR_MSTR in view

     TESTBOX.SECR_MSTR for access, and we lock

     TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL for access.

  2) Next, we do an all-AMPs RETRIEVE step from a single partition of

     TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL with a condition of (

     "TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL.IBD_ID = '123 ' ,

     TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL.EDW_SNPS_DT = DATE '2012-04-05'")

     with a residual condition of ("(TESTBOX.PSTN_SNPS_TBL in

     view TESTBOX.PSTN_SNPS_TBL.EDW_SNPS_DT = DATE

     '2012-04-05') AND (TESTBOX.PSTN_SNPS_TBL in view

     TESTBOX.PSTN_SNPS_TBL.IBD_ID = '123 ')") into Spool 4

     (all_amps) (compressed columns allowed) fanned out into 33 hash

     join partitions, which is built locally on the AMPs.  The size of

     Spool 4 is estimated with low confidence to be 27,176,952 rows (

     2,554,633,488 bytes).  The estimated time for this step is 12.30

     seconds.

  3) We do an all-AMPs RETRIEVE step from TESTBOX.SECR_MSTR

     in view TESTBOX.SECR_MSTR by way of an all-rows scan

     with a condition of ("TESTBOX.SECR_MSTR in view

     TESTBOX.SECR_MSTR.EDW_CUR_RCD_IN = 'Y'") into Spool 5

     (all_amps) (compressed columns allowed) fanned out into 33 hash

     join partitions, which is duplicated on all AMPs.  The size of

     Spool 5 is estimated with high confidence to be 522,002,448 rows (

     15,660,073,440 bytes).  The estimated time for this step is 1

     minute and 52 seconds.

  4) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an

     all-rows scan, which is joined to Spool 5 (Last Use) by way of an

     all-rows scan.  Spool 4 and Spool 5 are joined using a hash join

     of 33 partitions, with a join condition of ("CUSIP_ID = CUSIP_ID").

     The result goes into Spool 3 (group_amps), which is built locally

     on the AMPs.  The size of Spool 3 is estimated with low confidence

     to be 17,590,717 rows (2,867,286,871 bytes).  The estimated time

     for this step is 1 minute and 2 seconds.

  5) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> The contents of Spool 3 are sent back to the user as the result of

     statement 1.  The total estimated time is 3 minutes and 7 seconds. 

Enthusiast

Re: Query Performance - hash join

Try collecting stats on the PARTITION and (PARTITION, PI) Column if not done already.

Thanks

Junior Contributor

Re: Query Performance - hash join

PARTITION stats already exist, but stats on the partitioning columns are missing:

(CLIENT_ID,EDW_SNPS_DT)

You can always do a DIAGNOSTIC HELPSTATS ON FOR SESSION to see if the optimizer might need additional stats.

Dieter

Re: Query Performance - hash join

What is the partitioning range on the EDW_SNPS_DT in the table?