Tuning the SQL query


Hi,

   I have a multiset volatile table with 10k records and a table with 5 billion records. I have to create a multiset volatile table by inner joining these two tables, but the query is failing with a spool space error. Is there any way to tune this SQL other than increasing the spool space?

Thanks,

Vinay.

4 REPLIES

Re: Tuning the SQL query

Hi Vinay,

My first question would be whether the inner join condition applies to the Primary Index (PI) of the larger table.

Otherwise the execution plan would probably try to redistribute those 5 billion records to Spool, preparing for join processing, and that would probably exhaust your user Spool quota.

So one way to tune the SQL is to make the join condition use the PI of the larger table, if it does not already; check whether this is feasible, depending on the volatile table design.
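As a minimal sketch of that idea (all table and column names here are placeholders, since we don't know the actual design yet):

```sql
-- Sketch only: assumes the join runs on a single column join_col,
-- and that join_col is already the PI of the 5-billion-row table.
-- Giving the small volatile table the same PI keeps matching rows
-- co-located on the same AMP, so the big table need not be
-- redistributed to Spool for the join.
CREATE MULTISET VOLATILE TABLE small_vt AS (
    SELECT join_col, other_col
    FROM   source_tbl
) WITH DATA
PRIMARY INDEX (join_col)
ON COMMIT PRESERVE ROWS;
```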

Skew, i.e. bad PI distribution, could also be the problem: multiple records with the same PI value in the result set hash to the same AMP and can trigger the Spool space error (remember the Spool quota is divided by the number of AMPs, and if just one AMP hits its own maximum the Spool space error occurs).

But this is a data demographics issue.
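A quick way to look at those demographics is the usual hash-distribution check (a sketch; replace big_tbl/join_col with the actual table and join column):

```sql
-- Sketch: shows how rows of the join column spread across AMPs.
-- A few AMPs with far higher counts than the rest indicate skew.
SELECT HASHAMP(HASHBUCKET(HASHROW(join_col))) AS amp_no,
       COUNT(*)                               AS row_cnt
FROM   big_tbl
GROUP  BY 1
ORDER  BY row_cnt DESC;
```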

Normally to diagnose a problem like this and suggest SQL tuning one needs an EXPLAIN of the query and the DDL of the accessed tables.
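In Teradata both can be captured directly (the table name below is just an example):

```sql
-- Prefixing a query with EXPLAIN returns the plan without executing it:
EXPLAIN
SELECT ...;   -- the original join query goes here

-- SHOW TABLE returns the DDL of a table:
SHOW TABLE your_db.your_table;
```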

Thank you

Teradata Employee

Re: Tuning the SQL query

To better understand the question we would need to know the SQL conditions relating the tables. Are these conditions selective or is it expected that the entire 5 billion rows would be included in the result? If the latter, then spool space for two copies of the 5 billion rows will be necessary - one for the join result and one for the final result. And of course those both would have all the columns of the result of the join.
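One way to gauge the selectivity up front (a sketch; table and column names are placeholders) is to count the qualifying join pairs before materializing the full result:

```sql
-- Sketch: a narrow COUNT(*) over the join needs far less spool
-- than materializing every column of the 5-billion-row result.
SELECT COUNT(*)
FROM   big_tbl  b
JOIN   small_vt s
  ON   b.join_col = s.join_col;   -- plus any other join predicates
```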

Re: Tuning the SQL query

Hi,

This is the explain plan I got:

1) First, we lock a distinct QA_RMAP_1."pseudo table" for read on a
     RowHash to prevent global deadlock for
     QA_RMAP_1.ENROLL_SNAPSHOT_DATE.
  2) Next, we lock a distinct QA_RMAP_1."pseudo table" for read on a
     RowHash to prevent global deadlock for
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.
  3) We lock QA_RMAP_1.ENROLL_SNAPSHOT_DATE for read, and we lock
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP for read.
  4) We create the table header.
  5) We do an all-AMPs RETRIEVE step from
     QA_RMAP_1.ENROLL_SNAPSHOT_DATE by way of an all-rows scan with a
     condition of ("(QA_RMAP_1.ENROLL_SNAPSHOT_DATE.SNPSHT_ID = 2) AND
     (QA_RMAP_1.ENROLL_SNAPSHOT_DATE.SNPSHT_TYP = 1)") into Spool 2
     (all_amps), which is built locally on the AMPs.  The size of Spool
     2 is estimated with high confidence to be 156,627,426 rows (
     5,795,214,762 bytes).  The estimated time for this step is 13.76
     seconds.
  6) We do an all-AMPs JOIN step from
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP by way of a RowHash
     match scan with no residual conditions, which is joined to Spool 2
     (Last Use) by way of a RowHash match scan.
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP and Spool 2 are
     joined using a merge join, with a join condition of (
     "(QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.PROMO_SBSC_START_DT
     <= SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.PROMO_SBSC_END_DT
     > SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.BILL_START_DT <=
     SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.BILL_END_DT >
     SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.CNCT_DT <=
     SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.DISCNCT_DT >
     SNPSHT_DT) AND
     (QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.CUST_ACCT_KEY =
     CUST_ACCT_KEY ))))))").  The input table
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP will not be cached
     in memory, but it is eligible for synchronized scanning.  The
     result goes into Spool 1 (all_amps), which is redistributed by
     hash code to all AMPs.  Then we do a SORT to order Spool 1 by row
     hash.  The result spool file will not be cached in memory.  The
     size of Spool 1 is estimated with low confidence to be
     17,497,152,691 rows (23,848,619,117,833 bytes).  The estimated
     time for this step is 10 hours and 17 minutes.
  7) We do an all-AMPs MERGE into
     QA_RMAP_SQLETL.TMP_SUBSCRIPTION_SNAPSHOT_JOIN_INNER_1 from Spool 1
     (Last Use).  The size is estimated with low confidence to be
     17,497,152,691 rows.  The estimated time for this step is 1,336
     hours and 9 minutes.
  8) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.
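Note that step 6 estimates ~17.5 billion output rows from a 5-billion-row input, so on average each account key matches several snapshot rows. A query like the following (column names taken from the EXPLAIN above; this is just a diagnostic sketch) would show how many snapshot rows each key multiplies against:

```sql
-- Sketch: measure join multiplicity on the equality key.
-- High counts per CUST_ACCT_KEY explain the result-row explosion.
SELECT TOP 20
       CUST_ACCT_KEY,
       COUNT(*) AS snapshot_rows
FROM   QA_RMAP_1.ENROLL_SNAPSHOT_DATE
WHERE  SNPSHT_ID = 2
AND    SNPSHT_TYP = 1
GROUP  BY 1
ORDER  BY snapshot_rows DESC;
```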

Re: Tuning the SQL query

Hi Vinay,

1. Check for duplicates in the table 'REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP' on its PK columns.

2. Check for multi-column statistics on the combination below. Statistics always play a major role.

Run the below query.

COLLECT STATS COLUMN (PROMO_SBSC_START_DT, PROMO_SBSC_END_DT, BILL_START_DT, BILL_END_DT, CNCT_DT, DISCNCT_DT, CUST_ACCT_KEY)
ON QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP;
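For point 1 above, a duplicate check could look like this (pk_col1, pk_col2 are placeholders for the actual primary-key columns, which aren't shown in the thread):

```sql
-- Sketch: list PK values that occur more than once; any rows returned
-- mean the "_NO_DUP" table still contains duplicates on its key.
SELECT pk_col1, pk_col2, COUNT(*) AS dup_cnt
FROM   QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP
GROUP  BY pk_col1, pk_col2
HAVING COUNT(*) > 1;
```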