Join Index with derived tables or subquery


Join Index with derived tables or subquery

I am new to teradata .I am trying to improve the performance of one long running query . The query has derived table but it  is not allowing me to create JI . It seems that if query have derived table /sub query/analytical function like Qualify , teradata doesn't allow to create JI . I am not sure whether am I doing any thing wrong here . Is there any limitations for Join Index.


Re: Join Index with derived tables or subquery

Hello there,

It is not always true that the solution to improve the performance of your long running query is create a JI. If you think you have evaluated all options of tuning and only left with JI then definitely the community can help you. Did you try the statistics, query rewrite for some bad coding practice and possible Join indexes? Coming to your question - you cannot have a derived table, subquery or a QUALIFY.

if you need to filter have it make a sparse index and not sure for it being a covered query have the row ID reference in it or make it a global join index. Send the syntax may be someone of us can help you.

Re: Join Index with derived tables or subquery

Thanks vasu.

I have two scenario.I tried many things with my DA and  thought Join Index can help.

1. I have one big table it may have 100 million records and have 3  medium size table that  have 1 million records.

My first table is partitioned and have PI on Acct_id,security_id . And my 3 other tables have PI on security.When I do left join with these three tables , distribution is happening and hence I chose JI for this case.

2.  SEL   ACCT_ID, client_id,SNPS_DT

    , SUM(AMT))



    AND client_id ='C01'


    GROUP BY 1, 2,3;

The table MAIN_TABLE is partitioned by client_id first  and SNPS_DT then. The table has 100 million of records.It seems that Qualify in above scenario is considerably taking sometime.

Re: Join Index with derived tables or subquery

This will first need to do the aggregation for your GROUP BY clause. That process will not be very efficient because it will need to do a global aggregation (vs. local) because your GROUP BY columns do not include all of your primary index columns. To make that more efficient, you could change the primary index to "Acct_Id", "Acct_Id, Client_Id", or "Acct_Id, Client_Id, Snps_Dt". I don't know what that does to the rest of your queries or your ETL process, so you'll have to consider that as well.

The QUALIFY will then take the rows after the aggregation and will redistribute them so that all rows with the same Acct_Id are on the same AMP. Then, it will be able to select the row(s) with the minimum SNPS_Dt.

If you make the primary index the Acct_Id only, the optimizer might be smart enough to keep the data local for both the aggregation (GROUP BY) and the all-AMP STAT function (for the QUALIFY). But, you have to look at the PI design within the context of all of the different operations that you're going to perform on the table.

Re: Join Index with derived tables or subquery

I am trying to join two tables . One have 30 K records and other have 300K records .Both have same PI. And we are doing join on PI columns .It is going for RowHash match  scan. But  still  the performance is not good . Something can help to improve join process

Here is the explain plan .And we collected stats on Join (pi) columns  and CUR_RCD_IN.

  6) We do an all-AMPs JOIN step from testbox.TABLE1 by way of a RowHash

     match scan with a condition of ("testbox.TABLE1.CUR_RCD_IN = 'Y'"),

     which is joined to Spool 6 (Last Use) by way of a RowHash match

     scan.  testbox.TABLE1 and Spool 6 are left outer joined using a

     merge join, with a join condition of ("testbox.TABLE1.SECR_ID =

     SECR_ID").  The result goes into Spool 5 (group_amps), which is

     built locally on the AMPs.  The size of Spool 5 is estimated with

     low confidence to be 181,345,617 rows (73,263,629,268 bytes).  The

     estimated time for this step is 1 minute and 56 seconds.


Re: Join Index with derived tables or subquery

If you got 30k and 300k records with stats and the optimizer assumes 180m rows in spool, this looks like a bad join condition, not one-to-many, but many-to-many.

Did you check the actual runtime/number of rows from DBQL for this step?


Re: Join Index with derived tables or subquery

actual run time is 25 secs


Re: Join Index with derived tables or subquery

And how many rows are returned by that step?

Could you post the collected stats?


Re: Join Index with derived tables or subquery

Rows returned by the  step is 400K .

TABLE1 ( app 30k)

Date     Time     Unique Values        Column Names

-------- -------- -------------------- ------------------------------------

11/12/25 19:11:22            6,404,033 SECR_ID

11/12/25 18:06:16                  179 MNR_PROD_CD

11/12/25 17:45:22                  235 NSRT_RUN_ID

11/12/25 18:36:03                  235 UPDT_RUN_ID

11/12/25 17:11:48                    2 CUR_RCD_IN

11/12/25 17:46:23                   14 PROD_TY_CD

11/12/25 17:07:14                  181 ASST_TY_CD,ASST_STYP_CD,ASST_SUB_STY

11/12/25 17:50:10                    1 PARTITION

TABLE2 (app 400K)

Date     Time     Unique Values        Column Names

-------- -------- -------------------- ------------------------------------

11/12/25 18:38:26              820,489 SECR_ID

11/12/25 18:11:57                  269 NSRT_RUN_ID

11/12/25 17:41:28                  268 UPDT_RUN_ID

11/12/25 18:07:50                    2 CUR_RCD_IN


Re: Join Index with derived tables or subquery

TABLE1 ( app 30k) <--> 6,404,033 Unique Values

TABLE2 (app 400K) <--> 820,489 Unique Values

Either your numbers or the stats are totally wrong.

Is TABLE1 partitioned? There's only a single partition with data.

If both table are joined via the PI i would expect a direct join without intermediate spool.

Is TABLE2 SECR_ID unique?

Could you share the DDL and the actual query?