Query_Discussion_HELP


Hello,

First of all, thanks for your time.

I am very new to Teradata and I would like your opinion about this query (execution plan below).

The query joins my fact table (1 billion rows) to 10 dimension tables. The dimension tables are really small.

So every join has to access my enormous main table. Even when I try to create a join index for each join I run out of spool; I increased it up to 10 GB and still run out.

1) Statistics are OK.

2) Many secondary indexes.

3) No skew.

Here is the execution plan.

Thank you very much for your time.

1) First, we lock a distinct sq_mpa."pseudo table" for read on a

     RowHash to prevent global deadlock for sq_mpa.a.

  2) Next, we lock a distinct sq_mpa."pseudo table" for read on a

     RowHash to prevent global deadlock for sq_mpa.d.

  3) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to

     prevent global deadlock for sq_mpa.H.

  4) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to

     prevent global deadlock for sq_mpa.m.

  5) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to

     prevent global deadlock for sq_mpa.f.

  6) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to

     prevent global deadlock for sq_mpa.n.

  7) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to

     prevent global deadlock for sq_mpa.g.

  8) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to

     prevent global deadlock for sq_mpa.e.

  9) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to

     prevent global deadlock for sq_mpa.c.

 10) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to

     prevent global deadlock for sq_mpa.Y.

 11) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to

     prevent global deadlock for sq_mpa.J.

 12) We lock sq_mpa.a for read, we lock sq_mpa.d for read, we lock

     sq_mpa.H for read, we lock sq_mpa.m for read, we lock sq_mpa.f for

     read, we lock sq_mpa.n for read, we lock sq_mpa.g for read, we

     lock sq_mpa.e for read, we lock sq_mpa.c for read, we lock

     sq_mpa.Y for read, and we lock sq_mpa.J for read.

 13) We execute the following steps in parallel.

      1) We do an all-AMPs RETRIEVE step from sq_mpa.g by way of an

         all-rows scan with no residual conditions into Spool 2

         (all_amps), which is duplicated on all AMPs.  The size of

         Spool 2 is estimated with high confidence to be 2,250 rows (

         78,750 bytes).  The estimated time for this step is 0.01

         seconds.

      2) We do an all-AMPs RETRIEVE step from sq_mpa.f by way of an

         all-rows scan with no residual conditions into Spool 3

         (all_amps), which is duplicated on all AMPs.  The size of

         Spool 3 is estimated with high confidence to be 2,850 rows (

         82,650 bytes).  The estimated time for this step is 0.01

         seconds.

      3) We do an all-AMPs RETRIEVE step from sq_mpa.m by way of an

         all-rows scan with no residual conditions into Spool 4

         (all_amps), which is duplicated on all AMPs.  The size of

         Spool 4 is estimated with high confidence to be 7,050 rows (

         204,450 bytes).  The estimated time for this step is 0.01

         seconds.

      4) We do an all-AMPs RETRIEVE step from sq_mpa.d by way of an

         all-rows scan with no residual conditions into Spool 5

         (all_amps), which is duplicated on all AMPs.  The size of

         Spool 5 is estimated with high confidence to be 8,699,700 rows

         (226,192,200 bytes).  The estimated time for this step is 0.15

         seconds.

 14) We do an all-AMPs JOIN step from sq_mpa.H by way of an all-rows

     scan with no residual conditions, which is joined to Spool 4 (Last

     Use) by way of an all-rows scan.  sq_mpa.H and Spool 4 are joined

     using a product join, with a join condition of ("(1=1)").  The

     result goes into Spool 6 (all_amps), which is duplicated on all

     AMPs into 2 hash join partitions.  The size of Spool 6 is

     estimated with high confidence to be 10,152,000 rows (385,776,000

     bytes).  The estimated time for this step is 0.23 seconds.

 15) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an

     all-rows scan, which is joined to sq_mpa.a by way of an all-rows

     scan with no residual conditions.  Spool 5 and sq_mpa.a are joined

      using a single partition hash join, with a join condition of (

     "sq_mpa.a.created_by = ROW_ID").  The input table sq_mpa.a will

     not be cached in memory.  The result goes into Spool 7 (all_amps),

     which is built locally on the AMPs into 2 hash join partitions.

     The result spool file will not be cached in memory.  The size of

     Spool 7 is estimated with low confidence to be 200,000,000 rows (

     27,400,000,000 bytes).  The estimated time for this step is 25.62

     seconds.

 16) We do an all-AMPs JOIN step from sq_mpa.c by way of an all-rows

     scan with no residual conditions, which is joined to Spool 2 (Last

     Use) by way of an all-rows scan.  sq_mpa.c and Spool 2 are joined

     using a product join, with a join condition of ("(1=1)").  The

     result goes into Spool 8 (all_amps), which is duplicated on all

     AMPs.  The size of Spool 8 is estimated with high confidence to be

     141,750 rows (7,229,250 bytes).  The estimated time for this step

     is 0.02 seconds.

 17) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an

     all-rows scan, which is joined to Spool 7 (Last Use) by way of an

     all-rows scan.  Spool 6 and Spool 7 are joined using a hash join

     of 2 partitions, with a join condition of ("(created_5 = ore_min)

     AND (sra_resolution_cd_NA = ESITO)").  The result goes into Spool

     9 (all_amps), which is built locally on the AMPs.  The result

     spool file will not be cached in memory.  The size of Spool 9 is

     estimated with low confidence to be 204,347,827 rows (

     26,156,521,856 bytes).  The estimated time for this step is 19.86

     seconds.

 18) We do an all-AMPs RETRIEVE step from sq_mpa.n by way of an

     all-rows scan with no residual conditions into Spool 10 (all_amps),

     which is duplicated on all AMPs.  Then we do a SORT to order Spool

     10 by the hash code of (sq_mpa.n.SOTTOESITO).  The size of Spool

     10 is estimated with high confidence to be 9,150 rows (320,250

     bytes).  The estimated time for this step is 0.00 seconds.

 19) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an

     all-rows scan, which is joined to Spool 9 (Last Use) by way of an

     all-rows scan.  Spool 8 and Spool 9 are joined using a single

     partition hash join, with a join condition of ("(todo_cd = todo_cd)

     AND (X_CARAT_CONT = X_CARAT_CONT)").  The result goes into Spool

     11 (all_amps), which is built locally on the AMPs.  Then we do a

     SORT to order Spool 11 by the hash code of (sq_mpa.a.sra_stat_cd).

     The result spool file will not be cached in memory.  The size of

     Spool 11 is estimated with low confidence to be 315,537,086 rows (

     33,446,931,116 bytes).  The estimated time for this step is 1

     minute and 6 seconds.

 20) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of a

     RowHash match scan, which is joined to Spool 11 (Last Use) by way

     of a RowHash match scan.  Spool 10 and Spool 11 are right outer

     joined using a merge join, with condition(s) used for non-matching

     on right table ("NOT (sra_stat_cd IS NULL)"), with a join

     condition of ("sra_stat_cd = SOTTOESITO").  The result goes into

     Spool 12 (all_amps), which is redistributed by the hash code of (

     sq_mpa.a.TARGET_OU_ID) to all AMPs.  Then we do a SORT to order

     Spool 12 by row hash.  The result spool file will not be cached in

     memory.  The size of Spool 12 is estimated with low confidence to

     be 324,598,171 rows (31,810,620,758 bytes).  The estimated time

     for this step is 13 minutes and 58 seconds.

 21) We execute the following steps in parallel.

      1) We do an all-AMPs JOIN step from sq_mpa.Y by way of a RowHash

         match scan with no residual conditions, which is joined to

         Spool 12 (Last Use) by way of a RowHash match scan.  sq_mpa.Y

         and Spool 12 are right outer joined using a merge join, with

         condition(s) used for non-matching on right table ("NOT

         (TARGET_OU_ID IS NULL)"), with a join condition of (

         "TARGET_OU_ID = sq_mpa.Y.ROW_ID").  The input table sq_mpa.Y

         will not be cached in memory.  The result goes into Spool 15

         (all_amps), which is built locally on the AMPs.  Then we do a

         SORT to order Spool 15 by the hash code of (sq_mpa.a.asset_id).

         The result spool file will not be cached in memory.  The size

         of Spool 15 is estimated with low confidence to be 324,598,171

         rows (33,758,209,784 bytes).  The estimated time for this step

         is 1 minute and 22 seconds.

      2) We do an all-AMPs RETRIEVE step from sq_mpa.J by way of an

         all-rows scan with no residual conditions into Spool 18

         (all_amps), which is duplicated on all AMPs.  Then we do a

         SORT to order Spool 18 by the hash code of (sq_mpa.J.ROW_ID).

         The result spool file will not be cached in memory.  The size

         of Spool 18 is estimated with high confidence to be

         9,329,829,300 rows (307,884,366,900 bytes).  The estimated

         time for this step is 16 minutes and 44 seconds.

      3) We do an all-AMPs JOIN step from sq_mpa.e by way of an

         all-rows scan with no residual conditions, which is joined to

         Spool 3 (Last Use) by way of an all-rows scan.  sq_mpa.e and

         Spool 3 are joined using a product join, with a join condition

         of ("(1=1)").  The result goes into Spool 19 (all_amps), which

         is duplicated on all AMPs.  The size of Spool 19 is estimated

         with high confidence to be 364,800 rows (16,416,000 bytes).

         The estimated time for this step is 0.02 seconds.

 22) We do an all-AMPs JOIN step from Spool 15 (Last Use) by way of a

     RowHash match scan, which is joined to Spool 18 (Last Use) by way

     of a RowHash match scan.  Spool 15 and Spool 18 are left outer

     joined using a merge join, with condition(s) used for non-matching

     on left table ("NOT (asset_id IS NULL)"), with a join condition of

     ("asset_id = ROW_ID").  The result goes into Spool 20 (all_amps),

     which is built locally on the AMPs.  The result spool file will

     not be cached in memory.  The size of Spool 20 is estimated with

     low confidence to be 324,598,171 rows (37,977,986,007 bytes).  The

     estimated time for this step is 1 minute and 4 seconds.

 23) We do an all-AMPs JOIN step from Spool 19 (Last Use) by way of an

     all-rows scan, which is joined to Spool 20 (Last Use) by way of an

     all-rows scan.  Spool 19 and Spool 20 are joined using a single

     partition hash join, with a join condition of ("(X_IN_OUT =

     X_IN_OUT) AND (X_CHANNEL = X_CHANNEL)").  The result goes into

     Spool 1 (all_amps), which is built locally on the AMPs.  The

     result spool file will not be cached in memory.  The size of Spool

     1 is estimated with low confidence to be 504,423,484 rows (

     54,982,159,756 bytes).  The estimated time for this step is 44.89

     seconds.

 24) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 34 minutes and 24

     seconds.

12 REPLIES

Re: Query_Discussion_HELP

It's hard to tell without additional information:

- Table DDL, at least Index/Partitioning

- Your actual query

- output of HELP STATS for each table (see the example below)

- size of your system, 150 AMPs?
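For example, the collected statistics of one table can be listed with:

HELP STATISTICS schema_name.TABLE_A;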

Dieter

Re: Query_Discussion_HELP

Hello,

Thanks for the answer.

 I "split" logical , the tables in two sections.

My small ones,which are about 10.000 rows and my big one,which has more than 1 billion.

It is very simple query,it just joins with id of each table and get the descriptions of all small ones.

There is no aggregation,no group by ,no where clause,no order by.

Partition I did not put,will it help ? ,at least for the column which makes one join?

I have secondary indexes on every column which is joined.

I have join index on ,(column join) , (fields which return).This only to small ones,because on my main table is impossible.

Help stats,I will look it.:) Thanks.But i take statistics on every column of all tables plus on all table.

Yes , size of my system is 150 AMPs .

Questions,

The datafields of columns,of every join must be the same? example, varchar(10) inner join varchar (10)?

My big table is UPI on a column which is not used in the query...will it netter to PI to a column which is used in a join and partition the table to a other  join column?

Ending, I have to check it again,because I was checking one join and then 2,3,4.....Until the half is just some seconds,and then at 5th  join goes to 5 minutes?Is this normal?Or the problem will be somewhere there?

Thanks one more time.

**I know that i have read a lot,but you answers will help  me to focus better on which is most important.

N/A

Re: Query_Discussion_HELP

It is really not clear (at least to me) from your explanation what you are doing and want to achieve.

You need to share with us the DDL & SQL - if you have privacy concerns change the column and table names - to get efficient feedback.

Re: Query_Discussion_HELP

Hi,

**************************************************

select
       a.ROW_ID
      ,J.TELEFONO
      ,a.asset_id
      ,( a.created_date - cast('2000-01-01' as date format 'yyyy-mm-dd') ) + 1 as xxxx
      ,c.cod_tipologia as xxxxx
      ,d.id_field as xxxx
      ,e.cod_canale as xxxx
      ,f.cod_provenienza as xxxxxx
      ,g.cod_tipo_contatto as xxxxxx
      ,d.ut as xxxxxx
      ,H.time_id as xxxxxxxxx
      ,Y.CODICE_FISCALE as xxxxxxxx
      ,m.ID_TABLE_J as xxxxxxxxx
      ,n.id_sottoesito as xxxxxxx
from schema_name.TABLE_A a
      inner join schema_name.TABLE_B c
      on a.X_CARAT_CONT = c.X_CARAT_CONT
      inner join schema_name.TABLE_C d
      on a.created_by = d.row_id
      inner join schema_name.TABLE_D e
      on a.X_CHANNEL = e.X_CHANNEL
      inner join schema_name.TABLE_E f
      on a.X_IN_OUT = f.X_IN_OUT
      inner join schema_name.TABLE_F g
      on a.todo_cd = g.todo_cd
      inner join schema_name.TABLE_G H
      on a.created_5 = H.ore_min
      left outer join schema_name.TABLE_H J
      on a.ASSET_ID = J.ROW_ID
      left outer join schema_name.TABLE_I Y
      on a.TARGET_OU_ID = Y.ROW_ID
      inner join schema_name.TABLE_J m
      on a.sra_resolution_cd_NA = m.esito
      left outer join schema_name.TABLE_K n
      on a.sra_stat_cd = n.sottoesito

****************************************************

This is the statement.

All the tables except TABLE_A are very small, roughly 1,000-10,000 rows.

TABLE_A has more than 1 billion rows.

As you can see, every join is between TABLE_A and one of the small tables.

Possible indexes to use?

Possible structure for my big table, meaning TABLE_A?

Any suggestions?

Thank you all for the help and your time!


Re: Query_Discussion_HELP

You actually want to return all 1,000,000,000+ rows (approx. 100GB)?

Forget about any secondary/join indexes.

Your dimension tables are not all small, according to explain:

G 15 rows

F 19

M 47

N 61

C 63

E 128

H 1,440

D 57,998

J 62,198,862

Y "not be cached in memory", so probably quite large, too

The PI of a fact table is usually combined from the FKs to the commonly used dimensions. Thus the optimizer can do a so-called Large Table/Small Table join, i.e. cross joining the dimensions to facilitate a following join to the fact PI, but this is mainly for queries with restrictive conditions on the dimensions.
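For illustration only, a minimal sketch of such a fact table (the table name and column list are placeholders, not your actual DDL):

CREATE TABLE schema_name.fact_demo
( row_id       INTEGER NOT NULL,
  todo_cd      INTEGER,
  x_channel    INTEGER,
  x_in_out     INTEGER,
  created_date DATE
)
-- NUPI built from the FK columns of the most commonly restricted dimensions,
-- so the duplicated/cross-joined dimension spool can be joined directly to the fact PI
PRIMARY INDEX (todo_cd, x_channel, x_in_out);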

Dieter

Re: Query_Discussion_HELP

Thanks,

There are two tables (the ones with the left joins)

which have almost 70,000,000 rows...

I tried to create join indexes but ran out of spool. I will ask the DBA to increase it, but as you said, it is practically impossible.

So, I should not put secondary indexes on all columns of the big one?

And should I add secondary indexes on the joined tables, on the returned column and on the combination of join column plus returned column?

All the dimensions have a UPI on the join columns...

We will see...

So, only statistics?

Is it possible to join 2 or more tables into temp tables and then re-join? Would that help?

Thanks.


Re: Query_Discussion_HELP

Teradata will not use SIs for joins (unless you apply very restrictive conditions resulting in a small number of rows). In your case any SI on all but the two big dimensions is probably never used for any kind of query.

Join indexes will not be used in your case either.

Only a different PI of the fact table might help.

Otherwise you need stats on the FK/PK columns; Foreign Keys ("WITH NO CHECK OPTION") will provide better estimates to the optimizer, too.
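For example (the column and constraint names below are only placeholders to show the syntax):

-- stats on the join columns on both sides
COLLECT STATISTICS ON schema_name.TABLE_A COLUMN (created_by);
COLLECT STATISTICS ON schema_name.TABLE_C COLUMN (row_id);

-- soft RI: declared, not enforced, but usable by the optimizer
ALTER TABLE schema_name.TABLE_A
ADD CONSTRAINT fk_created_by FOREIGN KEY (created_by)
REFERENCES WITH NO CHECK OPTION schema_name.TABLE_C (row_id);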

And then increase your spool to approx. 400 GB and wait an hour :-)

Dieter

Re: Query_Discussion_HELP

Hi,

As I said, I am new to Teradata; I am coming from Oracle.

I decided not to run the query all at once, but to split the data into two tables and then do an INSERT ... SELECT.

I've just finished some tutorials and you are right: no need for SIs.

Can I ask some more questions?

Does the order of the joins matter in Teradata? Meaning, if I have 3 joins, j1 - j2 - j3, is the execution time different for every possible order?

When you say a different PI, for what reasons would I change it?

Would a different PI change the execution time?

Would it be better to try the columns that join to my 2 biggest dimensions?

If I split the 10 joins in half, meaning create one table which contains the data from 5 of the joins and another which contains the other 5, and then make one more join between my 2 new tables, would that be better?

Finally, sorry for all these questions... if I build a sample table of 10,000,000 rows instead of the big one, will I get roughly correct stats, execution times, join plans... and so on?

Thank you one more time.

**I will try FK/PK too... thanks.

Re: Query_Discussion_HELP

Create a volatile table involving all the joins except for the huge table and then join it back to the larger table. In this way you avoid joining to the 1-billion-row table each time.
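A minimal sketch of that approach (table, column and index names are placeholders, not your real objects):

-- pre-join the small dimensions once into a volatile table
CREATE VOLATILE TABLE dims_vt AS
( SELECT c.X_CARAT_CONT, g.todo_cd, c.cod_tipologia, g.cod_tipo_contatto
  FROM schema_name.TABLE_B c
  CROSS JOIN schema_name.TABLE_F g
) WITH DATA
PRIMARY INDEX (X_CARAT_CONT, todo_cd)
ON COMMIT PRESERVE ROWS;

-- then hit the 1-billion-row fact table only once
SELECT a.ROW_ID, v.cod_tipologia, v.cod_tipo_contatto
FROM schema_name.TABLE_A a
INNER JOIN dims_vt v
   ON a.X_CARAT_CONT = v.X_CARAT_CONT
  AND a.todo_cd = v.todo_cd;

Note that cross joining all the dimensions explodes quickly (as the product joins in your explain show), so in practice you would only pre-join the ones that are tiny or restrictive.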