Analytics
Enthusiast

## Query_Discussion_HELP

Hello,

First of all thanks for your time.

The query joins my fact table (1 billion rows) ten times. The dimension tables are really small.

So every join has to access my one enormous table. Even when I try to build a join index for each join I run out of spool; I increased it up to 10 GB.

1) Statistics are OK.

2) There are many secondary indexes.

3) There is no skew.

Here is the execution plan:

Thank you very much for your time.

```
 1) First, we lock a distinct sq_mpa."pseudo table" for read on a
    RowHash to prevent global deadlock for sq_mpa.a.
 2) Next, we lock a distinct sq_mpa."pseudo table" for read on a
    RowHash to prevent global deadlock for sq_mpa.d.
 3) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to
 4) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to
 5) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to
 6) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to
 7) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to
 8) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to
 9) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to
10) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to
11) We lock a distinct sq_mpa."pseudo table" for read on a RowHash to
12) We lock sq_mpa.a for read, we lock sq_mpa.d for read, we lock
    sq_mpa.H for read, we lock sq_mpa.m for read, we lock sq_mpa.f for
13) We execute the following steps in parallel.
     1) We do an all-AMPs RETRIEVE step from sq_mpa.g by way of an
        all-rows scan with no residual conditions into Spool 2
        (all_amps), which is duplicated on all AMPs.  The size of
        Spool 2 is estimated with high confidence to be 2,250 rows
        (78,750 bytes).  The estimated time for this step is 0.01
        seconds.
     2) We do an all-AMPs RETRIEVE step from sq_mpa.f by way of an
        all-rows scan with no residual conditions into Spool 3
        (all_amps), which is duplicated on all AMPs.  The size of
        Spool 3 is estimated with high confidence to be 2,850 rows
        (82,650 bytes).  The estimated time for this step is 0.01
        seconds.
     3) We do an all-AMPs RETRIEVE step from sq_mpa.m by way of an
        all-rows scan with no residual conditions into Spool 4
        (all_amps), which is duplicated on all AMPs.  The size of
        Spool 4 is estimated with high confidence to be 7,050 rows
        (204,450 bytes).  The estimated time for this step is 0.01
        seconds.
     4) We do an all-AMPs RETRIEVE step from sq_mpa.d by way of an
        all-rows scan with no residual conditions into Spool 5
        (all_amps), which is duplicated on all AMPs.  The size of
        Spool 5 is estimated with high confidence to be 8,699,700
        rows (226,192,200 bytes).  The estimated time for this step
        is 0.15 seconds.
14) We do an all-AMPs JOIN step from sq_mpa.H by way of an all-rows
    scan with no residual conditions, which is joined to Spool 4
    (Last Use) by way of an all-rows scan.  sq_mpa.H and Spool 4 are
    joined using a product join, with a join condition of ("(1=1)").
    The result goes into Spool 6 (all_amps), which is duplicated on
    all AMPs into 2 hash join partitions.  The size of Spool 6 is
    estimated with high confidence to be 10,152,000 rows
    (385,776,000 bytes).  The estimated time for this step is 0.23
    seconds.
15) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an
    all-rows scan, which is joined to sq_mpa.a by way of an all-rows
    scan with no residual conditions.  Spool 5 and sq_mpa.a are
    joined using a single partition hash join, with a join condition
    of ("sq_mpa.a.created_by = ROW_ID").  The input table sq_mpa.a
    will not be cached in memory.  The result goes into Spool 7
    (all_amps), which is built locally on the AMPs into 2 hash join
    partitions.  The result spool file will not be cached in memory.
    The size of Spool 7 is estimated with low confidence to be
    200,000,000 rows (27,400,000,000 bytes).  The estimated time for
    this step is 25.62 seconds.
16) We do an all-AMPs JOIN step from sq_mpa.c by way of an all-rows
    scan with no residual conditions, which is joined to Spool 2
    (Last Use) by way of an all-rows scan.  sq_mpa.c and Spool 2 are
    joined using a product join, with a join condition of ("(1=1)").
    The result goes into Spool 8 (all_amps), which is duplicated on
    all AMPs.  The size of Spool 8 is estimated with high confidence
    to be 141,750 rows (7,229,250 bytes).  The estimated time for
    this step is 0.02 seconds.
17) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
    all-rows scan, which is joined to Spool 7 (Last Use) by way of
    an all-rows scan.  Spool 6 and Spool 7 are joined using a hash
    join of 2 partitions, with a join condition of ("(created_5 =
    ore_min) AND (sra_resolution_cd_NA = ESITO)").  The result goes
    into Spool 9 (all_amps), which is built locally on the AMPs.
    The result spool file will not be cached in memory.  The size of
    Spool 9 is estimated with low confidence to be 204,347,827 rows
    (26,156,521,856 bytes).  The estimated time for this step is
    19.86 seconds.
18) We do an all-AMPs RETRIEVE step from sq_mpa.n by way of an
    all-rows scan with no residual conditions into Spool 10
    (all_amps), which is duplicated on all AMPs.  Then we do a SORT
    to order Spool 10 by the hash code of (sq_mpa.n.SOTTOESITO).
    The size of Spool 10 is estimated with high confidence to be
    9,150 rows (320,250 bytes).  The estimated time for this step is
    0.00 seconds.
19) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
    all-rows scan, which is joined to Spool 9 (Last Use) by way of
    an all-rows scan.  Spool 8 and Spool 9 are joined using a single
    partition hash join, with a join condition of ("(todo_cd =
    todo_cd) AND (X_CARAT_CONT = X_CARAT_CONT)").  The result goes
    into Spool 11 (all_amps), which is built locally on the AMPs.
    Then we do a SORT to order Spool 11 by the hash code of
    (sq_mpa.a.sra_stat_cd).  The result spool file will not be
    cached in memory.  The size of Spool 11 is estimated with low
    confidence to be 315,537,086 rows (33,446,931,116 bytes).  The
    estimated time for this step is 1 minute and 6 seconds.
20) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of a
    RowHash match scan, which is joined to Spool 11 (Last Use) by
    way of a RowHash match scan.  Spool 10 and Spool 11 are right
    outer joined using a merge join, with condition(s) used for
    non-matching on right table ("NOT (sra_stat_cd IS NULL)"), with
    a join condition of ("sra_stat_cd = SOTTOESITO").  The result
    goes into Spool 12 (all_amps), which is redistributed by the
    hash code of (sq_mpa.a.TARGET_OU_ID) to all AMPs.  Then we do a
    SORT to order Spool 12 by row hash.  The result spool file will
    not be cached in memory.  The size of Spool 12 is estimated with
    low confidence to be 324,598,171 rows (31,810,620,758 bytes).
    The estimated time for this step is 13 minutes and 58 seconds.
21) We execute the following steps in parallel.
     1) We do an all-AMPs JOIN step from sq_mpa.Y by way of a
        RowHash match scan with no residual conditions, which is
        joined to Spool 12 (Last Use) by way of a RowHash match
        scan.  sq_mpa.Y and Spool 12 are right outer joined using a
        merge join, with condition(s) used for non-matching on right
        table ("NOT (TARGET_OU_ID IS NULL)"), with a join condition
        of ("TARGET_OU_ID = sq_mpa.Y.ROW_ID").  The input table
        sq_mpa.Y will not be cached in memory.  The result goes into
        Spool 15 (all_amps), which is built locally on the AMPs.
        Then we do a SORT to order Spool 15 by the hash code of
        (sq_mpa.a.asset_id).  The result spool file will not be
        cached in memory.  The size of Spool 15 is estimated with
        low confidence to be 324,598,171 rows (33,758,209,784
        bytes).  The estimated time for this step is 1 minute and
        22 seconds.
     2) We do an all-AMPs RETRIEVE step from sq_mpa.J by way of an
        all-rows scan with no residual conditions into Spool 18
        (all_amps), which is duplicated on all AMPs.  Then we do a
        SORT to order Spool 18 by the hash code of (sq_mpa.J.ROW_ID).
        The result spool file will not be cached in memory.  The
        size of Spool 18 is estimated with high confidence to be
        9,329,829,300 rows (307,884,366,900 bytes).  The estimated
        time for this step is 16 minutes and 44 seconds.
     3) We do an all-AMPs JOIN step from sq_mpa.e by way of an
        all-rows scan with no residual conditions, which is joined
        to Spool 3 (Last Use) by way of an all-rows scan.  sq_mpa.e
        and Spool 3 are joined using a product join, with a join
        condition of ("(1=1)").  The result goes into Spool 19
        (all_amps), which is duplicated on all AMPs.  The size of
        Spool 19 is estimated with high confidence to be 364,800
        rows (16,416,000 bytes).  The estimated time for this step
        is 0.02 seconds.
22) We do an all-AMPs JOIN step from Spool 15 (Last Use) by way of a
    RowHash match scan, which is joined to Spool 18 (Last Use) by
    way of a RowHash match scan.  Spool 15 and Spool 18 are left
    outer joined using a merge join, with condition(s) used for
    non-matching on left table ("NOT (asset_id IS NULL)"), with a
    join condition of ("asset_id = ROW_ID").  The result goes into
    Spool 20 (all_amps), which is built locally on the AMPs.  The
    result spool file will not be cached in memory.  The size of
    Spool 20 is estimated with low confidence to be 324,598,171
    rows (37,977,986,007 bytes).  The estimated time for this step
    is 1 minute and 4 seconds.
23) We do an all-AMPs JOIN step from Spool 19 (Last Use) by way of
    an all-rows scan, which is joined to Spool 20 (Last Use) by way
    of an all-rows scan.  Spool 19 and Spool 20 are joined using a
    single partition hash join, with a join condition of
    ("(X_IN_OUT = X_IN_OUT) AND (X_CHANNEL = X_CHANNEL)").  The
    result goes into Spool 1 (all_amps), which is built locally on
    the AMPs.  The result spool file will not be cached in memory.
    The size of Spool 1 is estimated with low confidence to be
    504,423,484 rows (54,982,159,756 bytes).  The estimated time
    for this step is 44.89 seconds.
24) Finally, we send out an END TRANSACTION step to all AMPs
    involved in processing the request.
 -> The contents of Spool 1 are sent back to the user as the result
    of statement 1.  The total estimated time is 34 minutes and 24
    seconds.
```


## Re: Query_Discussion_HELP

It's hard to tell without additional information:

- table DDL, at least index/partitioning
- output of HELP STATS for each table
- size of your system, 150 AMPs?

Dieter

Enthusiast

## Re: Query_Discussion_HELP

Hello,

I "split" the tables logically into two groups: the small ones, which have about 10,000 rows each, and the big one, which has more than 1 billion rows.

It is a very simple query: it just joins on the id of each table and gets the descriptions from all the small ones.

There is no aggregation, no GROUP BY, no WHERE clause, no ORDER BY.

I did not add partitioning. Will it help, at least for one of the columns used in a join?

I have secondary indexes on every column that is joined.

I have a join index on (join column, returned columns), but only on the small tables, because on my main table it is impossible.

HELP STATS I will look at. :) Thanks. But I collect statistics on every column of all the tables, plus on each table as a whole.

Yes, the size of my system is 150 AMPs.

Questions:

Must the data types of every pair of joined columns match, e.g. VARCHAR(10) inner join VARCHAR(10)?

My big table has a UPI on a column that is not used in the query. Would it be better to put the PI on a column that is used in a join and partition the table on another join column?

Finally, I have to check it again, because I was testing with one join, then 2, 3, 4... Up to about half of the joins it takes just a few seconds, and then at the 5th join it jumps to 5 minutes. Is this normal, or is the problem somewhere there?

Thanks one more time.

**I know you have read a lot, but your answers will help me focus on what is most important.

Senior Supporter

## Re: Query_Discussion_HELP

It is really not clear (at least to me) from your explanation what you are doing and what you want to achieve.

You need to share the DDL and SQL with us (if you have privacy concerns, change the column and table names) to get useful feedback.

Enthusiast

## Re: Query_Discussion_HELP

Hi,

```sql
select
     a.ROW_ID
    ,J.TELEFONO
    ,a.asset_id
    ,(a.created_date - cast('2000-01-01' as date format 'yyyy-mm-dd')) + 1 as xxxx
    ,c.cod_tipologia as xxxxx
    ,d.id_field as xxxx
    ,e.cod_canale as xxxx
    ,f.cod_provenienza as xxxxxx
    ,g.cod_tipo_contatto as xxxxxx
    ,d.ut as xxxxxx
    ,H.time_id as xxxxxxxxx
    ,Y.CODICE_FISCALE as xxxxxxxx
    ,m.ID_TABLE_J as xxxxxxxxx
    ,n.id_sottoesito as xxxxxxx
from schema_name.TABLE_A a
inner join schema_name.TABLE_B c
    on a.X_CARAT_CONT = c.X_CARAT_CONT
inner join schema_name.TABLE_C d
    on a.created_by = d.row_id
inner join schema_name.TABLE_D e
    on a.X_CHANNEL = e.X_CHANNEL
inner join schema_name.TABLE_E f
    on a.X_IN_OUT = f.X_IN_OUT
inner join schema_name.TABLE_F g
    on a.todo_cd = g.todo_cd
inner join schema_name.TABLE_G H
    on a.created_5 = H.ore_min
left outer join schema_name.TABLE_H J
    on a.ASSET_ID = J.ROW_ID
left outer join schema_name.TABLE_I Y
    on a.TARGET_OU_ID = Y.ROW_ID
inner join schema_name.TABLE_J m
    on a.sra_resolution_cd_NA = m.esito
left outer join schema_name.TABLE_K n
    on a.sra_stat_cd = n.sottoesito
```

This is the statement.

All the tables except TABLE_A are very small, roughly 1,000 to 10,000 rows. TABLE_A has more than 1 billion rows.

As you can see, TABLE_A is joined with every one of the small tables.

Which indexes could be used? What structure would you suggest for the big table, TABLE_A? Any suggestions are welcome.

Thank you all for the help and your time!

## Re: Query_Discussion_HELP

You actually want to return all 1,000,000,000+ rows (approx. 100GB)?

Your dimension tables are not all small, according to the explain:

- G: 15 rows
- F: 19
- M: 47
- N: 61
- C: 63
- E: 128
- H: 1,440
- D: 57,988
- J: 62,198,862
- Y: "not be cached in memory", so probably quite large, too

The PI of a fact table is usually a combination of the FKs to the commonly used dimensions. The optimizer can then do a so-called Large Table/Small Table join, cross joining the dimensions to facilitate a following join to the fact table's PI, but this is mainly for queries with restrictive conditions on the dimensions.
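As an illustration of that layout (a sketch only; `fact_table`, `cust_fk`, `date_fk`, and `status_fk` are hypothetical names, not columns from the query above):

```sql
-- Hypothetical fact table whose NUPI combines the FKs to the most
-- commonly used dimensions. A Large Table/Small Table plan can then
-- cross join the (restricted) dimensions and hash the result
-- straight to the fact table's PI.
CREATE TABLE schema_name.fact_table
( row_id    DECIMAL(18,0) NOT NULL
, cust_fk   INTEGER       NOT NULL
, date_fk   DATE          NOT NULL
, status_fk SMALLINT      NOT NULL
, amount    DECIMAL(15,2)
)
PRIMARY INDEX ( cust_fk, date_fk, status_fk );
```

The choice of PI columns has to match how the queries actually restrict and join the dimensions; an unselective query like the one above gains little from it.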

Dieter

Enthusiast

## Re: Query_Discussion_HELP

Thanks,

There are two tables (the ones with the left joins) which have almost 70,000,000 rows.

I tried to create join indexes but ran out of spool. I will ask the DBA to increase it, but, as you said, it is probably not feasible.

So, should I not put secondary indexes on all columns of the big one?

And should I add secondary indexes on the joined tables, on the returned column and on the combination of (join column, returned column)?

All the dimensions have a UPI on their join columns...

We will see...

So, only statistics?

Is it possible to join 2 or more of the tables into temp tables and then re-join? Would that help?

Thanks.

## Re: Query_Discussion_HELP

Teradata will not use SIs for joins (unless you apply very restrictive conditions resulting in a small number of rows). In your case any SI on all but the two big dimensions will probably never be used for any kind of query.

Join indexes will not be used in your case either.

Only a different PI on the fact table might help.

Otherwise you need stats on the FK/PK columns; foreign keys ("WITH NO CHECK OPTION") will also give the optimizer better estimates.
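For one of the joins above that could look like this (a sketch; the constraint name `fk_created_by` is made up, and the column names are taken from the posted query):

```sql
-- Statistics on the join columns give the optimizer row estimates.
COLLECT STATISTICS ON schema_name.TABLE_A COLUMN (created_by);
COLLECT STATISTICS ON schema_name.TABLE_C COLUMN (row_id);

-- A "soft" RI constraint: the rows are not checked, but the
-- optimizer can use the declared PK/FK relationship for planning.
ALTER TABLE schema_name.TABLE_A
  ADD CONSTRAINT fk_created_by
  FOREIGN KEY (created_by)
  REFERENCES WITH NO CHECK OPTION schema_name.TABLE_C (row_id);
```

Soft RI only helps if the data really is clean; with NO CHECK OPTION Teradata trusts the declaration without validating it.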

And then increase your spool to approx. 400 GB and wait an hour :-)

Dieter

Enthusiast

## Re: Query_Discussion_HELP

Hi,

As I said, I am new to Teradata; I am coming from Oracle.

I decided not to do the query all at once but to split the data into two tables and then INSERT ... SELECT.

I've just finished some tutorials and you are right, there is no need for SIs.

Can I ask some more questions?

In Teradata, does the order of the joins matter? Meaning, if I have 3 joins, j1 - j2 - j3, is the execution time different for every possible order?

When you say a different PI, on what basis should I choose it? Would a different PI change the execution time?

Would it be better to try the columns that join to my 2 biggest dimensions?

What if I split the 10 joins in half, i.e. create one table containing the data from 5 of the joins and another containing the other 5, and then make one more join between my 2 new tables? Would that be better?

Finally, sorry for all these questions: if I make a sample table of 10,000,000 rows instead of the big one, will I get roughly correct stats, execution times, join plans, and so on?

Thank you one more time.

**I will try FK/PK too... thanks.
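A sample table like the one mentioned above can be built directly (a sketch; `TABLE_A_sample` is a made-up name):

```sql
-- Copy the structure of the big table, then load a random sample.
CREATE TABLE schema_name.TABLE_A_sample AS schema_name.TABLE_A
WITH NO DATA;

INSERT INTO schema_name.TABLE_A_sample
SELECT * FROM schema_name.TABLE_A SAMPLE 10000000;
```

Bear in mind that row estimates, spool sizes, and the chosen join plan all scale with table size, so timings from a 10M-row sample are only a rough guide to the 1-billion-row behaviour.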

Fan

## Re: Query_Discussion_HELP

Create a volatile table involving all the joins except for the huge table and then join it back to the larger table. In this way you would avoid joining to the 1-billion-row table each time.
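Using the inner-joined dimensions from the posted query, that suggestion could look like this (a sketch; `dim_combo` is a made-up name, and the cross join is only reasonable because those four dimensions are small):

```sql
-- Pre-combine the small dimensions once; with ~63 x 128 x 19 x 15
-- rows the cross join stays in the low millions.
CREATE VOLATILE TABLE dim_combo AS
( SELECT c.X_CARAT_CONT, c.cod_tipologia
       , e.X_CHANNEL,    e.cod_canale
       , f.X_IN_OUT,     f.cod_provenienza
       , g.todo_cd,      g.cod_tipo_contatto
    FROM schema_name.TABLE_B c
   CROSS JOIN schema_name.TABLE_D e
   CROSS JOIN schema_name.TABLE_E f
   CROSS JOIN schema_name.TABLE_F g
) WITH DATA
PRIMARY INDEX ( X_CARAT_CONT, X_CHANNEL, X_IN_OUT, todo_cd )
ON COMMIT PRESERVE ROWS;

-- ...then the 1-billion-row table is scanned a single time.
SELECT a.ROW_ID, v.cod_tipologia, v.cod_canale
     , v.cod_provenienza, v.cod_tipo_contatto
  FROM schema_name.TABLE_A a
  JOIN dim_combo v
    ON  a.X_CARAT_CONT = v.X_CARAT_CONT
    AND a.X_CHANNEL    = v.X_CHANNEL
    AND a.X_IN_OUT     = v.X_IN_OUT
    AND a.todo_cd      = v.todo_cd;
```

This is essentially a hand-built version of the Large Table/Small Table join mentioned earlier; it does not help for the two 70M-row left-joined tables, which still need their own joins.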