Hash joins and Nested Joins

Database

Hash joins and Nested Joins

What are Hash joins & Nested(both local and remote)....how they are executed in Teradata ?
Please explain with some examples
5 REPLIES

Re: Hash joins and Nested Joins

create table table_1
(
NUPI INTEGER,
y_1 char(6)
)
primary index (NUPI)
insert into table_1 values(1,'prakhar');
insert into table_1 values(1,'suresh');

create table table_2
(
NUPI INTEGER,
USI_2 char(6)
)
primary index (NUPI)
unique index(USI_2 )

insert into table_2 values(1,'prakhar');
insert into table_2 values(1,'suresh');

Ans I am running below query:
sel * from table_1,table_2
where y_1=USI_2

Explain plan is given below(concrete steps only)
1) We do an all-AMPs RETRIEVE step from table_1 by way of
an all-rows scan with a condition of ("NOT (table_1.y_1
IS NULL)") into Spool 2 (all_amps), which is redistributed by hash
code to all AMPs. Then we do a SORT to order Spool 2 by row hash.
The size of Spool 2 is estimated with no confidence to be 306 rows.
The estimated time for this step is 0.00 seconds.
2) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to table_2 by way of a
traversal of index # 4 without accessing the base table extracting
row ids only. Spool 2 and table_2 are joined using a
nested join, with a join condition of ("y_1 =
table_2.USI_2"). The result goes into Spool 3
(all_amps), which is redistributed by hash code to all AMPs. Then
we do a SORT to order Spool 3 by field Id 1. The size of Spool 3
is estimated with no confidence to be 104,040 rows. The estimated
time for this step is 0.05 seconds.
3) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
all-rows scan, which is joined to table_2 by way of an
all-rows scan with no residual conditions. Spool 3 and
table_2 are joined using a row id join, with a join
condition of ("y_1 = table_2.USI_2"). The result goes
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with no confidence to be 104,040
rows. The estimated time for this step is 0.07 seconds.

I am not able to get step 2....why redistributed is needed after the join as all rows are already on desired AMP's.....??

Re: Hash joins and Nested Joins

ok you posted here too..please see my reply in your original thread :

http://www.teradata.com/teradataforum/Topic14739-10-1.aspx?Update=1

As for Hash joins..they are part of the merge join family. It occurs when 1 or both tables on each Amp can fit completely inside an Amp's memory.

The hash join uses memory cache for the smaller table and joins it by hash to an unsorted spool(the other table is copied to this spool in an earlier step).Its generally faster than a merge join because it does not need to sort the larger table.

Re: Hash joins and Nested Joins

Here is another wild guess on this!!!!
Please correct me if it's wrong
Following steps are executed commonly

1 Hash-redistributed based on the joined field.
2 Nested Joined with the right table.--->gives rowid of base table+left col
values
3 The resulting rows are redistributed by row hashing the rowID of the
right table rows.---->rowid contains hash of base table so left col values goes
where base table rows are located
4 The rowID is used to retrieve the data rows from the right table to
complete the join.------>only rowid needed to extract rows as left col values
are already there ,right table rows extracted with help of ROWID (ROWID
JOIN) and checked for join values
5 End of process.

Re: Hash joins and Nested Joins

I dont get it...

2 Nested Joined with the right table.--->gives rowid of base table+left col
values


How does the Nested Join with the right table give rowid ? Which is the base table here - table_1 or table_2 ?

Re: Hash joins and Nested Joins

Hi ,

Pls inform me which book is most preferable for knowing scripts and environment in teradata.

Thaks&Regards,
Narasimha