Tools
N/A

## Understanding Explain Plan

Hi,

I am new to teradata , trying to understand how does teradata does joins internally.

I am going through the teradata documentation and i have the below question.

table structures used in query:
CREATE SET TABLE TFACT.Employee ,FALLBACK
(Employee_Number INTEGER NOT NULL,
Location_Number INTEGER,
Dept_Number INTEGER,
Emp_Mgr_Number INTEGER,
Job_Code INTEGER,
Last_Name CHAR(20),
First_Name VARCHAR(20),
Salary_Amount DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Employee_Number )
INDEX ( Job_Code )
INDEX ( Dept_Number );

CREATE SET TABLE TFACT.Department ,FALLBACK
( Dept_Number INTEGER NOT NULL,
Dept_Name CHAR(20 NOT NULL,
Dept_Mgr_Number INTEGER,
Budget_Amount DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Dept_Number );

And I have run the explain plan

EXPLAIN SELECT Dept_Name, Last_Name, First_Name
FROM Employee E INNER JOIN Department D
ON E.Dept_Number = D.Dept_Number
UNION ALL
SELECT Dept_Name, Last_Name, First_Name
FROM Employee E INNER JOIN Department D
ON E.Employee_Number = D.Dept_Mgr_Number
ORDER BY 1, 2, 3;

Output is:

3) We lock TFACT.E for read, and we lock TFACT.D for read.
4) We do an all-AMPs RETRIEVE step from TFACT.E by way of an all-rows scan with a condition of ("NOT
(TFACT.E.Dept_Number IS NULL)") into Spool 2 (all_amps), which is redistributed by hash code to
all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high
confidence to be 25,996 rows. The estimated time for this step is 0.65 seconds.
5) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from TFACT.D by way of a RowHash match scan with no residual
conditions, which is joined to Spool 2 (Last Use). TFACT.D and Spool 2 are joined using a merge
join, with a join condition of ("Dept_Number = TFACT.D.Dept_Number"). The result goes into
Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached
in memory. The size of Spool 1 is estimated with low confidence to be 25,978 rows. The
estimated time for this step is 0.44 seconds.
2) We do an all-AMPs RETRIEVE step from TFACT.D by way of an all-rows scan with a condition of
("NOT (TFACT.D.Dept_Mgr_Number IS NULL)") into Spool 3 (all_amps), which is redistributed by
hash code to all AMPs. Then we do a SORT to order Spool 3 by row hash. The size of Spool 3 is
estimated with no confidence to be 1,262 rows. The estimated time for this step is 0.05 seconds.
6) We do an all-AMPs JOIN step from TFACT.E by way of a RowHash match scan with no residual
conditions, which is joined to Spool 3 (Last Use). TFACT.E and Spool 3 are joined using a merge join,
with a join condition of ("TFACT.E.Employee_Number = Dept_Mgr_Number"). The input table TFACT.E
will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool
1 (group_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort
key in spool field1. The size of Spool 1 is estimated with no confidence to be 27,240 rows. The
estimated time for this step is 0.19 seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of S

In the Explain plan the redistribution of Employee table based on row hash is done first and then the steps of Joining the spool ( redistributed employee table) and dept table and redistribution of dept table happened in parallel.

Why is it NOT employee table and dept table redistribution happen in parallel.

Appreciate your quick response.

Thank you all
Tags (1)