I have two tables , table A size 10GB (base table) and table B size 4GB, joined using left outer join, stats are collected on the joining columns, query is getting a spool space of 1.79 TB on Teradata system, any suggestions or points which need to be checked or done for optimizing the query to run in shorter time and taking less spool. Explain plan is duplicating data on all amps.
FROM DB_TMP.TABLE_A A
LEFT JOIN DB_TMP.TABLE_B B
ON A.SBSCRP_KEY = B.SBSCRP_KEY;
Table B has PRIMARY INDEX ( SBSCRP_KEY ) while Table A has PRIMARY INDEX ( SBSCRP_KEY )
Partition by ( RANGE_N(SBSCRP_DATE BETWEEN DATE '2000-01-01' AND DATE '2015-12-31' EACH INTERVAL '1' DAY ));
Table A has below data
1 1/25/2014 76831239
2 1/26/2014 76821482
3 1/27/2014 76291611
4 1/28/2014 76846372
5 1/29/2014 70884575
6 1/30/2014 76400378
7 1/31/2014 71271684
525 mio rows can't be stored in 10GB :-)
When both tables share the same PI there should be no duplication. I would expect the big table's spool is "built locally" and then sorted to prepare for the join (worst case), or no table is spooled and there's a direct "sliding window join".
The query you posted is probably not the actual query, you're not selecting any column from the outer table...
Can you show the stats for both tables and the actual explain, too?