I am getting Spool space error when this query run in PROD. Any optimization tips welcome. I have put the records in Millions and Indexes on comments.
INSERT INTO extrt_ent_drcty_data_3_gt(28 Cols)
FROM extrt_ent_drcty_data_2_gt d /* 15Million rows - acct_id,srv_accs_id(Primary) */
Left join ( SELECT a1.acct_id, a1.acct_nm_addr_eff_dt, a1.acct_nm_addr_end_dt,ea.elec_addr_txt FROM acct_nm_addr_hist a1 /* 540M - acct_id,addr_id,prty_nm_id(Primary) */
join elec_addr ea /* 33M - addr_id(Unique PI) */ on a1.addr_id = ea.addr_id AND a1.nm_addr_rol_cd = 'PE' AND a1.acct_nm_addr_end_dt = '9999-12-31') pe
ON d.acct_id = pe.acct_id
Left join ( SELECT a2.cols,la.cols,nm.cols
FROM acct_nm_addr_hist a2 /* 540M - acct_id,addr_id,prty_nm_id(Primary) */ join loc_addr la /* 317M - addr_id (UPI) */ on a2.addr_id = la.addr_id
Left join prty_nm nm /* 218M - prty_nm_id (UPI) */
on a2.prty_nm_id = nm.prty_nm_id WHERE a2.nm_addr_rol_cd = 'BN' AND a2.acct_nm_addr_end_dt = '9999-12-31' AND a2.acct_nm_addr_eff_dt = ( SELECT max(anah_mx.acct_nm_addr_eff_dt) FROM acct_nm_addr_hist anah_mx /* 540M - acct_id,addr_id,prty_nm_id (Primary) */ WHERE a2.acct_id = anah_mx.acct_id AND anah_mx.nm_addr_rol_cd = 'BN' AND anah_mx.acct_nm_addr_end_dt = '9999-12-31')) bn ON d.acct_id = bn.acct_id AND bn.acct_nm_addr_end_dt = '9999-12-31';
do an explain on this query with diagnostic helpstats on for session; This will show if there are any statistics that should be collected. collect the statistics and then run your query. that will help.
I just ran into a similar problem the other day. It had to do with the fact that Teradata's plan was left outer joining to my dimension table on the step where I still had millions of rows, rather than at the last step where I only had 200,000 rows.
You can force it to left outer join to the dimension tables in the correct order with inline views.
select out1.col1, out1.col2, d.col3 from (select col1, col2 from inner_query) out1 left outer join dim_table d on out1.col1 = d.col1
If you do this, it fully renders the inner query first before attempting to hash or merge join millions of records in your spool area to a dimesion table, where it really only needs to look up dimension values to go with your final results.
Hopefully I explained the concept well enough. Basically, take all of the heavy processing and make it an inline view, and the outer join to your dimension tables in a second step. The only way this becomes impossible is if you need one of the left outer join values to group by in the inner view.
I often used to got spool space issues, I realized that I was not joining the columns that are of same data types (most of the cases are like joining a decimal data type to integer data type etc., so I casted the joining column to match the other column's data type, it used to resolve the issue), which used to pull me in to Spool space issues. I suggest check those kind of issues. Some one told me that when we filter the subqueries just to have the rows that joins, the subquery table's Index will be lost for query computing??? I am not sure).