I have the following JOIN query which is part of a sub-query for the CREATE TABLE statement.
Select cast(PERIOD_END_DT_TM As date format 'yyyy-mm-dd') end_dte From audit.PROCESS_CONTROL Where process_name = 'CHURN_MONTHLY_SNAPSHOT') pc2, vantage.SUBSCRIBER_WORKORDER_XREF x Inner Join vantage.WOS_SERV_CODE wos On x.SID_WOC = wos.SID_WO And wos.serv_ext_aft_wos > 0 Left Outer Join reportuser.ss_programming ssp On wos.serv_cde_wos = ssp.serv_cde Where x.ACTIVE_FLAG = 'Y' And pc2.end_dte Between x.BEGIN_DATE And x.END_DATE Group By x.sid_sbb, x.SID_WOC, mon_snap_dte, x.ORIG_NC_WOC_DATE ) sub Inner Join vantage.WOS_SERV_CODE wos On sub.SID_WOC = wos.SID_WO And wos.serv_ext_aft_wos > 0 --Left Outer Join vantage.HOUSE_SUBSCRIBER_XREF hsx -- On sub.sid_sbb = hsx.sid_sbb -- And sub.mon_snap_dte Between hsx.BEGIN_DATE -- And hsx.END_DATE Left Outer Join (Select sid_sbb, max (sid_hse) sid_hse from vantage.HSE_BASE group by sid_sbb) hsx On sub.sid_sbb = hsx.sid_sbb Left Outer Join vantage.HSE_BASE_GEO_ADDR geo On hsx.SID_HSE = geo.SID_HSE Left Outer Join vantage.ALA_SERV_CODE ala On wos.serv_cde_wos = ala.SERV_CDE_ALA And wos.SYS_WOS = ala.SYS_ALA And wos.PRIN_WOS = ala.PRIN_ALA And wos.AGNT_WOS= ala.AGNT_ALA Left Outer Join stage.san_network_service_codes san On wos.SERV_CDE_WOS = san.service_code And san.network_type = 'NETWORK' Left Outer Join reportuser.ss_programming ssp On wos.serv_cde_wos = ssp.serv_cde Group By sub.sid_sbb, sub.SID_WOC, hsx.sid_hse, sub.mon_snap_dte, sub.ORIG_NC_WOC_DATE, mdu_code, postal_rec_type, sub.basic_pkg_rank )
Can someone please help me out how to create a JOIN INDEX on this as suggested earlier?
These are the 3 steps where it is taking more than estimated time.
We do an All-AMPs RETRIEVE step from VIRTUALVANTAGE.WOS_SERV_CODE_TAB by way of an all-rows scan into Spool 6147, which is redistributed by hash code to all AMPs. This step is performed in parallel.
We do an All-AMPs JOIN step from Spool 6147 (Last Use) by way of an all-rows scan, which is joined to Spool 6148. Spool 6147 and Spool 6148 are left outer joined using a merge join . The result goes into Spool 6150, which is built locally on the AMPs. This step begins a parallel block of steps.
We do an All-AMPs JOIN step from Spool 6149 (Last Use) by way of an all-rows scan, which is joined to Spool 6150. Spool 6149 and Spool 6150 are right outer joined using a merge join . The result goes into Spool 6152, which is built locally on the AMPs.
The Estimated I/O 1086025000 and Actual I/O 3126072000 on all the 3 steps.
This is where I was thinking the JOIN INDEX might help.
Can someone please suggest me where to go from here or direct me to some sort of URL where I can study to work upon this?
Hi , Please refer Teradata manuals (Data Definition Statements,Data Manipulation Statements, SQL fundamentals,Database Design) and learn about various methods of tunig the SQL in teradata. If you are a newbie to Teradata, I feel you are trying to achieve some thing which is out of scope of your experience.
I would like to inform you the following about your question:
1) Try to estimate the time taken seperately for executing the Select statement and Creating the table. If your select is not taking much time ,but creating the table is taking time,analyze the DDL and choose the proper primary index for even distribution.
2) Another method is to create the table first and loading this table in the following way; Divide your process into smaller steps. Create global Temporary tables and load the data into these tables with smaller queries and finally load into your main table joining these temporary table.
Please note that it is very important to collect statistics on the primary indices of the tables involved and the columns getting joined in the queries,which helps Teradata Optimizer to design the best execution plans.You can collect stats on global temporary tables also.
Do not get frustrated. Let me understand your problem correctly. Are you saying you have a create table stmt like below ?
CREATE TABLE As Select col,col,col from tbl1 join tbl2 on .....
If the above is true, the most likely reason why your table creation takes so long could be the PRIMARY INDEX..
The first column of the SELECT statement will be the primary index of the table if you haven't specified one explicitly. If you collected stats on the tables even to a reasonable extent, I don't see any reason why it must take 10 minutes, let alone hours.
Specify a primary index based on columns that are over 90 % unique and your table will get created in 10 minutes. You'll thankme for a long time to come :-)