I am trying to get one of my queries to run faster.
Table 1: contains Key1, which is unique; about 800,000 rows. The table has two other columns, which are not used in the join below.
Table 2: contains 6 columns and is keyed on Key1 (a different key from Table1.Key1) and a date. This table also contains Key3 (which matches Table1.Key1), but Key3 is not part of the primary key. I need to extract 5 years of data from Table 2 where Table2.Key3 = Table1.Key1.
To make life easier, I extract the 5 years of data from the production table (which contains many more columns) into a temp table. This brought in about 40M rows in about 20 seconds. I created indexes on Table 1 and Table 2 on the matching keys, and collected stats both before and after loading the data, prior to the join.
The next step is to join Table 1 and Table 2, extracting only those rows of Table 2 whose Key3 value appears in Table1.Key1.
The query (Table 3 is an empty table):

INSERT INTO Table3
SELECT <columns from Table 2>
FROM Table2
WHERE Table2.Key1 = Table2.Key1
  AND Table2.Date = Table2.Date
  AND Table2.Key3 IN (SELECT Key1 FROM Table1 GROUP BY 1)
GROUP BY 1, 2, 3, 4
The above query runs for about 50-60 minutes, while the Explain says it should run in about 35 seconds.
1. Is this the best way to write the query?
2. Is the sub-query being executed once for each row of Table 2?
3. Is there a way to write the query so that the sub-query executes just once for all rows of Table 2?
4. Any other suggestions?
Does your Explain show a product join (cross join)? The actual SQL statement and the actual Explain plan would also be helpful. Do any of the tables in the SELECT statement have no data in them? Are your stats current, and will the optimiser produce the best plan given the stats provided?
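On questions 1 and 3: an IN (SELECT ...) filter is a semi-join, and because Table1.Key1 is unique it can be rewritten as an explicit inner join without duplicating rows, which gives the optimiser a plain join to plan instead of a subquery. A minimal sketch of the equivalence, using SQLite via Python rather than Teradata (table names t1/t2 and the val column are illustrative assumptions, not your real schema):

```python
# Sketch (SQLite, not Teradata): show that the IN-subquery filter and an
# explicit inner join against a unique key return the same rows.
# Table and column names here are made up for the demonstration.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t1 (key1 INTEGER PRIMARY KEY)")   # unique key
cur.execute("CREATE TABLE t2 (key3 INTEGER, val TEXT)")
cur.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO t2 VALUES (?, ?)",
                [(1, "a"), (2, "b"), (4, "c"), (2, "d")])

# Original style: filter with an IN subquery (a semi-join).
in_rows = cur.execute(
    "SELECT key3, val FROM t2 "
    "WHERE key3 IN (SELECT key1 FROM t1) "
    "ORDER BY key3, val").fetchall()

# Rewrite: explicit inner join. Because t1.key1 is unique, the join
# cannot multiply t2 rows, so no GROUP BY is needed to dedupe.
join_rows = cur.execute(
    "SELECT t2.key3, t2.val FROM t2 "
    "JOIN t1 ON t1.key1 = t2.key3 "
    "ORDER BY key3, val").fetchall()

assert in_rows == join_rows
print(in_rows)  # rows of t2 whose key3 exists in t1
```

The same rewrite applies to your INSERT...SELECT: join Table2 to Table1 on Key3 = Key1 and drop the GROUP BY 1 inside the subquery, since the unique key makes it redundant.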