Product Joins That Take Significantly longer than the optimizer estimates

Database
N/A

Product Joins That Take Significantly longer than the optimizer estimates

On our system, we often run into cases where the execution plan includes product joins that end up taking significantly longer than estimated. I was wondering if there are ideas for how to figure out how to improve the situation so that either the optimizer could make better decisions or so that the actual execution wouldn't take so long in these cases.

In typical cases, we have a large fact table being joined to several lookup tables and possibly a volatile table (the SQL generated by Microstrategy). When viewing the plan in the Teradata Performance monitor, it will look similar to this (an example taken where there was one large fact table being joined to 2 lookup tables, to the very small table used to provide row level security, and to a single volatile table with only a handful of records. Stats are collected on all of the tables, excepting the volatile table):

1 0:00.00 0:00.00 0 36 First, lock DATABASE1.BIG_FACT_TABLE for access, we lock DATABASE1.SMALL_LOOKUP_TABLE for access, we lock DATABASE1.SMALL_USER_TABLE_FOR_ROW_SECURITY for access and we lock DATABASE1.ANOTHER_SMALL_LOOKUP_TABLE for access.
2 0:00.00 0:00.01 0 36 Next, we do a journal temporary table step.
3 0:00.01 0:00.23 35 35 We do an All-AMPs RETRIEVE step from DATABASE1.SMALL_LOOKUP_TABLE by way of an all-rows scan into Spool 60898, which is redistributed by hash code to all AMPs. This step begins a parallel block of steps.
3 0:00.00 0:00.25 36 2304 We do an All-AMPs RETRIEVE step from Spool 53643 by way of an all-rows scan into Spool 60899, which is duplicated on all AMPs. This step ends a parallel block of steps.
4 0:00.01 0:00.04 55 5824 We do an All-AMPs JOIN step from Spool 60899 (Last Use) by way of an all-rows scan, which is joined to table ANOTHER_SMALL_LOOKUP_TABLE. Spool 60899 and table ANOTHER_SMALL_LOOKUP_TABLE are joined using a nested join . The result goes into Spool 60900, which is built locally on the AMPs.
5 0:00.02 0:00.15 55 5824 We do an All-AMPs JOIN step from Spool 60900 (Last Use) by way of an all-rows scan, which is joined to table ANOTHER_SMALL_LOOKUP_TABLE. Spool 60900 and table ANOTHER_SMALL_LOOKUP_TABLE are joined using a row id join . The result goes into Spool 60901, which is redistributed by hash code to all AMPs.
6 0:00.01 0:00.05 180 16128 We do an All-AMPs JOIN step from DATABASE1.SMALL_USER_TABLE_FOR_ROW_SECURITY by way of an all-rows scan, which is joined to Spool 60901. table SMALL_USER_TABLE_FOR_ROW_SECURITY and Spool 60901 are joined using a merge join . The result goes into Spool 60902, which is duplicated on all AMPs. This step begins a parallel block of steps.
6 0:00.01 0:00.05 396 1260 We do an All-AMPs JOIN step from DATABASE1.SMALL_USER_TABLE_FOR_ROW_SECURITY by way of an all-rows scan, which is joined to Spool 60898. table SMALL_USER_TABLE_FOR_ROW_SECURITY and Spool 60898 are joined using a merge join . The result goes into Spool 60903, which is duplicated on all AMPs. This step ends a parallel block of steps.
7 0:39.78 46:10.91 1245900 60505860 We do an All-AMPs JOIN step from Spool 60902 (Last Use) by way of an all-rows scan, which is joined to table BIG_FACT_TABLE. Spool 60902 and table BIG_FACT_TABLE are joined using a product join . The result goes into Spool 60904, which is built locally on the AMPs.
8 0:00.61 108770 We do an All-AMPs JOIN step from Spool 60903 (Last Use) by way of an all-rows scan, which is joined to Spool 60904. Spool 60903 and Spool 60904 are joined using a product join . The result goes into Spool 60897, which is built locally on the AMPs.
9 0:00.09 40125 We do a SUM step to aggregate from Spool 60897 (Last Use) by way of an all-rows scan. Aggregate Intermediate Results are computed globally, then placed in Spool 60905.
10 0:00.04 40125 We do an All-AMPs RETRIEVE step from Spool 60905 (Last Use) by way of an all-rows scan into Spool 60895, which is redistributed by hash code to all AMPs.
11 0:00.00 0 We do a MERGE into Spool 53644 from Spool 60895.
12 0:00.00 0 We Spoil the parser's dictionary cache for the table.
13 0:00.00 0 We send out an END TRANSACTION step to all AMPs involved in processing the request.

Obviously, the estimated IOs on step 7 were way off as was the estimated time, but I am not sure how to rectify this, nor am I sure (though I suspect) if the estimate was better that the execution path might be different and more efficient.

Any suggestions would be most appreciated. This is V2R.06.00.02.09 Version 06.00.02.24.