I have a huge table with billions of rows which is joined against a table containing 350.000 rows. Bith tables have the same PI and the same Partitioning. Statistics are available and correct.
I would have expected a Rowkeybased Merge Join as PI and Partitions match.
But the Optimizer is doing the following:
It departitions the small table and does a subsequent Sliding Window Merge Join. I am on Teradata 14.10.
Why is this happening? Could this be less resource consuming than the Rowkey Based Merge Join?
Thanks in Advance
a Rowkey Based Merge Join is the best you can get on a partitioned table, exactlye the same as a PI-Join on non-partitioned tables.
Did you double check if the partitioning column is also included in the join?
Otherwise can you show the PPI definiton and the actual join?
I found another interesting behaviour when joining two tables with matching PI and Partitioning. I have the follwowing 2 tables:
Running the following query, which basically selects all partitions:
Like expected, the rowkey based merge join is much cheaper:
Rowkey Based MJ: 91 IOs and 0.02 CPU Seconds, 102MB Spool
Merge Join: 272 IOs and 0.24 CPU seconds 1.8 GB Spool
While I don't understand why the Optimizer is chosing a standard merge join if you select all partitions, it shows that there is a lot of potential for optimizing joins manually
I think that the optimizer does not want to do a rowkey-based merge join, if he thinks, that one table has many partitions and all of them are very small.
But I do not know why he does this.
Additionally it seems that he has a mindchange if he can exclude at least 1 partition.
For me this is semms also strange
Single stats on SALESID (the PI) and SALESDATE (Partition column). We are on TD 14.10 Full Stats (no samples)
Can you please try adding Stats on ( PARTITION ), ( PARTITION, PI) and ( PARTITION, PI, Partition Column) and check your results?
Below is the excerpt from Carrie's Blog on Partition recommendation.
I added the statistics you mentioned on both tables. Now i always get the traditional merge join with a redistribution step on (SalesId,SalesDate) for both tables at the begin...if i select all partitions or (partitions - 1) does not matter anymore.... the behaviour changed. but still, the rowkey based merged join would be much cheaper as I showed in my test setup.
I tried another setup, removing all statistics from Sales2 and Sales3. I still received for all partitions or (partitions -1 ) the traditional merge join.
The only setup were the selection of (partitions-1) leads to a rowkey based merge join is when i have stats on both tables on columns SalesID and SalesDate (single column stats).
It seem to be definitely related to statistics.
BTW, my test setup contains exactly 1 row per partition (in both tables).
I would assume that the optimizer knows in both cases (as it has statistics on all needed columns) that in one case it selected 69999 rows and in the other case 70000 rows, only one row difference...
The assumption that 69999 rows are cheaper to select with a rowkey based merge join but 70000 rows are cheaper with data redistribution and a traditional merge join is somehow not logical to me (and as my test showed: the traditional merge join is much more expensive).
Given 2 tables, same PI, same partitioning. Joining them on the the PI and the partioning columns (and maybe some additional columns).
Can there be an advantage of departioning one table but keeping the same PI and then doing a sliding window merge join instead of doing a rowkey based merge join??
I have no example where this could be advantageous.