Performance issue while joining the larger tables


Performance issue while joining the larger tables

Hi All,


I am facing a performance issue with the background query of the report . In the SELECT statement, we are joining two large tables ( 150 columns and 115 million records)  with different condition ( not PI columns) and we need to fetch the data only from 10 columns.Still  Optimizer has to re-distribute the 100 million records twice causing the performance issues and report is taking 3-4 minutes to run. Users are expecting the report to be run in less than 15-20 seconds.


To handle the performance, I had to create a two JOIN indexes with PI being the join columns. This way I was able to achieve the expected performance. The concern here is, I would get the similar requests from multiple users and I won't be able to create Join indexes for all of them. Please let me know the best solution for these kind of cases ( re-distribution of data on columns different than PI) . I read in few blogs that Columnar would fit this use case but haven't tried that option . 




Re: Performance issue while joining the larger tables

Few of the conventional steps we can follow and check the performance are: 1) Explain plan the query and collect stats on the joining columns. 2) Create smaller worktables with required columns and try to join the smaller tables which will avoid the redistribution of data. 3) Use column compression option on the work tables if we have a possibility.
Teradata Employee

Re: Performance issue while joining the larger tables

The optimizer will automatically eliminate any unneeded columns prior to redistribution so placing the data into projected tables first will not improve the performance.


Is it required to join all rows from both tables? Any selection that can be applied to the tables will reduce the amount of data redistributed and joined.


There is not a way to eliminate the redistribution if the join is performed on non-PI columns. It is usually good to gather and review step level DBQL data for the execution of the query to make sure that there is a good understanding of which steps of the query execution are requiring the bulk of the time/resources.


As noted above, it is always good to provide the query and explain when asking for the community to provide help with query performance issues.