Optimization with JOIN condition

Database

Optimization with JOIN condition

Hello guys,

I would appreciate it if you can help me with a problem that i have.

I have this join condition :

    SELECT *

FROM

T1_STAGING.(first_table) AS STG

JOIN T1_STAGING.(second_table) AS B

ON 

(

STG.DLOF_ID_NO=B.DLOF_ID_NO_RU

)

This simple join is taking too long to finish, more than 20 minutes. The data of each table is less than 600,000K data. i tried the following things :

I took statistics on each table. 

I changed the columns to be PRIMARY INDEX. 

I created JOIN INDEX for the second table but still nothing! 

The query never ends it takes 20 mins ++. This seems to be data distribution problem in the second table, but i can't do anything with the data. 

Please bear in mind that if i join my first_table with any other it takes only seconds. 

Can you give me a suggestion to try? I need to optimize it for better performance.

4 REPLIES
Enthusiast

Re: Optimization with JOIN condition

Hi Marios,

Everything looks correct. Did you tried by doing an explain plan? 

Also do you need all the columns from both tables or only few columns?

Enthusiast

Re: Optimization with JOIN condition

Is B.DLOF_ID_NO_RU also primary index in the second table B? (Then why JI is required?)

Are the data types of both the column involved in join same ? is second table has any partion added and see if you can use that PPI column in where clause to filter some un necessary history.

if it is true distribution issue in second table need to fix it somehow by showing the metrics to your DBA.

If you can give explain plan that would help us to analyze more.

Re: Optimization with JOIN condition

The tables that i am using are staging tables which we do not use primary index. I changed my tables and added primary index of my first table to be the DLOF_ID_NO column and for my second table the DLOF_ID_NO_RU column.  

Teradata Employee

Re: Optimization with JOIN condition

Hello Marios,

For any queries concerning Optimization, it is advisable to post the Explain Plan also. Having said that, you mentioned that you created PI on the Joining Columns (DLOF_ID_NO for 1st Table) & (DLOF_ID_NO_RU for 2nd Table). 

As such, the Joining should be using Merge Join Strategy with no redistribution or duplication required. If the releveant row-counts difference between the 2 tables is huge (Which can be correct or incorrect owing to statistics), then their is a possibility of Product Join taking precendence over Merge Join. This shouldn't be a problem as the Optimizer always chooses the least expensive plan. 

If you see Merge Join being used with no redistribution or duplication, then the Joining Process is taking time. This can be attributed to the datatype mis-match between joining columns, usage of different formats for joining columns, or scenario of Statistics over-explosion. 

I suggest you run the concerned query, monitor the same using Viewpoint & see which step is taking the majority of time. Usually, 1 or 2 steps take up the majority of time. Identifying those steps will help you to reduce the scope of your problem.

Thanks,

Smarak