I have a performance problem: I have to join a big table (> 1 million records) on a column (which is part of the PI) to bring a few columns (4) into the semantic layer. I have implemented the code, but the explain plan shows the table being redistributed on every load. Redistributing 1 million records for every load (hourly batch) is causing bottlenecks for other queries.
Technical details (table names and columns are masked for security reasons):
Table to be joined with: Financial_Statement
-- number of columns: 98
-- number of rows: > 1 million
-- PI (col1, col2, col3, col4), all VARCHAR(100)
-- NUSI (col60, col70), INTEGER and CHAR datatypes
-- VOSI (col45), DATE datatype
Here col2 is used in the joining condition.
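The join looks roughly like this (the second table's name and the selected columns are placeholders, since the real names are masked):

```sql
-- Joining on col2 alone uses only 1 of the 4 PI columns (col1..col4),
-- so rows cannot be matched AMP-locally and the optimizer
-- redistributes Financial_Statement by the hash of col2.
SELECT f.colA, f.colB, f.colC, f.colD
FROM Financial_Statement f
JOIN Second_Table s
  ON f.col2 = s.col2;
```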
Would appreciate any help!!!
Thanks in advance.
You have not mentioned the second table's information. If the second table is small, you can create a hash index on it to avoid redistribution;
even with proper stats collected, the optimizer will redistribute the smaller table.
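A minimal sketch of that, assuming the second table is called Small_Table, joins on col2, and that colX/colY are the columns it contributes (all names here are assumptions):

```sql
-- Hash index on the small table, hashed on the join column, so its
-- rows are co-located with the matching hash of col2 and the large
-- table does not need to be redistributed for the join.
CREATE HASH INDEX hx_small (col2, colX, colY)
ON Small_Table
BY (col2)
ORDER BY HASH (col2);
```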
Second thing: you are joining on a single column of a multi-column PI. If you use only one column of a composite PI, you do not get the benefit of the PI.
You have to use all the PI columns in the join condition to get PI performance (an AMP-local join).
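If the second table carries (or can be made to carry) all four PI columns, the join can stay AMP-local, along these lines (the second table's name and columns are assumptions):

```sql
-- All four PI columns in the join condition: matching rows hash to
-- the same AMP, so Financial_Statement is not redistributed.
SELECT f.colA, f.colB, f.colC, f.colD
FROM Financial_Statement f
JOIN Second_Table s
  ON  f.col1 = s.col1
  AND f.col2 = s.col2
  AND f.col3 = s.col3
  AND f.col4 = s.col4;
```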
First thing is to check the statistics. And information about the second table is, indeed, very much needed.
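For example, stats on the join column and on the PI (standard Teradata syntax; the column names follow the masked ones above):

```sql
COLLECT STATISTICS ON Financial_Statement COLUMN (col2);
COLLECT STATISTICS ON Financial_Statement INDEX (col1, col2, col3, col4);
```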
Apart from that, if you select only a few columns (4) from the table, redistribution might actually be a good choice: only the selected columns (plus the join columns) are carried into the redistributed spool, so the spooled rows are narrow.
How do you know that it is causing a bottleneck? Is there skew when the table gets redistributed?
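One way to check: look at how the rows of the big table would spread across AMPs when hashed on the join column, using the standard Teradata hash functions:

```sql
-- Row count per AMP if Financial_Statement is redistributed by col2;
-- a few AMPs holding far more rows than the rest indicates skew.
SELECT HASHAMP(HASHBUCKET(HASHROW(col2))) AS amp_no,
       COUNT(*) AS row_cnt
FROM Financial_Statement
GROUP BY 1
ORDER BY row_cnt DESC;
```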
P.S. You might also want to introduce integer IDs instead of VARCHARs for the PI columns, if possible. But that does not relate to the redistribution, though.