In above query, both STG and TGT has PI as SRC_TRAN_ID
TGT has stats collected on ROW_STAT_CD and SRC_SYS
TGT is a huge table with 18 months of history partitioned by day interval. and stats collected on parition.
in the above query i can't use a partition condition as per business logic
In production the query is taking a long time and huge I/O. The obvious reason is no use of parition.
But is there any other way I can try tune this query ? Appriciate your help.
Have you tried creating SI on ROW_STAT_CD , .SRC_SYS columnn of TGT table .
As per my knowledge that may help reducing IO at least .
I didn't try SI. as it is very huge table and loaded by multiload. Even i proposed compression on both the columns as distinct values are less but as it is a DDL change need to take a complete rock and roll mechanism on huge table which was not accepted.
The count of TARGET is 3 billiion and stage count is around 2 million.
Here is the explain in UAT -
We do an all-AMPs
JOIN step from APP_STG.STG by way of a RowHash
match scan, which is joined to APP_TGT.TRAN_ID_VW by way of a
RowHash match scan with a condition of (
"(APP_TGT.TRAN_ID_VW.ROW_STAT_CD = 'A') AND
(APP_TGT.TRAN_ID_VW.SRC_SYS_ID = 'SOURCE')"
). APP_STG.STG and
APP_TGT.TRAN_ID_VW are joined using a sliding-window merge join,
with a join condition of (
). The input table
APP_TGT.TRAN_ID_VW will not be cached in memory. The result goes
Spool 1 (group_amps), which is built locally on the AMPs.
The size of
Spool 1 is estimated with low confidence to be 29,965
rows (1,767,935 bytes). The estimated time for this step is 17.34