Redistribution and duplication of records across all Amps

Database
Enthusiast

Redistribution and duplication of records across all Amps

hello all,

I am running an update statement which tries to update A.BOSS_Appln_Id to B.BOSS_Appln_Id. The statement is as below.

UPDATE EDWST1E_DATA_SRVC_ORDER.A
FROM (SELECT BOSS_Appln_Id FROM EDWST1E_DATA_SRVC_ORDER.B
Where Src_Key_Part1_Val = 'Unknown' and Key_Dmn_Id = 298) T10160
SET BOSS_Appln_Id = T10160.BOSS_Appln_Id;

Table B has UPI on BOSS_Appln_Id and Table A is NUPI on Srvc_Order_Num

the Stats on both table are
HELP STATS EDWST1E_DATA_SRVC_ORDER.B ;
Date Time Unique Values Column Names
09/02/10 15:08:26 1 BOSS_Appln_Id
09/02/09 15:33:27 1 Key_Dmn_Id,Src_Key_Part1_Val

help stats EDWST1E_DATA_SRVC_ORDER.A

Date Time Unique Values Column Names
09/02/10 13:25:23 3,071,269 Srvc_Order_Num
09/02/10 13:25:25 3,106,652 Sbl_Intgn_Id
09/02/10 13:25:26 302 Pblcn_Start_Dt
09/02/10 13:25:26 223 Pblcn_End_Dt
09/02/10 13:26:18 3,164,223 Sbl_Intgn_Id,Pblcn_Start_Dt

running explain on the above query gives me the below text.As you can see in STEP 4 table B is small table and it's captured in Spool 2,which is duplicated to all amps. After this step table B rows are avaliable to be joined with rows of table A.

When the small table is duplicated on all amps then why in STEP 5 the big table(A) is re-distributed again ? My understanding is that for join the rows should be avaliable on same AMP which is already achieved by duplciating the smaller table on all amps. I always thought that either Redistribution or duplication will solve the purpose of joining rows and we do not require both steps to be done in a query.Is my understanding correct? Any logical reason for the below behaviour of PE?

Appreciate any help...

Explanation
1) First, we lock a distinct EDWST1E_DATA_SRVC_ORDER."pseudo table"
for read on a RowHash to prevent global deadlock for
EDWST1E_DATA_SRVC_ORDER.B.
2) Next, we lock a distinct EDWST1E_DATA_SRVC_ORDER."pseudo table"
for write on a RowHash to prevent global deadlock for
EDWST1E_DATA_SRVC_ORDER.A.
3) We lock EDWST1E_DATA_SRVC_ORDER.B for read,
and we lock EDWST1E_DATA_SRVC_ORDER.A for
write.
4) We do an all-AMPs RETRIEVE step from
EDWST1E_DATA_SRVC_ORDER.B by way of an
all-rows scan with a condition of (
"(EDWST1E_DATA_SRVC_ORDER.B.Key_Dmn_Id = 298)
AND
(EDWST1E_DATA_SRVC_ORDER.B.Src_Key_Part1_Val
= 'Unknown')") into Spool 2 (all_amps) (compressed columns
allowed), b] The size of Spool 2 is
estimated with high confidence to be 80 rows. The estimated time
for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to
EDWST1E_DATA_SRVC_ORDER.A by way of an
all-rows scan with no residual conditions. Spool 2 and
EDWST1E_DATA_SRVC_ORDER.A are joined using a
product join, with a join condition of ("(1=1)"). The result goes
into Spool 1 (all_amps), which is redistributed by hash code to
all AMPs.
Then we do a SORT to order Spool 1 by the sort key in
spool field1. The size of Spool 1 is estimated with high
confidence to be 3,164,223 rows. The estimated time for this step
is 1.35 seconds.
6) We do a MERGE Update to
EDWST1E_DATA_SRVC_ORDER.A from Spool 1 (Last
Use) via ROWID.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.

1 REPLY
Junior Contributor

Re: Redistribution and duplication of records across all Amps

Maybe the optimizer is confused because of the previous product join.

But if you want to update all rows to the same value, i'd recommend two separate steps in a Stored Proc:

DEFINE AppIn INTEGER?;

SELECT BOSS_Appln_Id INTO :AppIn
FROM EDWST1E_DATA_SRVC_ORDER.B
Where Src_Key_Part1_Val = 'Unknown' and Key_Dmn_Id = 298
;

UPDATE EDWST1E_DATA_SRVC_ORDER.A
SET BOSS_Appln_Id = :Appln;

Dieter