self join

Analytics
Enthusiast

self join

table 1
id1 id2 date val1 val2 val3 val4
a b 1-jan-09 1 10 1 8
c d 1-Jan-09 2 7 0 0
e f 1-jan-09 1 1 2 6
g h 1-Jan-09 4 4 8 16
i j 1-jan-09 3 6 6 12
k l 1-jan-09 7 21 1 4
m n 1-jan-09 6 24 2 8
o p 2-jan-09 5 5 0 0
q r 1-jan-09 1 11 3 33

table 2
id1 attribute1 attribute2 attribute3 attribute4 attribute5
a 1 1 1 1 1
c 1 1 1 1 1
e 1 1 1 1 1
g 2 2 2 2 2
i 2 2 2 2 2
k 2 2 2 2 2
m 3 3 3 3 3
o 3 3 3 3 3
q 3 3 3 3 3

I have to select id1,id2,date,val1,val2 from the table 1 where the val3 is greater than 0
for the id1 with same attributes for the same day.

For example
first 3 rows should be ignored as they the same attributes but 0 for id1: c for the same date.
Then the 2-Jan transaction is not taken into account as it is a different date but the other items
belonging to id1(m and q) are selected.
g h 1-Jan-09 4 4 8 16
i j 1-jan-09 3 6 6 12
k l 1-jan-09 7 21 1 4
m n 1-jan-09 6 24 2 8
q r 1-jan-09 1 11 3 33

How do i do this. The table 1 has around 8 million records and table 2 has 500,000 records. The performance should also be good.

6 REPLIES
Junior Contributor

Re: self join

Sorry, but i can't figure out, what rows and why should be returned.
Could you rephrase your example?

Dieter
Enthusiast

Re: self join

We have grouping as given below
group 1 : a,b,c (ids belonging to group 1)
group 2 : d,e,f (ids belonging to group 2)
group 3 : g,h,i (ids belonging to group 3)

The grouping is done based on some attributes, as given below
id attribute1 attribute2 attribute3
a 1 2 3
b 1 2 3
c 1 2 3
d 4 5 6
e 4 5 6
f 4 5 6
g 7 8 9
h 7 8 9
i 7 8 9

The driver table has the following structure(key : id1,id2,date)
id1 id2 date col1 col2 col3 col4
a 1 1-jan-09 1 1 3 7
b 1 1-jan-09 1 1 6 5
c 1 1-jan-09 1 1 1 11
d 1 1-jan-09 1 1 0 0
e 1 1-jan-09 1 1 10 8
f 1 1-jan-09 1 1 1 21
g 1 1-jan-09 1 1 5 12
h 1 1-jan-09 1 1 4 2
i 1 2-jan-09 1 1 0 0

The driver table's id1 maps to the attribute table's id.

DESIRED OUTPUT: The SQL needs to extract
id1 id2 date col1 col2

if the none of the items in the grouping have a value <=0 for col3.
In the following output if we see, we did not select group 2 items as the id: d has 0 for column3, so all the items belonging to its group are ignored.

The third group is selected as the column 3 has 0 value for a different id1,id2,date combination for the same group.

id1 id2 date col1 col2
a 1 1-jan-09 1 1
b 1 1-jan-09 1 1
c 1 1-jan-09 1 1
g 1 1-jan-09 1 1
h 1 1-jan-09 1 1

The query i wrote looks like (but performance is very bad)
select * from driver_table outloop
where not exists
(
select id1,id2,column3
from driver_table inloop
where column3<=0
and inloop.id1 in
(
select d1.id1
from attribute_table d1 join attribute_table d2
on d1.attribute1 = d2.attribute1
and d1.attribute2 = d2.attribute2
and d1.attribute3 = d2.attribute3
and d1.attribute4 = d2.attribute4
and d1.attribute5 = d2.attribute5
where d2.id1 = outloop.id1
)
and inloop.id2 = outloop.id2
and inloop.date = outloop.date
)
Junior Contributor

Re: self join

"different id1,id2,date combination for the same group"

This means your grouping columns are not only (attribute1,attribute2,attribute3), but (id2,date), too.

As you didn't provide DDL and INSERTs, this is untested, but should return the expected result set based on your narrative:

SELECT d.* FROM driver_table d JOIN attribute_table a
ON a.id = d.id1
QUALIFY
MIN(CASE WHEN col3=0 THEN 0 END) OVER (PARTITION BY a.attribute1,a.attribute2,a.attribute3,d.id2,DATE) IS NULL

Dieter
Enthusiast

Re: self join

I tried the following but getting Error 7547: Target row updated by multiple source rows.
Output directed to Answerset window

UPDATE FLT
FROM WM_AD_HOC.FILTER_SALES_TEMP FLT,
(
SELECT TEMP1.ITEM_NBR,TEMP1.STORE_NBR,TEMP1.TRAN_DATE,TEMP1.SALES_UNITS,TEMP1.SALES_RETAIL
FROM WM_AD_HOC.E_SLS_INV_TEMP TEMP1 JOIN GB_WM_VM.ITEM_DESC DESC1
ON DESC1.ITEM_NBR = TEMP1.ITEM_NBR
QUALIFY
MIN(CASE WHEN TEMP1.OH_UNITS=0 THEN 0 END)
OVER (PARTITION BY DESC1.ACCT_DEPT_NBR,DESC1.FINELINE,
DESC1.SUBCLASS_CODE,DESC1.PRIMARY_DESC,DESC1.COLOR_DESC,
TEMP1.STORE_NBR,TEMP1.TRAN_DATE) IS NULL
WHERE TEMP1.SALES_UNITS <> 0
) TEMP1
SET FLTD_SALES_UNITS = FLTD_SALES_UNITS + TEMP1.SALES_UNITS,
FLTD_SALES_RETAIL = FLTD_SALES_RETAIL + TEMP1.SALES_RETAIL
WHERE FLT.ITEM_NBR = TEMP1.ITEM_NBR
AND FLT.STORE_NBR = TEMP1.STORE_NBR
AND FLT.LAST_UPDATE_DT = TEMP1.TRAN_DATE

Enthusiast

Re: self join

Also in addition to the above error i have the following question...

like the itme being grouped by attributes, if there are 2 items
then if only one has value for the item, store and date
the other one does not have record at all for that date, the abover query excludes it... how can we over come that?

Is it possible to do that with a left outer join like this, also... this is giving me 2646 spool space error,
UPDATE FLT
FROM WM_AD_HOC.FILTER_SALES_TEMP FLT,
(
SELECT LTEMP1.ITEM_NBR,LTEMP1.STORE_NBR,LTEMP1.TRAN_DATE,LTEMP1.REPORT_CODE,LTEMP1.SALES_UNITS,LTEMP1.SALES_RETAIL
FROM WM_AD_HOC.E_SLS_INV_TEMP LTEMP1
LEFT OUTER JOIN
( SELECT RTEMP.ITEM_NBR,RTEMP.STORE_NBR,RTEMP.TRAN_DATE
FROM WM_AD_HOC.E_SLS_INV_TEMP RTEMP
WHERE RTEMP.OH_UNITS = 0
AND RTEMP.ITEM_NBR IN (SELECT D1.ITEM_NBR
FROM WM_USER.ITEM_DESC D1 JOIN WM_USER.ITEM_DESC D2
ON D1.ACCT_DEPT_NBR = D2.ACCT_DEPT_NBR
AND D1.SUBCLASS_CODE = D2.SUBCLASS_CODE
AND D1.FINELINE = D2.FINELINE
AND D1.PRIMARY_DESC = D2.PRIMARY_DESC
AND D1.COLOR_DESC = D2.COLOR_DESC
WHERE D2.ITEM_NBR = RTEMP.ITEM_NBR
)

) RTEMP1
ON LTEMP1.ITEM_NBR = RTEMP1.ITEM_NBR
AND LTEMP1.STORE_NBR = RTEMP1.STORE_NBR
AND LTEMP1.TRAN_DATE = RTEMP1.TRAN_DATE
WHERE (RTEMP1.ITEM_NBR,RTEMP1.STORE_NBR,RTEMP1.TRAN_DATE) IS NULL
) TEMP1
SET FLTD_SALES_UNITS = FLTD_SALES_UNITS + TEMP1.SALES_UNITS ,
FLTD_SALES_RETAIL = FLTD_SALES_RETAIL + TEMP1.SALES_RETAIL
WHERE FLT.ITEM_NBR = TEMP1.ITEM_NBR
AND FLT.STORE_NBR = TEMP1.STORE_NBR
AND FLT.LAST_UPDATE_DT = TEMP1.TRAN_DATE

Junior Contributor

Re: self join

To get rid of the 7547 you have to aggregate the data in TEMP1 before updating.

Dieter