Database
Enthusiast

Join Issues - Need help in most efficient way to solve the problem

Hi,

I need some help to solve the following problem in the most efficient way.

Consider Table1 -

`ShopNbr Item1 1001 1011 1022 1012 1043 1003 1023 1033 1043 105`

I have provided a snapshot of the data above. The actual data will have close to 1000 shops and 300 items. Each shop need not have all 300 items.We have Shop-Item level data here.

In the above case, I need to choose 2 shops which will capture the maximum number of distinct items. i.e. If i choose Shops 1 and 2, I will get items 100, 101, 102, 104 ( 4 items) and similarly i need to check for combinations 2&3 and 1&3 and choose the one which captures most items.

I can do this through a cross join but my problem is in my actual data I have close to 1000 shops. We need to know the most efficient way to do this.

In simple words I need the best combination (lets say I need to choose 5 out of 1000 shops), it would be 1000C5 iterations which is nearly 1000^5 which is not practical.

Also note we do not have access to stores procedures in our servers.

Thanks in advance. Any help will be extremely useful :)

Karthik

14 REPLIES
Enthusiast

Re: Join Issues - Need help in most efficient way to solve the problem

Any suggestions ?

Enthusiast

Re: Join Issues - Need help in most efficient way to solve the problem

Dieter - Any suggestions ?

Junior Contributor

Re: Join Issues - Need help in most efficient way to solve the problem

Hi Karthik,

i don't think there's a simple solution, especially without Stored Procedures.

As it's only 300 items you might use bit-functions to create a bit-string and then BITOR and COUNTSET to combine and count.

For two shops it's only 1000 * 1000 rows and maybe you can apply some rigid filter before you add the next shop to reduce the number of comparisons.

Enthusiast

Re: Join Issues - Need help in most efficient way to solve the problem

Dieter - Thanks a lot.

If I give a rule saying that always choose a shop which has maximum items, can I write a normal SQL code to achieve that ?

In other words, among all 1000 shops I will identify 1 shop which has maximum items, select that shop, delete all the items chosen in that shop, and again identify 2nd shop with maximum items and so on till 5 shops or even 10 shops if needed. ( I will know that number of shops)

How easy is this ? I am not able to come up with a coding logic for this though in Teradata.

Thanks again.

Enthusiast

Re: Join Issues - Need help in most efficient way to solve the problem

`CREATE VOLATILE TABLE T (SHOP INT,ITEM INT)UNIQUE PRIMARY INDEX(SHOP,ITEM)ON COMMIT PRESERVE ROWS;INSERT INTO T VALUES (1,100);INSERT INTO T VALUES (1,101);INSERT INTO T VALUES (1,102);INSERT INTO T VALUES (2,101);INSERT INTO T VALUES (2,104);INSERT INTO T VALUES (3,100);INSERT INTO T VALUES (3,102);INSERT INTO T VALUES (3,103);INSERT INTO T VALUES (3,104);INSERT INTO T VALUES (3,105);SELECT SHOP1, SHOP2 , SUM(CNT) AS TOTAL_CNTFROM (SELECT T1.SHOP AS SHOP1, T2.SHOP AS SHOP2, COUNT(DISTINCT T1.ITEM) AS CNTFROM T T1 INNER JOINT T2ONT1.ITEM = T2.ITEM ANDT1.SHOP LT T2.SHOPGROUP BY T1.SHOP, T2.SHOP)A GROUP BY 1,2QUALIFY ROW_NUMBER() OVER (ORDER BY TOTAL_CNT DESC) =1`
Junior Contributor

Re: Join Issues - Need help in most efficient way to solve the problem

Hi Karthik,

imho you can't choose the shop with the maximum number of items, e.g.

shop 1: 2,3,4,5,6,7

shop 2:1,2,3,4

shop 3: 5,6,7,8

shop 1 + 2 or shop 1 + 3 has less items than shop 2+ 3

Enthusiast

Re: Join Issues - Need help in most efficient way to solve the problem

Agreed totally with you Dieter. But in the actual data we have we feel that the case you have explained would be pretty rare and we are okay with ignorning that though I understand we are compromising the accuracy of the model.

But if we are fine with it, I still am not able to come up with the macro which can perform the same for me.

Enthusiast

Re: Join Issues - Need help in most efficient way to solve the problem

Dieter - Sorry again but any new ideas ?