How to limit rows to only those where there are duplicates of a specific column?

Database

How to limit rows to only those where there are duplicates of a specific column?

I've seen how to select distinct information, but I haven't successfully found anything for the opposite.  I'm looking to limit my search only to items with duplicate UPCs.  My query essentially looks like this...

SELECT
item_number
,upc
,description
,store_number

FROM
table1

WHERE
product_category IN (1,2,3)

...and my results look like:







item_num upc descrip store_num
5 44444-44444 fish 23
3 33333-33333 cat 17
6 33333-33333 cat 42
2 22222-22222 dog 17
4 22222-22222 dog 42
1 11111-11111 snake 8

I only want to see rows with a UPC that exists more than once, but I want to see each row.  I don't want a count function to tell me there are two each of 22222-22222 and 33333-33333, but I want to restrict my results to see only and exactly the rows containing UPCs which appear more than once.  In this scenario, I want to see four rows--both cat rows and both dog rows.

Thank you in advance!

17 REPLIES
Senior Apprentice

Re: How to limit rows to only those where there are duplicates of a specific column?

select * from table1

qualify count(*) over (partition by upc) > 1

Dieter

Re: How to limit rows to only those where there are duplicates of a specific column?

Thank you!  This does exactly what I want, and surprisingly more easily than I thought.

The last thing I need to do is repeat this step, with a column other than UPC.  Let's pretend store_num is 17 for both of my 33333-33333 UPCs, but my 22222-22222 has the store_num as shown.  How would I limit it to show only and exactly the rows there both UPC and store_num are duplicate?

I tried just adding, but this did not change m

AND COUNT(*) OVER (PARTITION BY store_num)>1

Then I tried this, but it keeps asking for something like ")" betwen "upc" and "AND".

COUNT(*) OVER (PARTITION BY upc AND store_num)>1

At this point I would be satisfied to play with it in Excel and filter it as I need it, but even after the first qualify my query is returning 170,000 rows...quite a few more than the 65,000 I'm allowed.

Senior Apprentice

Re: How to limit rows to only those where there are duplicates of a specific column?

PARTITION BY is similar to GROUP BY for aggregates,

if i understand you correctly you need:

COUNT(*) OVER (PARTITION BY upc, store_num)>1

 

Dieter

Re: How to limit rows to only those where there are duplicates of a specific column?

This works perfectly--thanks for all of your help.  For the most part I understand how the PARTITION BY works...you're partitioning the data by a certain criterion.

But when you PARTITION BY more than one column, does it know to partition by one or the other first?  or does it not matter?

Senior Apprentice

Re: How to limit rows to only those where there are duplicates of a specific column?

It's like GROUP BY col1, col2 vs. GROUP BY col2, col1, it doesn't matter.

Dieter

Re: How to limit rows to only those where there are duplicates of a specific column?

I have a teradata query which uses aggregate function and GROUP BY. GROUP BY restrictes the results to half for eg. without using aggregate/group by, query fetches 26,000 records and with aggreagate/group by it restrictes the resulting rows to 13,000 Only. How I can display all duplicate records in my table with aggregate function and GROUP BY.

Senior Apprentice

Re: How to limit rows to only those where there are duplicates of a specific column?

Try

HAVING COUNT(*) > 1

Dieter

Re: How to limit rows to only those where there are duplicates of a specific column?

Hi Dieter,

Can you please help for below requirement.

I have below records in my table,

  ACCT    ID    CD     

1)  123   2500   AB1         

2)  123   24386  AB2

3)  123   2500   AB3

4)  123   2500   AB4

5)  123   2500   AB5

6)  123   24386  AB6

7)  123   2500   AB7

8)  123   2500   AB8

order by CD after then IF my ID 24386 is in between 2500(2 nd row &6 rows are in between ) then i need to take the cd of next rows till i found another 24386.

expected result for column  DER_CD

  ACCT    ID    CD     DER_CD

1)  123   2500   AB1   AB1       

3)  123   2500   AB3   AB3

4)  123   2500   AB4   AB3

5)  123   2500   AB5   AB3

7)  123   2500   AB7   AB7

8)  123   2500   AB8   AB7

Please help on that

Thanks,

Chandan


Senior Apprentice

Re: How to limit rows to only those where there are duplicates of a specific column?

Hi Chandan,

based on your result set (I didn't fully understand your explanantion) this should work:

CREATE VOLATILE TABLE vt (  ACCT INT,    ID  INT,  code     CHAR(3))
ON COMMIT PRESERVE ROWS;

INS vt(123 , 2500 ,'AB1');
INS vt(123 , 24386 ,'AB2');
INS vt(123 , 2500 ,'AB3');
INS vt(123 , 2500 ,'AB4');
INS vt(123 , 2500 ,'AB5');
INS vt(123 , 24386 ,'AB6');
INS vt(123 , 2400 ,'AB7');
INS vt(123 , 2400 ,'AB8');

SELECT dt.*,
COALESCE(MAX(x)
OVER (PARTITION BY acct
ORDER BY code
ROWS UNBOUNDED PRECEDING), code)
FROM
(
SELECT vt.*,
CASE WHEN id = 24386
THEN MIN(code)
OVER (PARTITION BY acct
ORDER BY code
ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
END AS x
FROM vt
) AS dt
QUALIFY id <> 24386