Selecting duplicate records from a table - Complete record and not just the duplicate keys

Database
Enthusiast

Selecting duplicate records from a table - Complete record and not just the duplicate keys

Hi,

I have a situation where i have duplicate records in my table. This is determined based on a combination of a few natural/business keys.

I know the way to identify if there are duplicates using the below query :

SELECT COL1,COL2,COL3,COUNT(*)

FROM TABLE

GROUP BY COL1,COL2,COL3

HAVING COUNT(*)>1

Lets assume that this returns 2.6 million records.

My requirement is to retrieve the complete records from the table that are duplicated and NOT JUST the duplicate keys.

I coded the below SQL :

SELECT A.COL1,A.COL2,A.COL3,A.COL4,A.COL5

FROM

(

SELECT COL1,COL2,COL3,COL4,COL5

FROM TABLE

) A

INNER JOIN

(

SELECT DISTINCT COL1,COL2,COL3

FROM TABLE

GROUP BY COL1,COL2,COL3

HAVING COUNT(*)>1

) B

ON

A.COL1=B.COL1 AND

A.COL2=B.COL2 AND

A.COL3=B.COL3

This SQL is returining only 2.9 million records.  When i get 2.6 million records having more than 1 record for a key combination, when i join it with the table i should get a minimum of 2.6 Million *2 (assuming there are 2 duplicate keys) or more. But it can't be less.

I understand that I am not getting the desired results. Could you please share your thoughts if the above approach is correct? and if so, could you please share your thoughts on not getting the desired output?

12 REPLIES
Enthusiast

Re: Selecting duplicate records from a table - Complete record and not just the duplicate keys

Try the following query:

SELECT *
FROM TableName
WHERE (col1,col2,col3) IN (SELECT col1,col2,col3
FROM TABLENAME
GROUP BY 1,2,3
HAVING COUNT(*)>1);
Khurram
Junior Contributor

Re: Selecting duplicate records from a table - Complete record and not just the duplicate keys

You're correct, there should be at least twice the number of rows returned. Your code is overly complex, you don't need the DISTINCT and the Derived Table A, but this will not change the result:

SELECT A.*
FROM TABLE A
INNER JOIN
(
SELECT COL1,COL2,COL3
FROM TABLE
GROUP BY COL1,COL2,COL3
HAVING COUNT(*)>1
) B
ON
A.COL1=B.COL1 AND
A.COL2=B.COL2 AND
A.COL3=B.COL3

There must be something else, could you post your actual query?

Depending on the number of columns in your table and the number of duplicates there is another approach:

SELECT *
FROM TABLE
QULAIFY
COUNT(*) OVER (PARTITION BY COL1,COL2,COL3) > 1

Dieter

Enthusiast

Re: Selecting duplicate records from a table - Complete record and not just the duplicate keys

Hi ,

Thanks for your response. Yes, the query is a little overstated and thanks for simplifying it.

Please find the complete query that i was running below. I have  total of 15 columns and the duplicate keys are 6 :

SELECT A.GR,A.DT,A.FA,A.ERID,A.HID,A.OE,A.LOE,A.OBE,A.OBT,A.STE,A.CLD,A.GI,A.LT,A.RE,A.HTLT

FROM

(SELECT A.GR,A.DT,A.FA,A.ERID,A.HID,A.OE,A.LOE,A.OBE,A.OBT,A.STE,A.CLD,A.GI,A.LT,A.RE,A.HTLT

FROM

ADE.OBSN_E)  A

INNER JOIN

(SELECT DISTINCT GR,GI,OE,LOE,OBE,CLD

FROM ADE.OBSN_E

GROUP BY

GR,GI,OE,LOE,OBE,CLD

HAVING COUNT(*)>1)  B

ON

A.GR=B.GR AND

A.GI=B.GI AND

A.OE=B.OE AND

A.LOE=B.LOE AND

A.OBE=B.OBE AND

A.CLD=B.CLD

In the format mentioned by you :

SELECT A.* FROM

ADE.OBSN_E A

INNER JOIN

(

SELECT GR,GI,OE,LOE,OBE,CLD  

FROM ADE.OBSN_E  

GROUP BY GR,GI,OE,LOE,OBE,CLD 

HAVING COUNT(*)>1 

) B

ON

A.GR=B.GR AND

A.GI=B.GI AND

A.OE=B.OE AND

A.LOE=B.LOE AND

A.OBE=B.OBE AND

A.CLD=B.CLD

Junior Contributor

Re: Selecting duplicate records from a table - Complete record and not just the duplicate keys

You said those columns are the Primary Key of the table, but there might be some NULLs in it?

This would result in returning less rows.

Dieter

Enthusiast

Re: Selecting duplicate records from a table - Complete record and not just the duplicate keys

Yes, that's right. I have verified the data in the table and could see that one or more columns (GR,GI,OE,LOE,OBE,CLD) are having NULL values.

Please note that these are not primary indexes/keys defined on the table but the natural/business keys from the source data.

Enthusiast

Re: Selecting duplicate records from a table - Complete record and not just the duplicate keys

Hi,

Ignoring the fact that the tables has NULL values for natural key columns, could you let me know if the logic being used to capture the duplicate records is correct ?

Thanks Much!!!

Enthusiast

Re: Selecting duplicate records from a table - Complete record and not just the duplicate keys

Why don't you run a test on some sample data????

Re: Selecting duplicate records from a table - Complete record and not just the duplicate keys

Hi, 

I need teradata query for below scenario.

Table:

 NO     Name          YOJ       NCD     PY_Amt        RD_ID    PY_DT

  1       JOE             2010     CN      19010.00      213        01/20/2011

  2       JOE             2010     CN      19010.00      435        04/15/2012

  3       JOHN           2011     RT      34555.00      654        03/25/2013

  4       JOHN           2011     RT      23415.00      654        04/28/2014

  5      KANE           2012     ER     12211.00      1231       01/20/2013 

  5      KANE           2012     ER     12211.00      1231       02/20/2013 

  5      KIM             2010     TU     34521.00      1531       01/20/2013 

From the above table, we  need to pick the records which are having more than one record for the combination of Name,YOJ,NCD and differing with either of PY_Amt or RD_ID and From these we need to pick only earlier record based on PY_DT

Result table should be pick only the yellow marked records.

Could you please provide query for this

Thanks in advance,

Gowthami Krishnamoorthy

Junior Contributor

Re: Selecting duplicate records from a table - Complete record and not just the duplicate keys

There are no yellow marked records, you must add the rows using the code editor, there you can highlight.

It's either based on COUNT OVER or maybe a simple ROW_NUMBER...