Deleting/identifying duplicates in the table

Database

Deleting/identifying duplicates in the table

Hi,

I have table with 100 columns and and has 10milllion records. There are duplicates in this table. I want to identify duplicates. 

I know there are couple of ways to do it. 

1. group by 

2. partition 

But I dont feel these are the efficient solutions as there are 100 columns. Could let me know if there are any other solutions.

Thanks,

Veeresh

7 REPLIES
Ckp
Enthusiast

Re: Deleting/identifying duplicates in the table

Ideally a set table does not store duplicates.

But If your table is multiset with some indexes then you can try this

sel index1, index2...Count(Last Index Col)

from yourTable

group by 1,2,..

having Count(Last Index Col) >1

Ruc
Fan

Re: Deleting/identifying duplicates in the table

All 100 columns wont be PK.
You can use normal procedure to find duplicates:

Sel col1,count(*) from table1
Group by col1
Having count(*)>1

For deleting duplicates:
Delete from table1
Where col1 = (sel col1 from table1 qualify row_number() over (order by col1) >1)
Senior Apprentice

Re: Deleting/identifying duplicates in the table

To delete the Duplicate Rows the fastest/easiest way will be an Insert/Select into a new SET table.

Either drop the old table & rename or delete the old an re-insert.

 

Of course, then you need to modify you load job to avoid Duplicate Rows in the future. 

Ruc
Fan

Re: Deleting/identifying duplicates in the table

I agree.. But if someone doesnt have access to drop or create table, in that case we can use the row_number query.
Senior Apprentice

Re: Deleting/identifying duplicates in the table

How is this Delete supposed to work for Duplicate Rows?
Ruc
Fan

Re: Deleting/identifying duplicates in the table

I think so..i have seen on many threads telling to use qualify but none shows actual query.
Please correct me if i am wrong with the query.
Senior Apprentice

Re: Deleting/identifying duplicates in the table

You can use QUALIFY when you need to filter for rows violating the Primary Key (or another group of columns), but a Duplicate Row has nothing unique to base the Delete on it.