Find Only Duplicates


Find Only Duplicates

Hi All,

I have very big table and it has around 250 columns and this table may have duplicates.

this table does not have any UPI or USI or event primary index also based on many columns.

Now my requirement to find the duplicate so tell me what is the best way to find the dups.

I am aware about GroupBY , ROW_NUMBER, UNION. But I am thinking these all will take more CPU and will hit performance.

And its not feasible to group by on all 250 columns or use row_number on these many columns.

So do we have another approach to finnd only duplicate.

Thanks in Advance..


Tags (1)

Re: Find Only Duplicates

I don't think you will have much other options the the big group by or the row_number() with big partition.

Sorted export and doing the check via an external program will also require huge resources...

That's the downside of of big multiset tables if you require unique rows and don't enforce it during ETL processes...


Re: Find Only Duplicates

Hi AG,

As not chk had been set while populating the data hence what ever you do can not prevent this senarion to happne in near future. Comming back to your question It have to be a all amp operation in order to find the duplicated in existing table entry, Hence preapare yourself for a BIT CPU use :). Ans is very simle in taht case just do a count using group by having with a greater Than one condition.

- Sankalp

Junior Contributor

Re: Find Only Duplicates

If you need to delete the duplicate rows the easiest way should be a SELECT DISTINCT into a new table, drop/rename.

But there should be a logical Primary Key for this table, so you might also have PK violations which are not duplicate rows and then you need COUNT to find them.

To avoid duplicates in future you should try to switch to MERGE.