I have table with 100 columns and and has 10milllion records. There are duplicates in this table. I want to identify duplicates.
I know there are couple of ways to do it.
1. group by
But I dont feel these are the efficient solutions as there are 100 columns. Could let me know if there are any other solutions.
Ideally a set table does not store duplicates.
But If your table is multiset with some indexes then you can try this
sel index1, index2...Count(Last Index Col)
group by 1,2,..
having Count(Last Index Col) >1
To delete the Duplicate Rows the fastest/easiest way will be an Insert/Select into a new SET table.
Either drop the old table & rename or delete the old an re-insert.
Of course, then you need to modify you load job to avoid Duplicate Rows in the future.