I have a table with 1 billion records, of which close to half are exact row duplicates. I need to select one record from each set of duplicates and load it into another table. The query I used is below. The DIRECT_ID column in the ORDER BY clause is a sequence-generated id, so I was picking the most recently generated id for each set of duplicate records. But this is taking too much time.
Any suggestions are appreciated.
Insert into New_Consumer
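The posted query is truncated after the INSERT line. A common pattern for this kind of dedup on Teradata, consistent with the description above, is a ROW_NUMBER with QUALIFY. This is a sketch only: CONSUMER_TABLE comes from the discussion below, while col1..col3 are placeholders for the real non-id column list.

```sql
-- Hypothetical reconstruction of the approach described:
-- keep the row with the most recently generated DIRECT_ID
-- from each group of otherwise-identical rows.
INSERT INTO New_Consumer
SELECT *
FROM CONSUMER_TABLE
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY col1, col2, col3   -- every column except DIRECT_ID
    ORDER BY DIRECT_ID DESC
) = 1;
```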
If you have access to the DDLs:
Create a temporary table defined as a SET table (call it Table1); a SET table silently discards exact duplicate rows on insert.
Insert the rows from the source into Table1.
Insert the rows from Source MINUS Table1 into the target table, based on DIRECT_ID.
i.e. Source: 1, 2, 1, 3
Table1: 1, 2, 3
Source MINUS Table1: 1 (the duplicated row)
This would work, but I am not sure about the run time.
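A minimal sketch of the SET-table step, under the assumption that the duplicates are identical in every column except the sequence-generated DIRECT_ID (so DIRECT_ID must be excluded from Table1 for the SET semantics to kick in); col1..col3 are placeholders for the real column list:

```sql
-- SET tables in Teradata silently discard exact duplicate rows on insert.
-- DIRECT_ID is excluded because it differs between otherwise-identical rows.
CREATE SET TABLE Table1 (
    col1 INTEGER,
    col2 INTEGER,
    col3 INTEGER
);

INSERT INTO Table1 (col1, col2, col3)
SELECT col1, col2, col3
FROM CONSUMER_TABLE;   -- duplicate rows are dropped automatically

-- Table1 now holds one copy of each distinct row.
INSERT INTO New_Consumer (col1, col2, col3)
SELECT col1, col2, col3
FROM Table1;
```

Note that on a billion-row insert, the per-row duplicate check a SET table performs can itself be expensive, which is presumably the run-time concern raised above.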
Apart from DIRECT_ID, is there any other column you can use to differentiate two records? For example, a date or timestamp column.
When you are inserting into the New_Consumer table, do you want to select all distinct records from CONSUMER_TABLE? Is this a one-time activity, or, if it is a daily activity, why do you need to load all records?
There's no way to avoid the GROUP BY/PARTITION BY over all columns.
Clean your data once, and next time simply don't insert duplicates; your load process should take care of this.
There should be a logical primary key on your table, but that's not a sequence like DIRECT_ID.
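The GROUP BY variant of the same one-pass idea looks like this (a sketch; col1..col3 stand in for the full list of non-id columns):

```sql
-- Group on every column except DIRECT_ID and keep the most
-- recently generated id per group of duplicates.
INSERT INTO New_Consumer
SELECT col1, col2, col3, MAX(DIRECT_ID)
FROM CONSUMER_TABLE
GROUP BY col1, col2, col3;
```

Whether this beats the QUALIFY/ROW_NUMBER form depends on the optimizer; both have to touch every row and aggregate or sort over all columns.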
We do not have any other column that differentiates the records apart from DIRECT_ID.
Yes, we want to select all the distinct records from the consumer table, and it will be a one-time activity.
Unfortunately there is no other key that differentiates the records, which I admit is poor design. Living with what we have, we are trying to do a one-time load of distinct records into a new table from the table that has duplicates. I have used the same query as pasted above with PARTITION, and I ran an EXPLAIN too, but the EXPLAIN does not show how long the process might take. Any help or advice?
No worries, Yuva. Any other ways? I have tried using PARTITION and GROUP BY, but both are taking too long. Any help?