Dealing with Duplicates.

Database
Enthusiast

Dealing with Duplicates.


I have a large table which contains some duplicate records (on accnt_num ). I do not just want to just dedupe the table as work need to done on those duplicate accounts.

Ideally I want to create a second table holding just the duplicate records (Again ideally I would like the duplicate record to be the 2nd record added to the original table—by load_date. )

Could somebody please advise me of the best way of tackling this problem.

With thanks

Bob
3 REPLIES

Re: Dealing with Duplicates.

Hello Bob.
That is an interesting problem. Here is some psedo-code.
It might point you in the right direction. I'm sure there
are various ways to approach this. But, this is my first idea.

one sql statement grabs every account number and it's earliest load date.
Something like:

DERIVED TABLE_A:

SELECT account_number, min(load_date), min(sequence_id)
FROM tbl_acct
GROUP BY account_number

another sql statement identifies which account numbers have duplicate values

DERIVED TABLE_B:

SELECT account_number
FROM tbl_acct
GROUP BY account_number
HAVING COUNT(*) > 1

INSERT INTO TABLE_DUPLICATES VALUES(

SELECT *
FROM tbl_acct
WHERE account_number IN(SELECT * FROM TABLE_B)
AND (account_number, load_date, sequence_id) NOT IN(SELECT * FROM TABLE_A) )

Enthusiast

Re: Dealing with Duplicates.

HERE IS A SIMPLE SOLUTION:
Test scenario created:
CREATE TABLE isen.account_details(acc_no VARCHAR(30) NOT NULL,FNAME VARCHAR(30),LNAME VARCHAR(30),ACC_TYPE VARCHAR(3) NOT NULL,REC_LOAD_DATE DATE);

INSERT INTO isen.account_details( '100','JEET','POTTER','CHQ',CURRENT_DATE-3);
INSERT INTO isen.account_details( '100','EMMA','POTTER','SAV',CURRENT_DATE);
INSERT INTO isen.account_details( '200','STEVE','NATGELLE','SAV',CURRENT_DATE);
INSERT INTO isen.account_details( '200','MIKE','NATGELLE','TFS',CURRENT_DATE-2);
INSERT INTO isen.account_details( '200','ROHIT','ALAWAT','TFS',CURRENT_DATE-3);
INSERT INTO isen.account_details( '300','NICHOLAS','SLIM','TFS',CURRENT_DATE);

TO INSERT INTO THE DUPILATE TABLE USE THE BELOW QUERY:

SELECT ACC_NO,FNAME,LNAME,ACC_TYPE,REC_LOAD_DATE FROM
(
SELECT
A.*
,ROW_NUMBER() OVER (PARTITION BY ACC_NO
ORDER BY REC_LOAD_DATE) AS ACCT_ORDR_NBR
FROM ISEN.ACCOUNT_DETAILS A ) DER
WHERE DER.ACCT_ORDR_NBR>1;

TO POPULATE THE TABLE WITH CURRENT ACCOUNTS USE THE FOLLOWING QUERY:

SELECT ACC_NO,FNAME,LNAME,ACC_TYPE,REC_LOAD_DATE FROM
(
SELECT
A.*
,ROW_NUMBER() OVER (PARTITION BY ACC_NO
ORDER BY REC_LOAD_DATE) AS ACCT_ORDR_NBR
FROM IPFRDEV_T.ACCOUNT_DETAILS A ) DER
WHERE DER.ACCT_ORDR_NBR=1;

Hope it helps!!
Enthusiast

Re: Dealing with Duplicates.

create table duplicates as (
select * from your_table
qualify rank() over (partition by account_number order by load_date ascending) >1
) with data primary index (account_number,load_date)