Identifying IDs for duplicate names

Database

Identifying IDs for duplicate names

Hi,

I've been a Teradata user on and off for a few years now but I'm a bit stuck with a specific query. Basically, the table (tens of millions of records) I am trying to query contains the following key fields:

-User_ID (unique)

-First_Name

-Last_Name

-Birth_Date

-Post_Code

Basically, there are many instances of duplicate persons (i.e. the same name, DOB and post code) with multiple User IDs; so the unique individual has multiple records in the database, but every record itself is unique. I can run a simple 'group by' query to pull the counts of all combinations of duplicate first names, last names, DOBs and Post Codes. What I now need is to have the full list of User IDs for every instance of a duplicate person. What would be the best way of going about doing this? I bet there is a really straight-forward way but somehow I'm not seeing it at the moment...

I hope the request is clear. I am happy to clarify if needed. I'd really appreciate any help with this.

3 REPLIES
Senior Apprentice

Re: Identifying IDs for duplicate names

Based on your explanation this might be what you need:

select 
min(User_ID)
over (partition by First_Name,Last_Name,Birth_Date,Post_Code) as minID
,t.*
from tab
qualify
count(*)
over (partition by First_Name,Last_Name,Birth_Date,Post_Code) > 1

It will return all rows for a group of duplicates with the same minID.

Dieter

Enthusiast

Re: Identifying IDs for duplicate names

You can aslo try below

SEL 
USER_ID
FROM tablename
WHERE
(FIRST_NAME,LAST_NAME,DOB,POST_CODE) IN
(SEL  FIRST_NAME,LAST_NAME,DOB,POST_CODE
     FROM tablename
     GROUP BY 1,2,3,4
     HAVING COUNT(*) > 1
);

Terasum

Re: Identifying IDs for duplicate names

Thanks for the responses guys. Much appreciated.