Dropping duplicates in SQL

Database
Enthusiast

Dropping duplicates in SQL

Hi,

We are migrating to Teradata a Datamart that was previously in Oracle. The system unloaded data from Oracle (1 Tb weekly), then processed them using C/perl and then loaded the results to another Oracle instance (200Gb). Now we are moving to Teradata were the source tables and the destination tables are in the same TD server. Part of the migration is to drop the C/perl code and move to SQL in order to improve the performance reducing load/unload times.

We are suffering problems with the SQLs because in the source data some rows are duplicated, the process of dropping the duplicates was made in C previously. I give you an example:

Source data:

COL1 COL2 COL3 COL4
AAAA 1111 RRRR SSSS
AAAA 2222 RRRR SSSS
BBBB RRRR $$$$ eeeeee

Theorically the relationship between COL1 and COL2 is 1-1 but the system who feeds the data is full of errors and one COL1 can have 2 values in col2.

We make: SELECT DISTINCT COL1, COL2 FROM Table and then the result is joined to another table. We cannot have duplicates in the join as the second table is very large (3k million registers).

In the previous C, the program discarded one of the duplicates by random. We are doing now an SQL which always takes the MAX(col2) as the good record, but we wonder if TD has any clause in the SQL to drop bay random.

Thanks in advance.

2 REPLIES
Senior Apprentice

Re: Dropping duplicates in SQL

col1 is supposed to be unique?
One of those rows should be returned?
AAAA 1111 RRRR SSSS
AAAA 2222 RRRR SSSS

This will return a row randomly:
select
col1,col2,col3,col4
from tab
qualify
row_number() over (partition by col1 order by col1) = 1

Dieter
Enthusiast

Re: Dropping duplicates in SQL

Thank you Dieter, the SQL worked fine.