We are migrating a data mart from Oracle to Teradata. The old system unloaded data from Oracle (1 TB weekly), processed it with C/Perl, and loaded the results into another Oracle instance (200 GB). Now we are moving to Teradata, where the source tables and the destination tables sit on the same TD server. Part of the migration is dropping the C/Perl code and moving to SQL, to improve performance by eliminating the load/unload times.
We are having problems with the SQL because some rows in the source data are duplicated; previously the duplicates were dropped in the C code. Here is an example:
Theoretically the relationship between COL1 and COL2 is 1-to-1, but the system that feeds the data is full of errors, and a single COL1 value can arrive with two different values in COL2.
We run SELECT DISTINCT COL1, COL2 FROM Table and then join the result to another table. We cannot let duplicates into the join, because the second table is very large (3 billion rows).
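To show why DISTINCT alone does not help here, a minimal sketch (table and column names are just placeholders for our real ones): DISTINCT removes rows only when the whole (COL1, COL2) pair repeats, so a COL1 with two different COL2 values survives as two rows.

```sql
-- Hypothetical source rows: (1,'A'), (1,'A'), (1,'B')
SELECT DISTINCT COL1, COL2
FROM SourceTable;
-- Collapses the repeated (1,'A') pair, but (1,'A') and (1,'B')
-- are distinct pairs, so COL1 = 1 still comes out twice and
-- fans out in the subsequent join.
```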
In the old C code, the program discarded one of the duplicates at random. We are now writing SQL that always keeps MAX(COL2) as the good record, but we wonder whether TD has any clause in SQL to drop duplicates at random.
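In case it helps others with the same problem, a sketch of the usual Teradata idiom for this: QUALIFY with ROW_NUMBER() keeps exactly one row per COL1 in a single pass, and the ORDER BY inside the window decides which survivor wins. Table and column names are placeholders for our real ones.

```sql
-- One row per COL1, keeping the MAX(COL2) -- same result as our
-- current approach, but without a GROUP BY or self-join:
SELECT COL1, COL2
FROM SourceTable
QUALIFY ROW_NUMBER() OVER (PARTITION BY COL1 ORDER BY COL2 DESC) = 1;

-- A "random" survivor, closer to the old C behaviour. This assumes
-- your TD release accepts RANDOM inside a window ORDER BY; if it
-- does not, any arbitrary tie-breaker (e.g. ORDER BY COL2) still
-- guarantees exactly one row per COL1:
SELECT COL1, COL2
FROM SourceTable
QUALIFY ROW_NUMBER() OVER (PARTITION BY COL1 ORDER BY RANDOM(1, 999999999)) = 1;
```

Either variant feeds the big join with a guaranteed-unique COL1, which is the property that matters for the 3-billion-row table.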