Removing Cross Duplicates

Database

Removing Cross Duplicates

Dear TD admins, my problem is very similar to the one described on http://community.teradata.com/t5/Database/Help-on-Below-SQL/m-p/31185, albeit my data has a far greater number of fields, combinations and unreadable cyrillic characters that make the row number function more difficult to execute a partition. Also my joint is on substr() of the first 3 letters of the first name and the first 3 letters of the last name. Basically what I want is to delete the duplicates on username_origin and username_target but preserving all the remaining rows: this is a sample of my data (made of thousands of rows):

Username_originFirstNameLastNameUsername_targetFirstNameLastName
Tmeacham3VICTORIAMEACHVictor1VICTORIAMEACH
Victor1VICTORIAMEACHTmeacham3VICTORIAMEACH
Vewis9VICTORLEWVictoria0VICTORLEW
Victoria0VICTORLEWVewis9VICTORLEW

The above table is a truncated, reduced version of my data. I want to delete only those rows having cross duplicates between username_origin and username_target.

Trying out the suggested solution:

SyntaxEditor Code Snippet

Qualify 

row_number() over (partition by case when username_origin < username_target then username_origin else username_target end,
                                                                      when username_origin < username_target then username_target else username_origin end
                                                                       order by username_origin ) =1

I kept receiving the error: "Failed 3706: expected something between the 'end' keyword and the 'when' keyword. Can someone please help me to find a solution?

 

Thanks
Al


Accepted Solutions
Highlighted
Teradata Employee

Re: Removing Cross Duplicates



Also my joint is on substr() of the first 3 letters of the first name and the first 3 letters of the last name.

 

Hi Algrasso,

 

You can probably remove your duplicates at the join level, if you have an Id or something by adding : T1.ID < T2.ID.

Thus, your row_number may be unnecessary or easier to write.

1 ACCEPTED SOLUTION
3 REPLIES

Re: Removing Cross Duplicates

The answer was to add the "case" keyword changing the code to this: 

SyntaxEditor Code Snippet

Qualify 

row_number() over (partition by case when username_origin < username_target then username_origin else username_target end,
                                case when username_origin < username_target then username_target else username_origin end
                                                                       order by username_origin ) =1     

Although I copied the snippet verbatim from the proposed solution on the aforementioned link, I should have figured the error myself. Apologies for wasting your time here. If there is an alternative solution though, I would really appreciate to get to know what that is.. Thanks! 

Highlighted
Teradata Employee

Re: Removing Cross Duplicates



Also my joint is on substr() of the first 3 letters of the first name and the first 3 letters of the last name.

 

Hi Algrasso,

 

You can probably remove your duplicates at the join level, if you have an Id or something by adding : T1.ID < T2.ID.

Thus, your row_number may be unnecessary or easier to write.

Re: Removing Cross Duplicates

Thanks Waldar. That works too and it is fater. But I prefer the previous solution which is keeping a greater number of rows where illegal characters are encountered.