Levenstein Distance Algorithm

Analytics
Enthusiast

Levenstein Distance Algorithm

Levenstein Distance Algorithm is a famous algorithm used for measuring the amount of difference between two sequences (i.e., the so called edit distance).
The Levenshtein distance between two strings is given by the minimum number of operations needed to transform one string into the other.

This algorithm uses Arrays - which is the main challenge to be applied in Teradata Stored Procedure

Can you please anyone give any code to use this in Teradata Stored Procedure/Function or SQL!
Oracle PL/SQL Code is available - that uses Arrays. How do I implement Arrays in Teradata?

FYI : [url=http://en.wikipedia.org/wiki/Levenshtein_distance][/url]
6 REPLIES
Junior Contributor

Re: Levenstein Distance Algorithm

I don't know if this is what you actually need, but there's a UDF implementing an editdistance funtion at
http://www.teradata.com/DownloadCenter/Topic9232-133-1.aspx

Dieter
Enthusiast

Re: Levenstein Distance Algorithm

Does any one have Levenshtein Algoritham code for Tera data ??

Junior Contributor

Re: Levenstein Distance Algorithm

Edit Distance and N-Gram string UDFs

In TD14 there's a built-in EditDistance UDF based on the Damerau-Levenshtein Distance algorithm.

Enthusiast

Re: Levenstein Distance Algorithm

thanks for your quick response. Do we have something using sql ???

Enthusiast

Re: Levenstein Distance Algorithm

Hi Folks,

I am trying to modify the  sql code for levenstein distance algoritham to teradata sql. 

Can some one please look at the below code and suggest me the changes. I am getting error at class aggregate interim size.

Appreciate your quick response & Thanks for your help in advance.

______________SQL CODE_________________


CREATE

FUNCTION

edit_distance

(s1 varchar(3999), s2 varchar(3999))

RETURNS INTEGER CAST FROM  int

LANGUAGE SQL

SPECIFIC EDIT_DISTANCE

CLASS AGGREGATE

 (s1 varchar(3999), s2 varchar(3999)

PARAMETER STYLE  SQL

DETERMINISTIC

RETURNS NULL ON NULL INPUT

specific _ edit_distance

BEGIN

DECLARE @s1_len int , @s2_len int

DECLARE @i int, @j int, @s1_char nchar, @c int, @c_temp int

DECLARE @cv0 varbinary(8000), @cv1 varbinary(8000)

SELECT

@s1_len = LEN(@s1),

@s2_len = LEN(@s2),

@cv1 = 0x0000,

@j = 1, @i = 1, @c = 0

WHILE @j <= @s2_len

SELECT @cv1 = @cv1 + CAST(@j AS binary(2)), @j = @j + 1

WHILE @i <= @s1_len

BEGIN

SELECT

@s1_char = SUBSTRING(@s1, @i, 1),

@c = @i,

@cv0 = CAST(@i AS binary(2)),

@j = 1

WHILE @j <= @s2_len

BEGIN

SET @c = @c + 1

SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j-1, 2) AS int) +

CASE WHEN @s1_char = SUBSTRING(@s2, @j, 1) THEN 0 ELSE 1 END

IF @c > @c_temp SET @c = @c_temp

SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j+1, 2) AS int)+1

IF @c > @c_temp SET @c = @c_temp

SELECT @cv0 = @cv0 + CAST(@c AS binary(2)), @j = @j + 1

END

SELECT @cv1 = @cv0, @i = @i + 1

END

RETURN @c

END

Enthusiast

Re: Levenstein Distance Algorithm

Thanks guys...there is in builtfunction for editdistance function in version 14 of teradata...thanks once again...