split address column into multiple columns

Tools & Utilities

split address column into multiple columns

Hi, I'm new to Teradata. I have below sample address column needs to split into multiple columns.The problem is this data is not in standard format.

Example:

Addres_column
street_num:1234,street_nm:Main St,city:Atlanta,State:Georgia

street_nm:bay st,city:San Francisco

City:Knoxville,Street_num:3456,State:TN

State:TX,Street_nm:vince blvd

I need to split this column into Street_num, street_nm, city and state columns. Can someone please help me on this? We are using TD 15

5 REPLIES 5
Teradata Employee

Re: split address column into multiple columns

One possibility is to use the tags in a regular expression for each element

SELECT REGEXP_SUBSTR_GPL(address_column,'street_num:(.+?)(,|$)',1,1,'i',1) as street_num,
REGEXP_SUBSTR_GPL(address_column,'street_name:(.+?)(,|$)',1,1,'i',1) as street_name, ...

The (undocumented) "GPL" suffixed version of the function accepts an additional argument that allows you to specify you only want to return the value for a particular "capturing group" rather than the full match string. Or to use REGEXP_SUBSTR, you'd need a somewhat less efficient pattern that uses look-ahead and look-behind:

 

'(?<=street_num:)(.+?)((?=,)|$)'

Ambassador

Re: split address column into multiple columns

NVP is a function made for extracting values from a string of name-value-pairs:

NVP(Lower(address_column), 'street_num', ',', ':') -- LOWER because NVP is case-sensitive

 

 

Re: split address column into multiple columns

Thanks a lot Fred. Worked perfectly. I just added STRTOK function to pickup only left part of the delimiter.

SELECT trim(STRTOK(REGEXP_SUBSTR_GPL(address_column,'street_num:(.+?)(,|$)',1,1,'i',1),',',1)) as street_num

 

Re: split address column into multiple columns

Thanks dnoeth. Worked like charm.

Teradata Employee

Re: split address column into multiple columns

I don't think the STRTOK or TRIM do anything useful in your example. With this pattern the REGEXP_SUBSTR_GPL result already excludes the comma, and if you wanted to just remove an optional trailing comma, TRIM(TRAILING ',' FROM ...) would be a better choice. There are also no trailing spaces in the test data.

You might want to CAST the result to a shorter VARCHAR, depending on what you do with the result.

NVP is a good option if you have an acceptable way to deal with the case-specific issue - e.g. all lowercase or all uppercase for the result is fine, or maybe you know there are only a few variants of the keyword name (e.g. state and State but never sTATE) and so can look for those individually and COALESCE.