Hi, I'm new to Teradata. I have below sample address column needs to split into multiple columns.The problem is this data is not in standard format.
street_nm:bay st,city:San Francisco
I need to split this column into Street_num, street_nm, city and state columns. Can someone please help me on this? We are using TD 15
One possibility is to use the tags in a regular expression for each element
SELECT REGEXP_SUBSTR_GPL(address_column,'street_num:(.+?)(,|$)',1,1,'i',1) as street_num,
REGEXP_SUBSTR_GPL(address_column,'street_name:(.+?)(,|$)',1,1,'i',1) as street_name, ...
The (undocumented) "GPL" suffixed version of the function accepts an additional argument that allows you to specify you only want to return the value for a particular "capturing group" rather than the full match string. Or to use REGEXP_SUBSTR, you'd need a somewhat less efficient pattern that uses look-ahead and look-behind:
NVP is a function made for extracting values from a string of name-value-pairs:
NVP(Lower(address_column), 'street_num', ',', ':') -- LOWER because NVP is case-sensitive
Thanks a lot Fred. Worked perfectly. I just added STRTOK function to pickup only left part of the delimiter.
SELECT trim(STRTOK(REGEXP_SUBSTR_GPL(address_column,'street_num:(.+?)(,|$)',1,1,'i',1),',',1)) as street_num
I don't think the STRTOK or TRIM do anything useful in your example. With this pattern the REGEXP_SUBSTR_GPL result already excludes the comma, and if you wanted to just remove an optional trailing comma, TRIM(TRAILING ',' FROM ...) would be a better choice. There are also no trailing spaces in the test data.
You might want to CAST the result to a shorter VARCHAR, depending on what you do with the result.
NVP is a good option if you have an acceptable way to deal with the case-specific issue - e.g. all lowercase or all uppercase for the result is fine, or maybe you know there are only a few variants of the keyword name (e.g. state and State but never sTATE) and so can look for those individually and COALESCE.