Pattern match and Extracting the words from column

Database
Enthusiast

Pattern match and Extracting the words from column

Hi,

I have a requirement to match the pattern and extract the word as explained below.

Ex: Column value  = "A_B_C.D_E_F NOT NULL HAVING COUNT EQ 0 SAMPLE A_B_C.H_I_J"

I need to match the pattern A_B_C and take the word after dot(.) i.e D_E_F and H_I_J

Expected output: D_E_F,H_I_J

Note :  The number of occurence of this pattern is also not constant.

Could somone suggest me a good way to achieve this.

Thanks,

Sri

6 REPLIES
Enthusiast

Re: Pattern match and Extracting the words from column

How are you providing this pattern as input? Is it given at the run time?

Enthusiast

Re: Pattern match and Extracting the words from column

Hi ,

Pattern is constant and we can hard code the pattern in query directly..

Im expecting something from regex but I couldn't  find the proper documents for the same.

Senior Apprentice

Re: Pattern match and Extracting the words from column

Hi Sri,

this is a RegEx to find the word after 'A_B_C.', the 1st group searches for the pattern without adding it to the result and the 2nd group extracts the following word:

REGEXP_SUBSTR(x, '(?<=A_B_C.)(.+?\b)')

But this only works for the first occurence. If the maximum number is known you could use

REGEXP_SUBSTR(x, '(?<=A_B_C.)(.+?\b)',1,2)
REGEXP_SUBSTR(x, '(?<=A_B_C.)(.+?\b)',1,3)

to get the 2nd, 3rd and so on.

If the number of occurences is high you might use REGEXP_SPLIT_TO_TABLE to get all as rows and then XMLAGG to concat them back.

Enthusiast

Re: Pattern match and Extracting the words from column

Hi Dieter, 

Thanks for your solution. but the number of occurence is not constant in my case so as per your suggestion,  I have to use REGEXP_SPLIT_TO_TABLE and XMLAGG.

But could you please let me know is there any books or documents to learn these functions? 

Senior Apprentice

Re: Pattern match and Extracting the words from column

Hi Sri,

there's no Teradata-specific documentation on regex, but there are many books and online resources available, simply google for "regular epression". There are different dialects for regex, but Teradata usually follows the Perl variant (Perl Compatible Regular Expressions = PCRE).

XMLAGG is documented in the XML manual, hopefully it's installed on your system.

What's your Teradata release?

WITH cte (inkey,x) AS 
(SEL 1, 'A_B_C.D_E_F NOT NULL HAVING COUNT EQ 0 SAMPLE A_B_C.H_I_J' AS x)
SELECT
outkey
,OREPLACE(XMLAGG(REGEXP_SUBSTR(token, '(.+?\b)')
ORDER BY tokennum) (VARCHAR(10000))
,' ',',')
FROM
TABLE(REGEXP_SPLIT_TO_TABLE(cte.inkey, cte.x, '(A_B_C.)', 'c')
RETURNS(outkey INTEGER, tokennum INTEGER, token VARCHAR(100) CHARACTER SET UNICODE)) AS dt
GROUP BY 1

There's currently an open Tech Alert (TA3744) on failing queries using REGEXP_SPLIT_TO_TABLE, so this might fail on your real data with "Failure 9134 Unexpected failure while attempting to convert UTF8 to UTF16".

Enthusiast

Re: Pattern match and Extracting the words from column

Hi Dieter,

That sounds great. Let me check PCRE syntax to understand better.

We have TD14

Thanks a lot for all your help !!