REGEXP_REPLACE in Teradata SQLA

Database
Highlighted
Enthusiast

REGEXP_REPLACE in Teradata SQLA

Hi,

am trying to remove the date part at the end of comment data in text in a column.The date at the end is like '21FEB','2004/21FEB' and '21FEB18'.

I used RegExp_Replace(X, ' [0-9]{1,2}[A-Z]{3}$','',1,1,'i') to remove '21FEB' at the end of a comment like "ADDED EMAIL ADDRESS FROM CTCE DATA 21FEB".

I used RegExp_Replace(X, ' [0-9]{1,4}/[0-9]{1,2}[A-Z]{3}$','',1,1,'i') to remove '2004/21FEB' at the end of a comment like "REX AUTO REFUND 2004/21FEB".

I used REGEXP_REPLACE(X, ' [0-9]{1,2}[A-Z]{3}[0-9]{1,2}$','',1,1,'i') to remove '21FEB18' at the end of a comment like 'CANCELED DUPLICATED SEGMENTS P 21FEB18'.

Is there a way to combine all these 3 different variants of the REGEXP_REPLACE function in a single REGEXP_REPLACE expression so that it applies to the different comment formats in the data?

Thanks!


Accepted Solutions
Enthusiast

Re: REGEXP_REPLACE in Teradata SQLA

I added the JAN|FEB logic insted of [A-Z]{3} just to be on the safer side.Both of them elimated the date part in the comment :)

Tags (1)
1 ACCEPTED SOLUTION
6 REPLIES
Junior Contributor

Re: REGEXP_REPLACE in Teradata SQLA

You can use the pipe character '|' for multiple alternatives:

RegExp_Replace(X, ' ([0-9]{1,2}[A-Z]{3}|[0-9]{1,4}/[0-9]{1,2}[A-Z]{3}|[0-9]{1,2}[A-Z]{3}[0-9]{1,2})$','',1,1,'i')
Enthusiast

Re: REGEXP_REPLACE in Teradata SQLA

Thanks Sir! It worked!

Enthusiast

Re: REGEXP_REPLACE in Teradata SQLA

I added the JAN|FEB logic insted of [A-Z]{3} just to be on the safer side.Both of them elimated the date part in the comment :)

Tags (1)
Enthusiast

Re: REGEXP_REPLACE in Teradata SQLA

I already have  a REGEXP_REPLACE like this below for almost all possibe text I could find in my column:

REGEXP_REPLACE(TRIM(REGEXP_REPLACE(TRIM(REGEXP_REPLACE(TRIM(REGEXP_REPLACE((TRIM(REGEXP_REPLACE(SUBSTR(PNR_REMRK_TXT,9,999), ' *([0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)|[0-9]{1,4}/[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)|[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[0-9]{1,2}|[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC).|[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[0-9]{1,4}|[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) [0-9]{1,4})$','',1,1,'i'))), 'SEE \*......', 'SEE *ZZZZZZ')), 'SEE\*......', 'SEE*ZZZZZZ')), 'FROM \*......', 'FROM *ZZZZZZ')), 'DUPLICATED WITH ' '......', 'DUPLICATED WITH ' 'ZZZZZZ')

The last few values remaning in my data are like 'REVIEWED BY DATA ANALYST XXXXXXXXXXXX' and 'FF XX XXXXXXXXX'. I am trying to write a REGEXP_REPLACE for this to return just 'REVIEWED BY DATA ANALYST' and 'FF'. Just the 1st four words in the first example and the first word in the second example and ignore the rest.

I tried the below code:

REGEXP_SUBSTR(REGEXP_REPLACE(TRIM(REGEXP_REPLACE(TRIM(REGEXP_REPLACE(TRIM(REGEXP_REPLACE((TRIM(REGEXP_REPLACE(SUBSTR(PNR_REMRK_TXT,9,999), ' *([0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)|[0-9]{1,4}/[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)|[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[0-9]{1,2}|[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC).|[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[0-9]{1,4}|[0-9]{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) [0-9]{1,4})$','',1,1,'i'))), 'SEE \*......', 'SEE *ZZZZZZ')), 'SEE\*......', 'SEE*ZZZZZZ')), 'FROM \*......', 'FROM *ZZZZZZ')), 'DUPLICATED WITH ' '......', 'DUPLICATED WITH ' 'ZZZZZZ'), '(REVIEWED BY DATA ANALYST|FF)', 1, 1, 'i')

I just replaced the 'X' in the REGEXP_SUBSTR code i.e. RegExp_Substr(X, '(REVIEWED BY DATA ANALYST|FF)', 1, 1, 'i') with the REGEXP_REPLACE code I had mentioned earlier. But doing this the code works only for data 'REVIEWED BY DATA ANALYST' and 'FF'  values and returns the other values as null.

 

Is there any other workaround on this?

 

Teradata Employee

Re: REGEXP_REPLACE in Teradata SQLA

You could continue the pattern of getting rid of the part you don't want. For example:

 

Match as two substrings and use the first substring as the replacement for the entire match:

 RegExp_Replace(X, '(REVIEWED BY DATA ANALYST|FF)(.*)','\1', 1, 1, 'i')

Use "look-behind" to check for a preceding pattern without considering that part of the matched string: 

RegExp_Replace(X, '(?<=REVIEWED BY DATA ANALYST).*|(?<=FF).*', '',1, 1, 'i')

Enthusiast

Re: REGEXP_REPLACE in Teradata SQLA

Thanks a lot Fred! The first code worked exactly for what I was looking for!