Question about "regexp_replace"

Database
Enthusiast

Question about "regexp_replace"

Hello to everyone,



When I execute this sentence:

  • select REGEXP_REPLACE('XXX A1 A2 XXX', '.*( )(A[0-9]).*', '\2', 1, 1, 'i');

I get the second "(A[0-9])" expression, that is, "A2", instead of the first one ("A1").

I would like to know why this is like that and how to get "A1".

Thanks in advance and best regards

Juan
8 REPLIES
Enthusiast

Re: Question about "regexp_replace"

REGEXP_REPLACE('XXX A1 A2 XXX', '(.* )(A[0-9])( )(A[0-9]).*', '\2', 1, 1, 'I');

Enthusiast

Re: Question about "regexp_replace"

Thanks yuvaevergreen, but the thing is that I don't know if I'll have two "(A[0-9])" expressions... I could have 3, 4... or just 1.

I wonder why I have to say explicitly that there has to be another "(A[0-9])" expression to make the sentence choose the first one. It's like the regular expression motor follows a right to left order, isn't it?

  • select REGEXP_REPLACE('XXX A1 XXX', '(.* )(A[0-9])(.*)', '\2', 1, 1, 'i'); --> Returns "A1"
  • select REGEXP_REPLACE('XXX A1 A2 A3 XXX', '(.* )(A[0-9])(.*)', '\2', 1, 1, 'i'); --> Returns "A3"
  • select REGEXP_REPLACE('XXX A1 A2 A3 XXX A4 A5', '(.* )(A[0-9])(.*)', '\2', 1, 1, 'i'); --> Returns "A5"
  • select REGEXP_REPLACE('XX A0 XXX A1 A2 A3 XXX A4 A5', '(.* )(A[0-9])(.*)', '\2', 1, 1, 'i'); --> Returns "A5"

Even something like:

  • select REGEXP_REPLACE('XXX A1 A2 A3 XXX', '(.* )(A[0-9])(.*)( A[0-9])*(.*)', '\2', 1, 1, 'i'); --> Returns "A3" (why not "A1"?)

So isn't there a way to say that I want the first expression (keeping in mind that there could be just 1) that gets the pattern regardless of what I have before and after?

Thanks and regards

Juan
Junior Contributor

Re: Question about "regexp_replace"

Your regex is greedy, you need to change it to  '(.*? )(A[0-9])(.*)'

Btw, if you really need to return the first An only why don't you switch to REGEXP_SUBSTR?

SELECT REGEXP_SUBSTR('XX A0 XXX A1 A2 A3 XXX A4 A5', '(A[0-9])', 1, 1, 'i');

Enthusiast

Re: Question about "regexp_replace"

IF you are sure of the first qualifier, then the regex mentioned by Deiter will work perfect. But, better option would be to extract the first occurence and then work upon it.

Enthusiast

Re: Question about "regexp_replace"

Deiter, you are a genius! I had been looking for it for a long while... and I just needed a "?". Thanks!

I know that I could use REGEXP_SUBSTR, but sometimes, I need to get a subexpression in the expression and, at the same time, to delimite the context of the stuff I want to extract... and in Teradata is pending the implementation (as far as I know) of the call with the subexpression parameter:
  • REGEXP_SUBSTR(salida(4), '( )(KM )(([[:digit:]]+)((.)([[:digit:]]+))?)( )', 1, 1, 'i', 3);
But now, I can use the REGEXP_REPLACE to obtain the same (even in my original code in Oracle, adding ^ and $... for Teradata that's not neccesary):
  • REGEXP_REPLACE(salida(4), '^.*?( )(KM )(([[:digit:]]+)((.)([[:digit:]]+))?)( ).*$', '\3', 1, 1, 'i' );
I guess that respecting to performance REGEXP_SUBSTR and REGEXP_REPLACE must be more or less the same, right?

Thanks again and regards

Juan
Junior Contributor

Re: Question about "regexp_replace"

I wouldn't expect much difference, most of the CPU is probably used by the search.

There are some differences between Oracle & Teradata regex, Teradata supports both lookahead (?=...) and lookbehind (?<=...) to add a non-matching group.

Enthusiast

Re: Question about "regexp_replace"

Ok, thanks!

Juan

Re: Question about "regexp_replace"

Dieter,

Could you please provide the link for documentation of regular expression usage in Teradata?

Thanks,

Ajit