Regular expression with irregular result

Database

Regular expression with irregular result

I'm getting an unexpected result from regexp_substr().  I have an asterisk-delimited string that needs to be parsed (this is X12 EDI data).  regexp_substr works terrific, unless there are successive delimiters.  For example,

select regexp_substr('PR*2*100.8**45*69.12','(?<=\*).*?(?=\*|$)',1,2)

returns 100.8 as expected (second occurrence of leading and trailing asterisks).

However,

select regexp_substr('PR*2*100.8**45*69.12','(?<=\*).*?(?=\*|$)',1,4)

does not return the expected 45.  Instead I get NULL, apparently because TD did not count the consecutive delimiters as an occurrence.

Is this a bug or a feature?  What am I missing?

InfoKey InfoData

1 VERSION 14.10.04.03

2 RELEASE 14.10.04.03

3 LANGUAGE SUPPORT MODE Standard

Thanks

7 REPLIES
Junior Contributor

Re: Regular expression with irregular result

IMHO this result is correct (if you remove a space '(?<=\*).*?(?=\*|$)').

But why thinking about a RegEx if you get the same requested result in an easier way?

If consecutive delimiters should be counted as one you better switch to STRTOK, additionally this should require less resources: 

select strtok('PR*2*100.8**45*69.12','*',4)

Re: Regular expression with irregular result

Thank, Dieter!  I was ignorant of strtok(), but it still doesn't provide the answer I expect.  '45' is the 5th token in the string, not the fourth.  I need a function that considers a zero-length string between two delimiters to be a valid result.

For example, 

select strtok('SV1*HC:99213:::::OFFICE VISIT ESTAB*137*UN*1***1:2:3:4','*',6)

returns '1:2:3:4', when it needs to return NULL, and

select strtok('SV1*HC:99213:::::OFFICE VISIT ESTAB*137*UN*1***1:2:3:4','*',8)

returns NULL, when it needs to return '1:2:3:4'

Re: Regular expression with irregular result

I've since discovered (on other programming forums) that strtok() simply treats consecutive delimiters as one by design.  I think I'll have to go back to my original workaround.

Junior Contributor

Re: Regular expression with irregular result

I got you wrong, of course STRTOK doesn't work in your case.

When I tried your expression outside of TD it returned the expected data, but Oracle always returned NULL, as it doesn't allow lookahead/lookback.

TD REGEXP_SUBSTR seems to be stuck when there are multiple delimiter, there's no more match at all, all return an empty string. You might open an incident on that, but I'm not sure if it's a bug or just based on the regex dialect TD implements...

As a workaround you can use the following, don't do a lookahead on the 2nd '*', add it to the result instead and then trim it:

rtrim(regexp_substr('PR*2*100.8**45*69.12','(?<=\*).*?(\*|$)',1,4), '*')

Re: Regular expression with irregular result

Hi

I needed something similar and take your conversation as beginning. What I suggest is that in your evaluation concatenate a '*'|| added at the beginning of string to evaluate.

SELECT '*'||'SV1*2HC:99213:::::OFFICE VISIT ESTAB*3137*4UN*51***81:2:3:4' AS VALOR
/* BAD
,REGEXP_SUBSTR(VALOR,'[^*]+',1,1) val1
,REGEXP_SUBSTR(VALOR,'[^*]+',1,2) val2
,REGEXP_SUBSTR(VALOR,'[^*]+',1,3) val3
,REGEXP_SUBSTR(VALOR,'[^*]+',1,4) val4
,REGEXP_SUBSTR(VALOR,'[^*]+',1,5) val5
,REGEXP_SUBSTR(VALOR,'[^*]+',1,6) val6
,REGEXP_SUBSTR(VALOR,'[^*]+',1,7) val7
,REGEXP_SUBSTR(VALOR,'[^*]+',1,8) val8
--END BAD CODE */

/* GOOD*/
,
RTRIM(REGEXP_SUBSTR(VALOR,'(?<=\*).*?(\*|$)',1,1), '*') valor1
,
RTRIM(REGEXP_SUBSTR(VALOR,'(?<=\*).*?(\*|$)',1,2), '*') valor2
,
RTRIM(REGEXP_SUBSTR(VALOR,'(?<=\*).*?(\*|$)',1,3), '*') valor3
,
RTRIM(REGEXP_SUBSTR(VALOR,'(?<=\*).*?(\*|$)',1,4), '*') valor4
,
RTRIM(REGEXP_SUBSTR(VALOR,'(?<=\*).*?(\*|$)',1,5), '*') valor5
,
RTRIM(REGEXP_SUBSTR(VALOR,'(?<=\*).*?(\*|$)',1,6), '*') valor6
,
RTRIM(REGEXP_SUBSTR(VALOR,'(?<=\*).*?(\*|$)',1,7), '*') valor7
,
RTRIM(REGEXP_SUBSTR(VALOR,'(?<=\*).*?(\*|$)',1,8), '*') valor8
/* GOOD*/
,
CHAR_LENGTH(VALOR) - CHAR_LENGTH(OTRANSLATE(VALOR, '*','')) counts





Regards,
acifuentes

Junior Contributor

Re: Regular expression with irregular result

Hi acifuentes,

yep, that's helpful.

Btw, if you actually need to extract all values as individual columns you might also use CSVLD instead, although the syntax is a bit complicated (it's a table function):

WITH cte AS (
SELECT 'SV1*2HC:99213:::::OFFICE VISIT ESTAB*3137*4UN*51***81:2:3:4' AS VALOR)
SELECT *
FROM TABLE (CSVLD(cte.valor, '*', '')
RETURNS (p1 VARCHAR(100) CHARACTER SET UNICODE,
p2 VARCHAR(100) CHARACTER SET UNICODE,
p3 VARCHAR(100) CHARACTER SET UNICODE,
p4 VARCHAR(100) CHARACTER SET UNICODE,
p5 VARCHAR(100) CHARACTER SET UNICODE,
p6 VARCHAR(100) CHARACTER SET UNICODE,
p7 VARCHAR(100) CHARACTER SET UNICODE,
p8 VARCHAR(100) CHARACTER SET UNICODE)) AS T1

Re: Regular expression with irregular result

Excellent

Thanks Dieter!


Regards,

acifuentes