Performance of Regular Expression Function REGEXP_INSTR

Database
Enthusiast

Performance of Regular Expression Function REGEXP_INSTR

I've only recently learned of the regular expression functions within Teradata. I have some use cases where I have to pull transaction data based on patterns and obviously regex is great for that. I am curious about the performance of REGEXP_INSTR compared against using something like the following:

 

SELECT

  col1, col2, ..., coln

FROM

  table1

WHERE

  col1 LIKE ANY ('Pattern1 __/__%', '%Pattern1 __/__%', ... '%Patternx%')

 

Clearly, using regex will likely be less code and allow more flexibility than the wildcards in Teradata. Is there significant performance lift in using something like this instead:

 

WHERE

  REGEXP_INSTR(col1,'Pattern_to_capture_all_cases',1,1,0,'i') >= 1


Accepted Solutions
Teradata Employee

Re: Performance of Regular Expression Function REGEXP_INSTR

The REGEXP_ functions can be CPU intensive, but there is a pretty standard way of determining this for yourself.

 

Create test queries and run them against a reasonable amount of test data using Query Banding to easiliy identify which is which:

SET QUERY_BAND = 'TestType=REGvsLIKE;Query=LIKE;' UPDATE FOR SESSION;
/* <query 1 - LIKE> */
SET QUERY_BAND = 'TestType=REGvsLIKE;Query=REG;' UPDATE FOR SESSION;
/* <query 2 - REG> */
SET QUERY_BAND = NONE FOR SESSION;

 

Once the DBQL tables have been updated (may be up to 10 minutes depending on the request volume, but this can be forced if you have the relevant access) run the following or (similar):

SELECT  QueryBand
        , StartTime
        , (FirstRespTime - StartTime) SECOND(4) AS ElapsedSeconds
        , TotalIOCount
        , AmpCPUTime
        , SpoolUsage (BIGINT)
        , NumResultRows (BIGINT)
        , EstProcTime
FROM    DBC.QryLogV
WHERE   UserName = USER
AND     QueryBand LIKE '%TestType=REGvsLIKE;%'
AND     StatementType = 'Select'
ORDER BY Starttime;

 

You can then compare for yourself using something that is quantitative and make a decision about its effectiveness/usability beating in mind any other factors such as resource capacity.

1 ACCEPTED SOLUTION
2 REPLIES
Teradata Employee

Re: Performance of Regular Expression Function REGEXP_INSTR

The REGEXP_ functions can be CPU intensive, but there is a pretty standard way of determining this for yourself.

 

Create test queries and run them against a reasonable amount of test data using Query Banding to easiliy identify which is which:

SET QUERY_BAND = 'TestType=REGvsLIKE;Query=LIKE;' UPDATE FOR SESSION;
/* <query 1 - LIKE> */
SET QUERY_BAND = 'TestType=REGvsLIKE;Query=REG;' UPDATE FOR SESSION;
/* <query 2 - REG> */
SET QUERY_BAND = NONE FOR SESSION;

 

Once the DBQL tables have been updated (may be up to 10 minutes depending on the request volume, but this can be forced if you have the relevant access) run the following or (similar):

SELECT  QueryBand
        , StartTime
        , (FirstRespTime - StartTime) SECOND(4) AS ElapsedSeconds
        , TotalIOCount
        , AmpCPUTime
        , SpoolUsage (BIGINT)
        , NumResultRows (BIGINT)
        , EstProcTime
FROM    DBC.QryLogV
WHERE   UserName = USER
AND     QueryBand LIKE '%TestType=REGvsLIKE;%'
AND     StatementType = 'Select'
ORDER BY Starttime;

 

You can then compare for yourself using something that is quantitative and make a decision about its effectiveness/usability beating in mind any other factors such as resource capacity.

Enthusiast

Re: Performance of Regular Expression Function REGEXP_INSTR

Thanks for the detailed response! I'm excited to learn about Query Banding and give it a try. If I run into any issues, I'll reply.