Teradata REGEX, UDF Query

Database

Teradata REGEX, UDF Query

Hello

Need help on REGEX_REPLACE and UDF.

Requirement: Need to find for various credit card, SSN and driver licence patterns in notes column and mask it using 'XXX' string. Need to process 300 million records for historical load and few millions everyday.

Thought of using sample code as below:

SELECT REGEXP_REPLACE('4567 8567 8987 3789',

'^[0-9]{4}[-, ]?[0-9]{4}[-, ]?[0-9]{4}[-, ]?[0-9]{4}[ ]{0,20}$',

  'XXX-XXX-XXXX'

) "REGEXP_REPLACE";

Question :

1) Since I have to match around 25+ patterns, can I use CASE or any other similar function which will allow me to match all 25 patterns and depending on the match, allow me the mask data. I may find multiple patterns in same column ( i.e maybe ssn and credit card number in same column etc)

2) Should I/Can I use UDF for this requirement? Performance will be impacted?

3) If any example for this kind UDF..that would really help.

Thank you ..

1 REPLY
Senior Apprentice

Re: Teradata REGEX, UDF Query

The easiest way is using nested REGEX_PREPLACE, one for each pattern.

Of course CPU usage will be high.

I don't know if lots of regexes in a C-UDF might be more efficient, maybe try the SQL way first :)