Regular Expression On Teradata 14.0

Database
Enthusiast

Regular Expression On Teradata 14.0

Hi All,

I've been working for more than 8 years in Oracle 10g, 11g & worked significant queries on Regular expressions in various scenario using SQL. It is real handy if you know how to use it & can reduce lots of pain with single SQL. And, the performance will be better compared to the total effort to achieve the same functionalities by using multiple SQL queries or PL/SQL Procedures.

Last couple of years, I'm working on Teradata. And, on some occassion - I was expecting features like these, where I can easily manipulate data with regular expression. I'm pretty excited when I heard that Teradata also introduced Regular Expression from Version 14.0.

As a result, I tried all those features that I think can be handy & useful for various scenarios & followings are the successful queries that I get. There are two occasion, where Teradata partially able to manipulate those strings. I've checked the latest Teradata Manual. However, unable to find those solution. So, I'm expecting other forum members can contribute here in order to make this thread useful for every one of us. And, I'll post here as soon as I get some answers on these partial conversions.

For better understanding, I've provided the actual column value & after transformation value of that column in the output. That will help us to grasp it easily - I guess. :)

Case 1,

select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*)$','\1 \2\3') AS COL_VAL;

COLA COL_VAL
---------------- ----------------------------------------
SatyakiDe Satyaki De

Case 2,

select regexp_replace('919047242526','^([[:digit:]]{2})([[:digit:]]{10})','+\1 \2') COL_VAL;

COLA COL_VAL
------------ ---------------
919047242526 +91 9047242526

Case 3,

select regexp_replace('+++C','^([[:punct:]]{2})([[:punct:]]{1})(.*)$','\1\3') COL_VAL;

COLA COL_VAL
---- -----
+++C ++C

Case 4,

select initcap(regexp_replace(regexp_substr('satyaki.de@mail.com','[^@]+'),'(.*)(\.)(.*)','\1 \3')) COL_VAL;

COLA COL_VAL
-------------------------------- --------------------------------------------------
satyaki.de@mail.com Satyaki De

Case 5,

select regexp_replace('100011001','([[:digit:]]{3})([[:digit:]]{2})([[:digit:]]{4})','XXX-XX-\3') as COL_VAL;

COLA COL_VAL
---------------- --------------------
100011001 XXX-XX-1001

Case 6,

select regexp_replace('123456789','([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{3})','\3.\2.\1') as COL_VAL;

COLA COL_VAL
--------- ---------------
123456789 789.456.123

In all the cases, you can pass any column name as argument & hence the entire column value can parse accordingly.

Hope, My solution will help others & at the same time - I'm expecting others to post their relevant regular expression scripts here. So, everyone can get most of their queries related to Regular Expression at a single place. :)

Really appreciate your time to read this thread.

Regards.

Satyaki De.

17 REPLIES
Teradata Employee

Re: Regular Expression On Teradata 14.0

Very nice post Satyaki. You can also evolve this post into an article and publish it on Develop Exchange.

Good sharing.

Enthusiast

Re: Regular Expression On Teradata 14.0

RE was introduced earlier since v13.xx(not from v14). if you have a good grasp RE, it's easy to apply it to TD re-funtions or other rdmbs re-functions as long as re-functions introduced.

Enthusiast

Re: Regular Expression On Teradata 14.0

Nice post Satyaki, As Adeel advised, you should publish it on Teradata Blogs :)

Khurram
Enthusiast

Re: Regular Expression On Teradata 14.0

Hi Jinli,

I don't know from where do you get the information that Regular Expression introduced in 13.0. And, please specify exact version say 13.0.0.1 not 13.x.x. Prior to that TD has UDF functions to serve the same.

Please find the following post in this forum only ->

http://forums.teradata.com/forum/extensibility/regexp-like

Anyway, If you have any link that suggests - Please provide that here. So, that we can update that information in this thread as well.

Regards.

Satyaki De.

Enthusiast

Re: Regular Expression On Teradata 14.0

Hi Admin,

I've noticed that in the provided queries, some places there are some spaces introduced. Not sure - how that happens. As a result - we have to modify these queries by removing those spaces.

For example, 

Case 2,

select regexp_replace('919047242526','^([[:digit:]]{2})([[:digit:]]{10})','+\1 \2') COL_VAL;

Removed the space from 2nd digit notation, which has spaces after ':'.

Similarly, Case 3, Case 5 & Case 6 needs to modify.

Case 3,

select regexp_replace('+++C','^([[:punct:]]{2})([[:punct:]]{1})(.*)$','\1\3') COL_VAL;

Case 5,

select regexp_replace('100011001','([[:digit:]]{3})([[:digit:]]{2})([[:digit:]]{4})','XXX-XX-\3') as COL_VAL;

Case 6,

select regexp_replace('123456789','([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{3})','\3.\2.\1') as COL_VAL;

So, any extra space may not produce desired result. And, needs to pay attention into these small details. Unfotunately, I cannot edit here. Hence, I've to paste all these correct entries here again. Sorry for that.

And, I've tested all these queries in the following two versions -

select * from dbcinfo;

InfoKey InfoData
1 VERSION 14.10.00.02
2 RELEASE 14.10.00.02
3 LANGUAGE SUPPORT MODE Standard

InfoKey InfoData
1 VERSION 14.10.01.05
2 RELEASE 14.10.01.04
3 LANGUAGE SUPPORT MODE Standard

Hope, this will give you much more clarity. :)

One more thing, I would like to clarify here - my intention is to describe more features about these regexp_(like/substr/instr/replace) functions.

Regarding version 13,

Let us check whether they have these regexp functions or not -

SELECT * FROM dbc.dbcinfo;

InfoKey InfoData
1 VERSION 13.00.00.15
2 RELEASE 13.00.00.15
3 LANGUAGE SUPPORT MODE Standard

SELECT * FROM dbc.dbcinfo;

InfoKey InfoData 1 VERSION 13.10.07.12 2 RELEASE 13.10.07.12 3 LANGUAGE SUPPORT MODE Standard 

select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*) $','\1 \2\3') AS COL_VAL;

select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*) $','\1 \2\3') AS COL_VAL;

select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*) $','\1 \2\3') AS COL_VAL;
$
*** Failure 3706 Syntax error: expected something between '(' and the string 'S' keyword.
Statement# 1, Info =35
*** Total elapsed time was 1 second.

Hope this will help. :)

Regards.

Satyaki De

Enthusiast

Re: Regular Expression On Teradata 14.0

Thanks Adeel & Khurram for the suggestion.

Yes. I'll prepare that & share over there, too.

Regards.

Satyaki De

Enthusiast

Re: Regular Expression On Teradata 14.0

Hi All,

 

It is observed that Case 1 provided here may not work in later TD Releases properly. Hence, you can rewrite that as follows -

SELECT regexp_replace('SatyakiDe','([[:lower:]]{1,})([[:upper:]]{1,})','\1 \2') AS COL_VAL;

COLA COOKED_COLA
---------------- ----------------------------------------
SatyakiDe Satyaki De

Also, one more useful query you can find are as shown below -

 

Case 7,

SELECT regexp_replace('satyaki9de0loves3to8work2on2sql0and2bi6tools1','[^0-9]+','',1,0,'i') AS DER_VAL;

COLA DER_VAL
1 satyaki1de0loves3to8work2on2sql0and2bi4tools1 1038220241

As you can see, all the characters have filtered out from the string & only numbers are kept here. These sorts of queries are very useful in lots of different business scenarios as well.

 

Regards.

 

Satyaki De.

Enthusiast

Re: Regular Expression On Teradata 14.0

Due to some technical issue, I'm not able to post my next topic in a single post. :( So, I'm splitting it into couple of posts & hopefully able to demonstrate some additional information with you guys.

 

Now, Lets see some other functionality.

 

REGEXP_SIMILAR has similar functionalities like REGEXP_LIKE in Oracle.

 

Let's see couple of such cases -

 

Lets prepare the table with some dummy data -

SELECT * FROM dbc.dbcinfo;

InfoKey InfoData
1 VERSION 14.10.01.05
2 RELEASE 14.10.01.04
3 LANGUAGE SUPPORT MODE Standard

CREATE MULTISET VOLATILE TABLE TEST_T1
(
COL1 VARCHAR(10)
)
ON COMMIT PRESERVE ROWS;

INSERT INTO TEST_T1 VALUES('456')
;INSERT INTO TEST_T1 VALUES('123x')
;INSERT INTO TEST_T1 VALUES('x123')
;INSERT INTO TEST_T1 VALUES('y')
;INSERT INTO TEST_T1 VALUES('+789')
;INSERT INTO TEST_T1 VALUES('-789')
;INSERT INTO TEST_T1 VALUES('159-')
;INSERT INTO TEST_T1 VALUES('-1-');

Lets check the data now -

SELECT *
FROM TEST_T1;

COL1
1 123x
2 456
3 x123
4 +789
5 -789
6 y
7 159-
8 -1-

Continue .....

Enthusiast

Re: Regular Expression On Teradata 14.0

Let's look into the various scenarios now -

Case 1 (Returns Mixed Numbers, Signed Numbers & Non Numbers),

SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'^[0-9]+$','c')=0;

COL1
1 123x
2 x123
3 +789
4 -789
5 y
6 159-
7 -1-

Case 2 (Returns Only Unsigned Positive Numbers),

SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'^[0-9]+$','c')=1;

COL1
1 456

Case 3 (Returns All Numbers including Positive, Negative & unsigned),

SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'^[+-]?[0-9]+[+-]?$','c')=1;

COL1
1 456
2 +789
3 -789
4 159-
5 -1-

Case 4 (Returns Only Non Numbers i.e. Characters),

SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'[^0-9]+','c')=1;

COL1
1 y

Hope this will give you some additional idea. :)

 

My objective is to provide basic information to my friends. So, that they can write better SQL in TD while migrating from other popular databases or new developer in TD can get a flavor of this powerful feature & exploit them in all the positive aspect & apply them properly. :D

 

Really appreciate your time to read this post.

 

Regards.

 

Satyaki De.