Unicode block matching with REGEXP

Database
Highlighted
Fan

Unicode block matching with REGEXP

Hey,

 

Is there anything similar to unicode block matching like python in teradata?

 

Example:

 

In python I can do:

re.findall('[\u2600-\u26FF]',"★ WOW ★")

 

This will give me the star chracters from the string.

 

I tried the same in Teradata with escaping the frontslashes

 

SELECT REGEXP_INSTR('★ WOW ★','[\\u2600-\\u26FF]') 

 

but it doesn't work as expected.

Tags (2)

Accepted Solutions
Junior Contributor

Re: Unicode block matching with REGEXP

The syntax is a bit different as Teradata uses PCRE regular expressions:

SELECT RegExp_Instr('★ WOW ★','[\x{2600}-\x{26FF}]')

But as you want to return a star you should use

SELECT RegExp_Substr('★ WOW ★','[\x{2600}-\x{26FF}]')

This will return only the 1st occurance, to get all stars you can remove all non-star characters:

SELECT RegExp_Replace('★ WOW ★','[^\x{2600}-\x{26FF}]')

 

1 ACCEPTED SOLUTION
2 REPLIES 2
Junior Contributor

Re: Unicode block matching with REGEXP

The syntax is a bit different as Teradata uses PCRE regular expressions:

SELECT RegExp_Instr('★ WOW ★','[\x{2600}-\x{26FF}]')

But as you want to return a star you should use

SELECT RegExp_Substr('★ WOW ★','[\x{2600}-\x{26FF}]')

This will return only the 1st occurance, to get all stars you can remove all non-star characters:

SELECT RegExp_Replace('★ WOW ★','[^\x{2600}-\x{26FF}]')

 

Fan

Re: Unicode block matching with REGEXP

Thanks @dnoeth that was pretty much the only one I didn't try yesterday :)