Extract strings based on conditions

General
Highlighted
Enthusiast

Extract strings based on conditions

 

Have a kind of an URL in which I have to extract a word after a particular string:

 

Example:

https://www.google.co.in/search/d1_lang=EN&d1_word=community%2522teradata page&ei=V6CYW-WGMMrUvATB9IDQCQ&q

 

Here, I would need to extract the string after d1_word but before ampersand and display the o/p as:

community teradata page

(%2522 should be ignored)


Accepted Solutions
Junior Contributor

Re: Extract strings based on conditions

You can extract using NVP and then remove the digits using a regex:

RegExp_Replace(Nvp(url, 'd1_word', '&', '='), '%[0-9]*', ' ')

 

1 ACCEPTED SOLUTION
3 REPLIES
Junior Contributor

Re: Extract strings based on conditions

You can extract using NVP and then remove the digits using a regex:

RegExp_Replace(Nvp(url, 'd1_word', '&', '='), '%[0-9]*', ' ')

 

Enthusiast

Re: Extract strings based on conditions

Thank you. It works!

 

But, it also has some plus signs in between as below in few URLS:

1st url:d1_word=community%2522teradata page

2nd url: d1_word=community+teradata page+search&

 

Above query works perfectly fine when % and digits are involved ie., for 1st URl.

 

How do I remove plus signs from here in the same query? 

Can i refine the same query to include other conditions to remove + sign also?

 

 

Teradata Employee

Re: Extract strings based on conditions

Hi Phoenix,

 


How do I remove plus signs from here in the same query? 

Can i refine the same query to include other conditions to remove + sign also? 


Sure, you can use the or functionnality of regexp which is done via the pipe character.

As the plus character is a significant one, you have to escape it.

 

Starting from @dnoeth formulas, this regular expression will do the job :

RegExp_Replace(Nvp(url, 'd1_word', '&', '='), '%[0-9]*|\+', ' ')

If you want to add more character, just pipe and add them to the expression.