regexp_substr in TD14.10

Database
Enthusiast

regexp_substr in TD14.10

Hi Gurus,

Hope all is well with you.

I am trying to solve an issue with parsing a text field based on a delimiter and with the same delimiter being part of the double-quotes. I am on DB version TD14.10 and I am trying to use a combination of regexp_substr and regexp_replace, but not have been able to parse the fields properly. Please help.

This is a sample record

a,b,c,"abc,def,dfg",m,n,"hi,how,are you",g,k

The output should be separate fields for separate columns.

a

b

c

abc,def,dfg

m

n

hi,how,are you

g

k

I tried using

SEL regexp_replace(regexp_replace(regexp_substr('a,b,c,"abc,def,dfg",m,n,"hi,how,are you",g,k','[^,]+',1,1),'"',''),'"','');

 but I am spliting the data within double-qiotes as well.

Any and all help would be appreciated.

Thank you.

6 REPLIES
Enthusiast

Re: regexp_substr in TD14.10

Try with a combination of  strtok_split_to_table. 

SELECT * FROM TABLE (strtok_split_to_table(1,'a,b,c,"abc,def,dfg",m,n,"hi,how,are you",g,k', ',/')

RETURNS (outkey integer, tokennum integer, token varchar(15)character set unicode) ) as d

strtok is also there.

Enthusiast

Re: regexp_substr in TD14.10

Thanks Raja. This does not work, as we are vertical pivoting taking the commas inside the double-qutes into consideration. I need the output to be something I can insert into multiple columns. The data within the double-quotes need to go into one column.

Enthusiast

Re: regexp_substr in TD14.10

Hi,

Check if this would satisfies your needs.

SEL oreplace(
    regexp_substr(
    'a,b,c," abc,def,dfg",m,n,"hi,how,are you",g,k',
    '((?:,|^)(?!,)(?:"(.*?(?<!\\))"|((?:.(?!,))*(?:.(?=,))?))|,())',1,1),
    '"',''),
    oreplace(
    regexp_substr(
    'a,b,c," abc,def,dfg",m,n,"hi,how,are you",g,k',
    '((?:,|^)(?!,)(?:"(.*?(?<!\\))"|((?:.(?!,))*(?:.(?=,))?))|,())',1,2),
    '"',''),
    oreplace(
    regexp_substr(
    'a,b,c," abc,def,dfg",m,n,"hi,how,are you",g,k',
    '((?:,|^)(?!,)(?:"(.*?(?<!\\))"|((?:.(?!,))*(?:.(?=,))?))|,())',1,3),
    '"',''),
    oreplace(
    regexp_substr(
    'a,b,c," abc,def,dfg",m,n,"hi,how,are you",g,k',
    '((?:,|^)(?!,)(?:"(.*?(?<!\\))"|((?:.(?!,))*(?:.(?=,))?))|,())',1,4),
    '"','');

Above SQL is splitted with four column and not as a reocrds. If this doesn't satisfies you, you could also have writing sp is one of the options.

Senior Apprentice

Re: regexp_substr in TD14.10

This works for your example:

select 'a,b,c," abc,def,dfg",m,n,"hi,how,are you",g,k' as x
,trim(both '"' from regexp_substr(x ,'("[^"]+"|[^,])',1,1,'c'))
,trim(both '"' from regexp_substr(x ,'("[^"]+"|[^,])',1,2,'c'))
,trim(both '"' from regexp_substr(x ,'("[^"]+"|[^,])',1,4,'c'))
,trim(both '"' from regexp_substr(x ,'("[^"]+"|[^,])',1,7,'c'))

But this looks like simple comma-delimited data, why don't you split the data in TPT before inserting them?

Enthusiast

Re: regexp_substr in TD14.10

Thank you Dieter and Vinod. This is exactly what I've wanted. Much appreciated.

Dieter,

Can you kindly tell me how you would skip the delimiter within the double-quotes in TPT. I am not aware of any parameter in TPT. Kindly help.

Thank you.

Senior Apprentice

Re: regexp_substr in TD14.10

Current rleases of TPT support quoted data, you need to use following attributes for the DataConnector:

SourceFormat   = 'Delimited'
TextDelimiter = ','
QuotedData = 'Optional'
OpenQuoteMark = '"'
CloseQuoteMark = '"'