Splitting delimited columns to seperate columns

General
Enthusiast

Splitting delimited columns to seperate columns

I have an column in a table which has values as below,

 

abcd~efgh~hijk~lmno~pqrs~tuvw

ab12~ab34~ab56~ab78~ab90~ab01

 

I wanted to insert above delimited column into another table with multiple columns(col1 to n) as below,

 

col1    col2    col3  col4   col5  col6

abcd   efgh   hijk    lmno  pqrs  tuvw

ab12   ab34  ab56 ab78  ab90 ab01

 

I could see examples for splitting column to rows. But wanted to split into multiple columns. Please help.


Accepted Solutions
Teradata Employee

Re: Splitting delimited columns to seperate columns

Is the "n" fixed (6?) or you at least have some known upper bound, which would be the number of columns in the result?

Many approaches are possible (INSTR/SUBSTRING, REGEXP_SUBSTR, CSVLD, STRTOK, ...).

 

For example: SELECT STRTOK(inCol,'-',1),STRTOK(inCol,'-',2),...

1 ACCEPTED SOLUTION
13 REPLIES
Teradata Employee

Re: Splitting delimited columns to seperate columns

Is the "n" fixed (6?) or you at least have some known upper bound, which would be the number of columns in the result?

Many approaches are possible (INSTR/SUBSTRING, REGEXP_SUBSTR, CSVLD, STRTOK, ...).

 

For example: SELECT STRTOK(inCol,'-',1),STRTOK(inCol,'-',2),...

Enthusiast

Re: Splitting delimited columns to seperate columns

Thanks fred..

Highlighted
Enthusiast

Re: Splitting delimited columns to seperate columns

Hi Fred,

 

It doesnt work when null values are present within the delimiters.

 

Example:

'~~1~2~s1'

 

when I give strtok(str,'~',1) , o/p is returned as 1. Instead I want blank space or null value as output

Teradata Employee

Re: Splitting delimited columns to seperate columns

That has 5 fields instead of six. If the number of fields is constant, you could use CSVLD, which does return NULL for leading / trailing / consecutive delimiters.

 

WITH Tbl as (SELECT str FROM myDB.mytbl)
SELECT * FROM TABLE (CSVLD(Tbl.str,'-','')
RETURNS (col1 varchar(10) character set unicode, col2 varchar(10) character set unicode, col3 varchar(10) character set unicode, col4 varchar(10) character set unicode, col5 varchar(10) character set unicode) as csvToCols

 

 

Another option:

For n>=2, REGEXP_SUBSTR(str,'-[^-]*((?=-)|$)',1,n-1) returns the n-th field with leading - delimiter included, which you can then easily remove, e.g. using TRIM or SUBSTRING.

Similarly you could use REGEXP_SUBSTR(str,'^[^-]*((?=-)|$)') to return the first field (with no leading delimiter this time).

This approach has the advantage that if n > (number of fields in the input string) then the REGEXP_SUBSTR will just return NULL.

Enthusiast

Re: Splitting delimited columns to seperate columns

I have my below table as say,Tablename: delim and the column name: param hold below values:

 

abcd~efgh~hijk~lmno~pqrs~tuvw

ab12~ab34~ab56~ab78~ab90~ab01

 

I ve given as,

sel * from delim(csvld(delim.param, '~','')

returns (col1 varchar(30) character set unicode,......,col 6 varchar(30) character set unicode)

) as t1;

 

Still, it doesnot work. Any syntax incorrect here?

Teradata Employee

Re: Splitting delimited columns to seperate columns

So "doesn't work" in this case means it throws a syntax error?

TABLE is a keyword

sel * from TABLE(csvld(delim.param, '~','')...

 

Enthusiast

Re: Splitting delimited columns to seperate columns

Thanks, now am getting an error "Character set(2) not supported".

 

Is set unicode mandate for csvld?

Junior Contributor

Re: Splitting delimited columns to seperate columns

WITH delim AS
(SEL 'ab12~ab34~ab56~ab78~ab90~ab01' AS param)
SEL * 
FROM TABLE(CSVLd(delim.param, '~','')
RETURNS (
col1 VARCHAR(30) CHARACTER SET Unicode,
col2 VARCHAR(30) CHARACTER SET Unicode,
col3 VARCHAR(30) CHARACTER SET Unicode,
col4 VARCHAR(30) CHARACTER SET Unicode,
col5 VARCHAR(30) CHARACTER SET Unicode,
col6 VARCHAR(30) CHARACTER SET Unicode)
) AS t1;

works for me :-)

 

Teradata Employee

Re: Splitting delimited columns to seperate columns

All three arguments and the returned values must have the same CHARACTER SET. String literals are always UNICODE; apparently delim.param is LATIN.

 

So within the CSVLD call you can either TRANSLATE(delim.param USING LATIN_TO_UNICODE) and leave the rest as is, or translate both literals UNICODE_TO_LATIN and change the RETURNS to specify LATIN.