Teradata SQL - Large String Concatenation

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.

Teradata SQL - Large String Concatenation

Hello-

 

I am currently working to convert a Microsoft T-SQL Statement into Teradata for use within ETL processes for building a Data Warehouse table and have been encountering issues due to the defined size of the columns that I'm concatenating.

 

The source table is defined as follows:

ID (Int)  TextColumn (VARCHAR(40000))  OriginalID (Int)

--------  ---------------------------  ----------------

1         Long Text 1                  1

2         Extra 1-A                    1

3         Extra 1-B                    1

4         Long Text 2                  4

5         Long Text 3                  5

6         Extra 3-A                    5

 

I need to be able to query this table to get the results:

OriginalID (Int) FinalTextColumn (VARCHAR(40000))  

---------------- ---------------------------  

1                Long Text 1 APPENDED: Extra 1-A APPENDED: Extra 1-B

4                Long Text 2

5                Long Text 3 APPENDED: Extra 3-A

 

 

While I know that this structure could yield infinitely sized strings within FinalTextColumn, in practice 40,000 characters are sufficient - we're willing to allow for this potential truncation.  Plus, given analysis of our source data:

SELECT MAX(LENGTH(TextColumn)) FROM SourceTable WHERE ID = OriginalID 

36,080

SELECT MAX(LENGTH(TextColumn)) FROM SourceTable WHERE ID <> OriginalID  

18,629

SELECT TOP 1 OriginalID, SUM(LENGTH(TextColumn))

FROM SourceTable

GROUP BY OriginalID

ORDER BY SUM(LENGTH(TextColumn)) DESC

37,199

 

I've attempted solutions to use XMLAGG as well as pivoting the data, however both solutions result in the "Failed [3798 : 42000] A column or character expression is larger than the max size." error

 

Attempt 1 (XMLAGG solution):

SELECT OriginalD,

     XMLAGG(CASE WHEN OriginalID <> ID THEN 'APPENDED: ' ELSE '' END || TextColumn ORDER BY ID) "FinalTextColumn"

FROM SourceTable

GROUP BY OriginalID

 

Attempt 2 (Sequence View + Rank-based pivot):

**Obviously this solution has issues too, as it limits to a fixed number of appended extras... but in practice (10 years) this has never been over 5 (6) one time.

REPLACE VIEW RankedText AS (

     Select OriginalID, ID, TextColumn, 

          ROW_NUMBER() OVER (PARTITION BY OriginalID ORDER BY ID) AS "TextRank"

     FROM SourceTable

);

 

SELECT OriginalID,

     MAX(CASE WHEN TextRank=1 THEN TextColumn ELSE '' END) ||

     MAX(CASE WHEN TextRank=2 AND TextColumn IS NOT NULL THEN " APPENDED: " || TextColumn ELSE '' END) ||

     ...

     MAX(CASE WHEN TextRank=10 AND TextColumn IS NOT NULL THEN " APPENDED: " || TextColumn ELSE '' END) as "FinalTextColumn"

FROM RankedText

GROUP BY OriginalID

 

I also tried attempt 2 returning each RankedText as an indiviual column - but receieved the error "[9804 : HY000] Response Row size or Constant Row size overflow."

 

Any thoughts or suggestions on this would be greatly appreciated!

Thanks!

 


Accepted Solutions
Highlighted
Teradata Employee

Re: Teradata SQL - Large String Concatenation

Pivot is probably the right approach for this. Character literals are CHARACTER SET UNICODE, so that's causing the database to attempt implicit translation of the LATIN column to UNICODE, which fails since the original column allows over 32000 characters.

 

You can explicitly translate the literals to LATIN (that should be sufficient for the "separate columns" option):

     MAX(CASE WHEN TextRank=1 THEN TextColumn ELSE translate('' using unicode_to_latin) END)

Or if you prefer to concatenate, then you will need to convert to CLOB also:

     CAST(MAX(CASE WHEN TextRank=1 THEN TextColumn ELSE translate('' using unicode_to_latin) END) AS CLOB) ||

 

1 ACCEPTED SOLUTION
3 REPLIES
Highlighted
Teradata Employee

Re: Teradata SQL - Large String Concatenation

Pivot is probably the right approach for this. Character literals are CHARACTER SET UNICODE, so that's causing the database to attempt implicit translation of the LATIN column to UNICODE, which fails since the original column allows over 32000 characters.

 

You can explicitly translate the literals to LATIN (that should be sufficient for the "separate columns" option):

     MAX(CASE WHEN TextRank=1 THEN TextColumn ELSE translate('' using unicode_to_latin) END)

Or if you prefer to concatenate, then you will need to convert to CLOB also:

     CAST(MAX(CASE WHEN TextRank=1 THEN TextColumn ELSE translate('' using unicode_to_latin) END) AS CLOB) ||

 

Teradata Employee

Re: Teradata SQL - Large String Concatenation

And since your database version appears not to support 1MB response rows, you'll need to use CLOB or else convince the database that the rows won't exceed 64K e.g. using SUBSTRING and/or CAST to VARCHAR. 

Re: Teradata SQL - Large String Concatenation

Thanks Fred.  This seems like it will work pretty well for my purposes.