Load delimited flat file using TPT : a column contains 15K Unicode characters

Tools & Utilities
Enthusiast

Load delimited flat file using TPT : a column contains 15K Unicode characters

Hi All,

We have a delimited flat file and one of the column can have upto 15000 Unicode charaters.
We can't use ASCII charater set as we have Unicode data.
And if we use UTF8, we can't define a column with size more than VARCHAR(32000).
(If we define say, 33K, we get the Error- A column or character expression is larger than the max size).
But we will have to define the column as VARCHAR(45000) [15000x3=45000].

How can we load data this file?

Thanks,
Sankalp

4 REPLIES
Teradata Employee

Re: Load delimited flat file using TPT : a column contains 15K Unicode characters

There is nothing wrong with VARCHAR(45000).

As long as the total size of the rest of the columns does not cause the entire record to exceed 64000 bytes, the data should be able to be loaded.

 

-- SteveF
Enthusiast

Re: Load delimited flat file using TPT : a column contains 15K Unicode characters

Hi Steve,

 

Entire record do not exceed 64000. Please check the TPT Script.

I have tested with below data file and TPT script. The script given below is throws the error-

LOAD_OPERATOR: TPT10508: RDBMS error 3798: A column or character expression is larger than the max size.

 

If I just replace UTF8 with ASCII in first line of the script, it loads the data successfully. Please suggest.

 

Teradata Parallel Transporter Version: 14.10.00.10

 

Data file content:

-------------------

10|Hello world1
11|Hello world2
20|Hello world3

 

Script:

USING CHARACTER SET UTF8
DEFINE JOB FILE_LOAD
DESCRIPTION 'Load a Teradata table from a file'
(
DEFINE SCHEMA TEST_TBL
(
col1 VARCHAR(11)
,col2 VARCHAR(45000)
);
DEFINE OPERATOR DDL_OPERATOR
TYPE DDL
ATTRIBUTES (
VARCHAR TdpId = 'EDWUAT' ,
VARCHAR UserName = 'TEST_STG_USER' ,
VARCHAR UserPassword = 'stg_pswd' ,
VARCHAR ErrorList = '3807'
);
DEFINE OPERATOR FILE_READER
TYPE DATACONNECTOR PRODUCER
SCHEMA TEST_TBL
ATTRIBUTES
(
VARCHAR PrivateLogName='dataconnector_log',
VARCHAR DirectoryPath = '/home/devo/sankalp',
VARCHAR FileName = 'sample_data_file',
VARCHAR OpenMode = 'Read',
VARCHAR Format = 'Delimited',
VARCHAR TextDelimiter = '|',
VARCHAR EscapeTextDelimiter = '\',
VARCHAR DateForm = 'ANSIDATE'
);
DEFINE OPERATOR LOAD_OPERATOR
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
VARCHAR PrivateLogName = 'load_log',
VARCHAR TdpId = 'EDWUAT' ,
VARCHAR UserName = 'TEST_STG_USER' ,
VARCHAR UserPassword = 'stg_pswd' ,
VARCHAR TargetTable = 'test_stg.test_tbl',
VARCHAR LogTable = 'test_stg.test_tbl_lg',
VARCHAR ErrorTable1 = 'test_stg.test_tbl_et',
VARCHAR ErrorTable2 = 'test_stg.test_tbl_uv',
VARCHAR IndicatorMode = 'N',
VARCHAR Format = 'Unformatted',
INTEGER MaxDecimalDigits = 38,
INTEGER MaxSessions = 1,
VARCHAR DateForm = 'ANSIDATE',
INTEGER ErrorLimit = 1,
VARCHAR WildcardInsert = 'Y'
);
STEP drop_table
(
APPLY
('DROP TABLE test_stg.test_tbl;'),
('DROP TABLE test_stg.test_tbl_lg;'),
('DROP TABLE test_stg.test_tbl_et;'),
('DROP TABLE test_stg.test_tbl_uv;'),
('CREATE TABLE test_stg.test_tbl (col1 INTEGER, col2 VARCHAR(15000) CHARACTER SET UNICODE);')
TO OPERATOR (DDL_OPERATOR);
);
STEP insert_data
(
APPLY
('INSERT INTO test_stg.test_tbl;')
TO OPERATOR (LOAD_OPERATOR[1])
SELECT * FROM OPERATOR(FILE_READER[1]);
);
);

 

Teradata Employee

Re: Load delimited flat file using TPT : a column contains 15K Unicode characters

We know about this issue and the Teradata database engineers are looking into it.

 

-- SteveF
Teradata Employee

Re: Load delimited flat file using TPT : a column contains 15K Unicode characters

Sorry or the delay; I finally have the information from the database engineering team:

 

Apparently the 3798 error is the expected behavior, but some of our customers have never liked the error. The result of this DR was the addition of DBSControl Internal 173 (NoCharTruncErrInUsing) . Setting this internal dbs control flag allows up to 64000 bytes under Unicode based session charsets in a USING clause.  Truncation occurs after 32000 characters in a Unicode session charset.

 

Thus, the way I interpret this is, by default when specifying a client session character set of UTF8, the user is restricted to 32000 bytes in the USING clause of the DML statement.

 

So, to me, it looks as if (using a client session character set of UTF8), a user could export 15000 characters (45000 bytes) from the database, but not load that data back in, because of the USING clause restrictions.

-- SteveF