how to avoid the silent truncation when using tpt load (fastload)

Enthusiast

how to avoid the silent truncation when using tpt load (fastload)

Hi everyone, I am trying to load a file with TPT Load (FastLoad). Some values in the file are longer than the table definition allows, but the records load successfully with no error. I would like such records to be flagged so that I can redefine the table. Right now I have to define a temporary table whose columns use the largest observed lengths and then insert the records into the real table, which costs a lot of time. Are there any other suggestions? Thank you for your help.

Teradata Employee

Re: how to avoid the silent truncation when using tpt load (fastload)

You can't change the Load operator behavior, but if you define the input schema with the correct max lengths (or let TPT build the schema for you) then the Data Connector operator can reject records with oversize values.
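
For example, a schema whose field lengths match the target table lets the Data Connector operator do the rejecting (a minimal sketch; the table and column names here are made up):

DEFINE SCHEMA match_target_schema
(
    cust_id   VARCHAR(10),   /* same max length as the target column */
    cust_name VARCHAR(30)    /* likewise */
);

With the lengths matched this way, a record whose value exceeds the schema field length is rejected by the Data Connector operator instead of being passed to the Load operator and truncated.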

Enthusiast

Re: how to avoid the silent truncation when using tpt load (fastload)

Hi Fred, thank you for your reply. There are some oversize records in the data file, so I cannot define the correct max lengths. How can I let TPT build the schema for me?
Teradata Employee

Re: how to avoid the silent truncation when using tpt load (fastload)

You can specify RecordErrorFileName for the Data Connector operator to log the records that don't fit the schema to a flat file for inspection / analysis.
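
For example, in the Data Connector operator's attribute list (the file path here is just a placeholder):

VARCHAR RecordErrorFileName = '/tmp/myjob_rejected_records.txt'   /* records that don't fit the schema are written here */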

 

I was making the assumption that you currently have an explicit schema definition with field sizes too large for the target table to hold (otherwise I would expect the job to fail rather than loading truncated data). If the input is expected to match the target table (sizes, order of fields, etc.) then TPT can infer the schema based on the table definition. Or you could explicitly generate a schema based on a table or query. See the TPT User Guide "Advanced Scripting Strategies" section for more information.
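
As a sketch of the generated-schema approach, recent TPT versions let you derive a schema from a table definition along these lines (the table name is a placeholder; for delimited input there is a variant that generates all-VARCHAR fields, so check the "Advanced Scripting Strategies" section for the exact syntax in your version):

DEFINE SCHEMA file_schema FROM TABLE 'mydb.target_table';   /* field sizes derived from the table definition */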

Enthusiast

Re: how to avoid the silent truncation when using tpt load (fastload)

Hi Fred, I am afraid I haven't explained my problem clearly. My Data Connector operator cannot reject records with oversize values now. I define a table with one column VARCHAR(8) (a length supplied by the customer, but maybe wrong), and the value in the data file is 'XY0VX83AAAA'. It is loaded as 'XY0VX83A'. I expect it should not be loaded into the table but should be written to the error data file instead.

This is the schema:

DEFINE SCHEMA source_file
(
    string_type VARCHAR(32)
);

These are the Data Connector operator attributes:

VARCHAR DirectoryPATH = @MyFolder1,
VARCHAR ArchiveDirectoryPATH = @MyFolder2,
VARCHAR FileName = @InputFileName,
VARCHAR RecordErrorFileName = @RecordErrorFileName,
VARCHAR Format = 'Delimited',
VARCHAR TextDelimiter = '|',
VARCHAR QuotedData = 'Optional',
VARCHAR AcceptMissingColumns = 'N',
VARCHAR AcceptExcessColumns = 'N',
VARCHAR TruncateColumnData = 'N',
INTEGER SkipRows = 1,
VARCHAR SkipRowsEveryFile = 'Y'

Thank you for your help.
Teradata Employee

Re: how to avoid the silent truncation when using tpt load (fastload)

The Data Connector operator relies on the schema field length, not the target column length.

 

So currently, the schema can hold 32 bytes but the target column can hold only 8 characters. The input value is 11 characters, so it fits in the schema and is accepted by the Data Connector operator and passed to the Load operator. But the Load operator quietly truncates the value to 8 characters.

 

Provided you are using a single-byte session character set (e.g. ASCII) or double-byte character set (e.g. UTF16), just reduce the schema length to 8 bytes (or 16 bytes for double-byte character set) so the oversize values can be detected and rejected by the Data Connector operator.
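
With an ASCII session character set, the schema from the earlier post would then become:

DEFINE SCHEMA source_file
(
    string_type VARCHAR(8)   /* matches the VARCHAR(8) target column; use VARCHAR(16) under UTF16 */
);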

 

(With UTF8 session character set you might have a problem, since the number of bytes per character can vary.)

Enthusiast

Re: how to avoid the silent truncation when using tpt load (fastload)

Thank you for your suggestion; I will test it again. But is there any other effect if I change the session character set from UTF-8 to ASCII? The UTF-8 setting was made by another experienced consultant, so...
Teradata Employee

Re: how to avoid the silent truncation when using tpt load (fastload)

The session character set needs to match the encoding of the input data. Since ASCII is a (small) subset of UTF-8, changing it may or may not be workable.

If it is known that all the data will be single-byte ASCII characters - plain letters, numbers, common punctuation - then switching to ASCII would be fine (even if the target column is UNICODE and not LATIN). If you must allow for other characters, you might consider converting the source file from UTF-8 to UTF-16 (e.g. using iconv command on Linux) so there will be a fixed relationship between number of characters in the database and number of bytes in the schema. 
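
For example, on Linux (the file names are placeholders; pick the UTF-16 variant, with or without byte-order mark, little- or big-endian, that matches what the TPT job expects):

iconv -f UTF-8 -t UTF-16LE input_utf8.txt > input_utf16.txt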

 

In Teradata transaction mode (which TPT always uses), string truncation happens silently on the database side. If you used ANSI transaction mode in some other client tool, oversize fields would be rejected. For example, in BTEQ you could ".SET SESSION TRANSACTION ANSI" prior to logging on, and use the (rather verbose) MESSAGES=RECORDNUMBERS option on ".IMPORT" so you could determine the relative record number of any rejected rows. 
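
A minimal BTEQ sketch of that approach (the logon string, file name, and table are placeholders, and the USING clause must match your input layout; MESSAGES=RECORDNUMBERS is the option described above):

.SET SESSION TRANSACTION ANSI
.LOGON tdpid/user,password
.IMPORT VARTEXT '|' FILE=input.txt, MESSAGES=RECORDNUMBERS
.REPEAT *
USING (string_type VARCHAR(32))
INSERT INTO mydb.target_table (col1) VALUES (:string_type);
.LOGOFF
.QUIT

In ANSI transaction mode, inserting an 11-character value into a VARCHAR(8) column fails with a string-truncation error instead of being silently trimmed.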

Enthusiast

Re: how to avoid the silent truncation when using tpt load (fastload)

Thank you for your reply. We have already tested the ASCII character set in the TPT script, but Unicode (Japanese) characters are still loaded, as '??', with no error. We do not know whether there are any Unicode characters in our data file, so we expect records that do not fit to alert us. We also tested UTF-8 with the schema reduced to the same lengths as in ASCII mode, and it works. Can we use those settings? We are worried about other bugs. Is there any suggestion? Thank you very much.
Teradata Employee

Re: how to avoid the silent truncation when using tpt load (fastload)

The recommended approach is to specify the actual encoding of the source. But the Data Connector can only validate input byte count against the schema, which isn't directly tied to character count if the encoding is UTF-8. As a simple example, señorita is 8 characters (in both Unicode and Teradata LATIN) but would require 9 bytes in UTF-8. 
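
The byte/character difference is easy to check from a shell (wc -m counts characters only under a UTF-8 locale):

printf 'señorita' | wc -c    # 9  (bytes: ñ occupies two bytes in UTF-8)
printf 'señorita' | wc -m    # 8  (characters)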

 

When the session character set is ASCII, the driver not only accepts the x'00'-x'7F' ASCII values but in effect assumes any bytes in the x'80'-x'FF' range are Teradata LATIN extended characters. That means all byte values are potentially valid, so no error is thrown; but any multi-byte special character encoding is not recognized, and each byte is interpreted as an individual character.