Large Object Loading with Teradata Parallel Transporter

mvu
Teradata Employee


Teradata PT supports loading Large Object (LOB) data into, and extracting LOB data from, Teradata Database tables. The Large Object data types are Binary Large Objects (BLOBs) and Character Large Objects (CLOBs).

The SQL Inserter operator is the only consumer operator that can load LOB data into the Teradata Database, and the SQL Selector operator is the only producer operator that can extract LOB data from it. The other operators (Export, Load, Update, and Stream) cannot process LOB data. The Data Connector operator can read and write inline LOB data.

Depending on how the LOB columns are defined in the Teradata PT schema, Teradata PT uses one of the following two methods to load LOB data into, or extract LOB data from, the Teradata Database:

  • Inline
  • Deferred

When loading LOB data, the SQL Inserter operator can employ both methods (inline and deferred) simultaneously to insert LOB columns into a Teradata table. However, when exporting LOB data from the Teradata server, the SQL Selector operator must extract all LOB columns using the same method (either inline or deferred).

Inline Method

In the inline method, data rows sent across the Teradata PT data stream contain both non-LOB and LOB data. Inline BLOB and CLOB columns in the data rows are treated similarly to VARBYTE and VARCHAR columns.

However, these BLOB and CLOB columns must have an 8-byte integer as their length indicator field instead of the 2-byte length indicator used by VARBYTE and VARCHAR columns. One restriction of the inline method is that the entire data row cannot be larger than 64,000 bytes, since that is the row-size limit imposed by the Teradata Database.
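To make the field layout concrete, here is a minimal Python sketch (illustrative only, not a Teradata API) contrasting the 2-byte length indicator of a VARCHAR-style field with the 8-byte length indicator required for an inline LOB field. The function names and the little-endian byte order are assumptions for the example:

```python
import struct

def varchar_field(text: str) -> bytes:
    """VARCHAR-style field: 2-byte unsigned length indicator + data."""
    data = text.encode("latin-1")
    return struct.pack("<H", len(data)) + data

def inline_clob_field(text: str) -> bytes:
    """Inline CLOB field: 8-byte unsigned length indicator + data."""
    data = text.encode("latin-1")
    return struct.pack("<Q", len(data)) + data

MAX_ROW_BYTES = 64000  # row-size limit for the inline method

# A row mixing a VARCHAR column and an inline CLOB column:
row = varchar_field("12345") + inline_clob_field("some clob payload")
assert len(row) <= MAX_ROW_BYTES  # inline rows must fit in 64,000 bytes
```

The only difference between the two fields is the width of the length indicator; the payload bytes follow immediately in both cases.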

Deferred Method

In the deferred method, Teradata PT does not send the actual LOB data across its data stream for LOB columns defined as DEFERRED BY NAME. Deferred LOB data is processed entirely separately from the non-LOB data. Data rows sent across the Teradata PT data stream contain only non-LOB data and LOB file locators; the locators point to the flat files containing the actual data for the deferred LOB columns.

When extracting LOB data from the Teradata Database in deferred mode, the following three attributes can be used in the SQL Selector operator:

  • LobDirectoryPath

Specifies the complete path name of an existing directory where all LOB data files will be written. The default value is the current working directory.

  • LobFileBaseName

Defines a character string that is prefixed to the names of the LOB data files.

  • LobFileExtension

Specifies the extension for LOB data file names, for example 'jpg'.

The LOB data files have formulaic names as follows:

<directory-name>/<file-basename>_<column-name>_<job-id>_p<#>_r<#>.<file-extension>

where

o <directory-name> is the value specified in the LobDirectoryPath attribute.

o <file-basename> is the value specified in the LobFileBaseName attribute.

o <column-name> is the column name defined in the source table.

o <job-id> is the job identifier from the tbuild command line.

o <#> is an integer generated internally by Teradata PT.

o <file-extension> is the value specified in the LobFileExtension attribute.
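As a quick illustration, the naming pattern can be sketched in Python (the helper name is an assumption, and the formatting of the <#> fields is illustrative; Teradata PT generates the actual numbers internally):

```python
def lob_file_name(directory, basename, column, job_id, part, row, extension):
    """Compose a LOB data file name following the documented pattern:
    <directory-name>/<file-basename>_<column-name>_<job-id>_p<#>_r<#>.<file-extension>
    """
    return f"{directory}/{basename}_{column}_{job_id}_p{part}_r{row}.{extension}"

# For example, column COL3_BLOB of job 'root-40':
print(lob_file_name("/data/lobs", "prod", "COL3_BLOB", "root-40", 1, 7, "jpg"))
# -> /data/lobs/prod_COL3_BLOB_root-40_p1_r7.jpg
```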

Defining LOB Data in a Teradata PT Schema

The syntax for defining LOB data, either as BLOB or CLOB, in a Teradata PT schema is shown below:

  • BLOB (lengthBytes) or CLOB (lengthBytes)

This syntax indicates that the Teradata PT load or extract job will use the inline method to transfer the LOB data between the Teradata PT and Teradata Database.

  • BLOB (lengthBytes) AS DEFERRED BY NAME or CLOB (lengthBytes) AS DEFERRED BY NAME

This syntax indicates that the LOB data is transferred using the deferred method. When LOB columns are defined as DEFERRED BY NAME in the schema, Teradata PT expects regular VARCHARs in place of the deferred LOB columns in a data row. Each VARCHAR contains a fully qualified file name for a flat file that holds the actual data for a deferred LOB column. Teradata PT refers to these VARCHARs as LOB file locators.
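A sketch of how such a source row might be prepared outside of Teradata PT, with Python used purely for illustration (the file-naming scheme and the pipe delimiter are assumptions matching a delimited source file):

```python
import os
import tempfile

def make_deferred_row(col1_id, clob_text, blob_bytes, workdir):
    """Write the LOB payloads to flat files and return a delimited data row
    whose LOB columns hold file locators (fully qualified file names)."""
    clob_path = os.path.join(workdir, f"{col1_id}_col2_clob.txt")
    blob_path = os.path.join(workdir, f"{col1_id}_col3_blob.bin")
    with open(clob_path, "w") as f:
        f.write(clob_text)
    with open(blob_path, "wb") as f:
        f.write(blob_bytes)
    # Scalar value first, then locators in place of the deferred LOB columns.
    return "|".join([col1_id, clob_path, blob_path])

row = make_deferred_row("1001", "a very long text ...", b"\x00\x01", tempfile.mkdtemp())
```

Each generated row carries only the scalar data plus two file names; the LOB payloads themselves stay in their flat files until the SQL Inserter operator reads them.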

Loading LOB Data into a Teradata Database

Data Flow Description

  • Data Connector operator reads a file with rows that contain both scalar data and LOB file locators.
  • BLOB and CLOB values, which are stored in separate files, are referenced by locator columns within a row.
  • Data Connector operator extracts rows from a file and passes them to the Inserter operator.
  • Files with BLOB and CLOB are read by the Inserter operator as part of reading a row.
  • Inserter operator inserts both scalar data and BLOB/CLOB data into Teradata.

Sample Script

DEFINE JOB LOADING_LOB_DATA
DESCRIPTION 'LOAD PRODUCT DEFINITION TABLE'
(

DEFINE SCHEMA SOURCE_SCHEMA
DESCRIPTION 'PRODUCT INFORMATION SCHEMA'
(
COL1_ID VARCHAR(38),
COL2_CLOB CLOB(50000) AS DEFERRED BY NAME,
COL3_BLOB BLOB(100000) AS DEFERRED BY NAME
);

DEFINE OPERATOR DDL_OPERATOR()
DESCRIPTION 'DDL Operator'
TYPE DDL
ATTRIBUTES
(
VARCHAR ARRAY ErrorList = ['3706','3803','3807'],
VARCHAR DateForm,
VARCHAR TraceLevel = 'None',
VARCHAR PrivateLogName = 'ddllog',
VARCHAR TdpId = @TdpId,
VARCHAR UserName = @UserName,
VARCHAR UserPassword = @UserPassword,
VARCHAR AccountID,
VARCHAR WorkingDatabase
);

DEFINE OPERATOR FILE_READER ()
DESCRIPTION 'DataConnector READER Operator'
TYPE DATACONNECTOR PRODUCER
SCHEMA SOURCE_SCHEMA
ATTRIBUTES
(
VARCHAR TraceLevel = 'None',
VARCHAR PrivateLogName = 'dtac_r_log',
VARCHAR DirectoryPath = @SourceDirectoryPath,
VARCHAR FileName = @SourceFileName,
VARCHAR OpenMode = 'Read',
VARCHAR IndicatorMode,
VARCHAR Format = @SourceFileFormat
);

DEFINE OPERATOR SQL_INSERTER ()
DESCRIPTION 'TERADATA INSERTER UTILITY'
TYPE INSERTER
INPUT SCHEMA *
ATTRIBUTES
(
VARCHAR TraceLevel = 'None',
VARCHAR PrivateLogName = 'ins_log',
VARCHAR Tdpid = @TdpId,
VARCHAR UserName = @UserName,
VARCHAR UserPassword = @UserPassword
);

STEP CREATE_SOURCE_TABLE
(
APPLY
('drop table ' || @SourceTable || ' ;'),
('create table ' || @SourceTable || ' , FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
col1_id VARCHAR(38),
col2_clob clob(50000),
col3_blob blob(100000)
) primary index (col1_id);')

TO OPERATOR ( DDL_OPERATOR () );
);

STEP LOADING_DATA_TO_SOURCE_TABLE
(
APPLY

('INSERT INTO ' || @SourceTable || ' values (
:COL1_ID, :COL2_CLOB, :COL3_BLOB);')

TO OPERATOR (SQL_INSERTER [1])

SELECT * FROM OPERATOR (FILE_READER ());
);

);

Exporting LOB Data from a Teradata Database

Data Flow Description

  • Selector operator reads rows from a Teradata table which contains both scalar and LOB data.
  • Selector operator retrieves BLOB and CLOB columns using LOB "locators".
  • Selector operator generates LOB “identifiers” (file names) and writes the LOB data rows to the files. One data file represents data for one LOB column. The LOB data files are written to the directory specified in the LobDirectoryPath attribute. If the attribute value is not specified, the current working directory will be used.
  • Selector operator passes rows with non-LOB data and LOB identifiers to the Data Connector.
  • Data Connector operator writes data rows with non-LOB data and LOB identifiers to a file.

Sample Script

DEFINE JOB LOAD_LOB_DATA_FROM_ANOTHER_TABLE
DESCRIPTION 'LOAD PRODUCT DEFINITION TABLE'
(

DEFINE SCHEMA SOURCE_SCHEMA
DESCRIPTION 'PRODUCT INFORMATION SCHEMA'
(
COL1_ID VARCHAR(38),
COL2_CLOB CLOB(50000) AS DEFERRED BY NAME,
COL3_BLOB BLOB(100000) AS DEFERRED BY NAME
);

DEFINE OPERATOR SQL_SELECTOR ()
DESCRIPTION 'SQL OPERATOR'
TYPE SELECTOR
OUTPUT SCHEMA SOURCE_SCHEMA
ATTRIBUTES
(
VARCHAR TraceLevel = 'None',
VARCHAR PrivateLogName = 'sel_log',
VARCHAR ReportModeOn,
VARCHAR DateForm,
VARCHAR UserName = @UserName,
VARCHAR UserPassword = @UserPassword,
VARCHAR TdpId = @TdpId,
VARCHAR AccountID,
VARCHAR SelectStmt = 'select * from ' || @SourceTable || ' ;',
VARCHAR LobDirectoryPath = @LobDirectoryPath,
VARCHAR LobFileBaseName,
VARCHAR LobFileExtension

);

DEFINE OPERATOR FILE_WRITER ()
DESCRIPTION 'DataConnector WRITER Operator for TPT'
TYPE DATACONNECTOR CONSUMER
SCHEMA *
ATTRIBUTES
(
VARCHAR TraceLevel = 'None',
VARCHAR PrivateLogName = 'dtac_w_log',
VARCHAR DirectoryPath = @TargetDirectoryPath,
VARCHAR FileName = @TargetFileName,
VARCHAR OpenMode = 'write',
VARCHAR IndicatorMode = 'N',
VARCHAR Format = @TargetFileFormat
);

APPLY
TO OPERATOR (FILE_WRITER ())
SELECT * FROM OPERATOR (SQL_SELECTOR());

);

Copying LOB Data From One Teradata Database to Another

Data Flow Description

  • Selector operator reads rows from a Teradata table which contains both scalar and LOB data.
  • Selector operator retrieves BLOB and CLOB columns using LOB "locators".
  • Selector operator generates LOB “identifiers” (file names) and writes the LOB data rows to the files. One data file represents data for one LOB column. The LOB data files are written to the directory specified in the LobDirectoryPath attribute. If the attribute value is not specified, the current working directory will be used.
  • Selector operator passes rows with non-LOB data and LOB identifiers to the Inserter operator.
  • Inserter operator inserts rows with non-LOB and LOB data to Teradata.
  • Inserter operator reads the LOB data from the flat files and inserts the LOB data into the target table. The LOB data files will be deleted immediately after they are inserted into the target table.

Sample Script

DEFINE JOB LOAD_LOB_DATA_FROM_ANOTHER_TABLE
DESCRIPTION 'LOAD PRODUCT DEFINITION TABLE'
(

DEFINE SCHEMA SOURCE_SCHEMA
DESCRIPTION 'PRODUCT INFORMATION SCHEMA'
(
COL1_ID VARCHAR(38),
COL2_CLOB CLOB(50000) AS DEFERRED BY NAME,
COL3_BLOB BLOB(100000) AS DEFERRED BY NAME
);

DEFINE OPERATOR DDL_OPERATOR()
DESCRIPTION 'DDL Operator'
TYPE DDL
ATTRIBUTES
(
VARCHAR ARRAY ErrorList = ['3706','3803','3807'],
VARCHAR DateForm,
VARCHAR TraceLevel = 'None',
VARCHAR PrivateLogName = 'ddllog',
VARCHAR TdpId = @TdpId,
VARCHAR UserName = @UserName,
VARCHAR UserPassword = @UserPassword,
VARCHAR AccountID,
VARCHAR WorkingDatabase
);

DEFINE OPERATOR SQL_SELECTOR ()
DESCRIPTION 'SQL OPERATOR'
TYPE SELECTOR
OUTPUT SCHEMA SOURCE_SCHEMA
ATTRIBUTES
(
VARCHAR TraceLevel,
VARCHAR PrivateLogName = 'sel_log',
VARCHAR ReportModeOn,
VARCHAR DateForm,
VARCHAR UserName = @UserName,
VARCHAR UserPassword = @UserPassword,
VARCHAR TdpId = @TdpId,
VARCHAR AccountID,
VARCHAR SelectStmt = 'select * from ' || @SourceTable || ' ;',
VARCHAR LobDirectoryPath = @LobDirectoryPath,
VARCHAR LobFileBaseName,
VARCHAR LobFileExtension

);

DEFINE OPERATOR SQL_INSERTER ()
DESCRIPTION 'TERADATA INSERTER UTILITY'
TYPE INSERTER
INPUT SCHEMA *
ATTRIBUTES
(
VARCHAR TraceLevel = 'None',
VARCHAR PrivateLogName = 'ins_log',
VARCHAR Tdpid = @TdpId,
VARCHAR UserName = @UserName,
VARCHAR UserPassword = @UserPassword
);

STEP SETUP_TARGET_TABLE
(
APPLY
('drop table ' || @TargetTable || ' ;'),
('create table ' || @TargetTable || ', FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
col1_id VARCHAR(38),
col2_clob clob(50000),
col3_blob blob(100000)
) primary index (col1_id);')

TO OPERATOR (DDL_OPERATOR ());
);

STEP LOADING_TARGET_TBL
(
APPLY

('INSERT INTO ' || @TargetTable || ' values (
:COL1_ID, :COL2_CLOB, :COL3_BLOB);')

TO OPERATOR (SQL_INSERTER [1])

SELECT * FROM OPERATOR (SQL_SELECTOR());
);

);

Sample Job Variable File

TdpId = 'myTdpId',
UserName = 'myUserId',
UserPassword = 'myPassword',
TargetTable = 'myTargetTable',
SourceTable = 'mySourceTable',
LobDirectoryPath = 'myLobDirectory',
SourceDirectoryPath = 'mySourceDirectory',
SourceFileName = 'mySourceFile',
SourceFileFormat = 'Delimited',
TargetDirectoryPath = 'myTargetDirectory',
TargetFileName = 'myTargetFile',
TargetFileFormat = 'Delimited'

25 REPLIES
Enthusiast

Re: Large Object Loading with Teradata Parallel Transporter

Hi,
If I have a flat file containing a character value that exceeds the VARCHAR limit, and I have to load it into a table with a CLOB data type, can you please explain the procedure to load the file into the table?
Teradata Employee

Re: Large Object Loading with Teradata Parallel Transporter

If the character exceeds the length currently supported by our load tools (64000 bytes), then you need to treat the data as a "deferred mode" LOB.

All deferred mode LOB data needs to be in its own file. The non-LOB data would exist in another file, and in the column position of the LOB column, you would place the name of the LOB data file.
-- SteveF

Re: Large Object Loading with Teradata Parallel Transporter

In the inline method you wrote: "BLOB and CLOB columns must have an 8-byte integer as their length indicator field instead of 2-byte length indicator as in the VARBYTE and VARCHAR columns. One restriction in using the inline method to transfer LOB data is that the entire data row should not be larger than 64,000 bytes since that is the row-size limit imposed by the Teradata database"....

If an inline BLOB cannot be larger than 64K, then why is an 8-byte integer required instead of a 2-byte integer for the length indicator field?
Teradata Employee

Re: Large Object Loading with Teradata Parallel Transporter

Good question.

When the LOB feature was designed, it was known that someday Teradata would expand the row size and the column size beyond the 64K limit.

Instead of designing the LOB feature with a 2-byte field length for the inline LOBs, and then having to make the changes when the row/column size changes in the future, they went ahead and required the field to have an 8-byte length specifier.

This way, customer jobs with inline LOBs will work without modification when the row/column size limit increases.
-- SteveF

Re: Large Object Loading with Teradata Parallel Transporter

Thanks for the prompt reply!

Re: Large Object Loading with Teradata Parallel Transporter

Hi,

I am using the following TPT script to load a binary file at Teradata.

DEFINE JOB LOADING_LOB_DATA
DESCRIPTION 'LOAD PRODUCT DEFINITION TABLE'
(

DEFINE SCHEMA SOURCE_SCHEMA
DESCRIPTION 'PRODUCT INFORMATION SCHEMA'
(
Coluna_blob blob(100000) as deferred by name
);

DEFINE OPERATOR DDL_OPERATOR
DESCRIPTION 'DDL Operator'
TYPE DDL
ATTRIBUTES
(
VARCHAR PrivateLogName = 'ddllog',
VARCHAR TdpId = 'atlantis',
VARCHAR UserName = 'dbc',
VARCHAR UserPassword = 'teradata2011',
integer buffersize = 64
);

DEFINE OPERATOR FILE_READER
DESCRIPTION 'DataConnector READER Operator'
TYPE DATACONNECTOR PRODUCER
SCHEMA SOURCE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'dtac_r_log',
VARCHAR DirectoryPath = '/data/ANP/files',
VARCHAR FileName = '0253B2471_parte2.new',
VARCHAR OpenMode = 'Read',
VARCHAR Format = 'formatted'
);

DEFINE OPERATOR SQL_INSERTER
DESCRIPTION 'TERADATA INSERTER UTILITY'
TYPE INSERTER
INPUT SCHEMA *
ATTRIBUTES
(
VARCHAR PrivateLogName = 'ins_log',
VARCHAR Tdpid = 'atlantis',
VARCHAR UserName = 'dbc',
VARCHAR UserPassword = 'teradata2011'
);

STEP CREATE_SOURCE_TABLE
(
APPLY
('create table ANP.teste_lob , no FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
coluna_id BIGINT GENERATED ALWAYS AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE 1
MAXVALUE 999999999999999999
NO CYCLE),
Coluna_blob BLOB(1000000))
PRIMARY INDEX ( coluna_id );')

TO OPERATOR ( DDL_OPERATOR () );
);

STEP LOADING_DATA_TO_SOURCE_TABLE
(
APPLY

('INSERT INTO ANP.teste_lob values (:Coluna_blob);')

TO OPERATOR (SQL_INSERTER)

SELECT * FROM OPERATOR (FILE_READER);
);

);

When I execute the script I get the following TPT Error:

Teradata Parallel Transporter Version 13.10.00.04
Job log: /opt/teradata/client/13.10/tbuild/logs/root-40.out
Job id is root-40, running on atlantis
Found CheckPoint file: /opt/teradata/client/13.10/tbuild/checkpoint/rootLVCP
This is a restart job; it restarts at step LOADING_DATA_TO_SOURCE_TABLE.
Teradata Parallel Transporter DataConnector Version 13.10.00.04
FILE_READER Instance 1 directing private log report to 'dtac_r_log-1'.
FILE_READER Instance 1 restarting.
FILE_READER: TPT19008 DataConnector Producer operator Instances: 1
FILE_READER: TPT19003 ECI operator ID: FILE_READER-12595
Teradata Parallel Transporter SQL Inserter Operator Version 13.10.00.04
SQL_INSERTER: private log specified: ins_log
SQL_INSERTER: connecting sessions
FILE_READER: TPT19222 Operator instance 1 processing file '/data/ANP/files/0253B2471_parte2.sgy'.
FILE_READER: TPT19350 I/O error on file '/data/ANP/files/0253B2471_parte2.sgy'.
SQL_INSERTER: Total Rows Sent To RDBMS: 0
SQL_INSERTER: Total Rows Applied: 0
SQL_INSERTER: disconnecting sessions

FILE_READER: TPT19112 Variable (VAR) field length error. Expected maximum length 1024, received length 16625 in record 1.

FILE_READER: TPT19221 Total files processed: 0.
SQL_INSERTER: Total processor time used = '0.05 Second(s)'
SQL_INSERTER: Start : Fri Nov 11 14:59:33 2011
SQL_INSERTER: End : Fri Nov 11 14:59:35 2011
Job step LOADING_DATA_TO_SOURCE_TABLE terminated (status 12)
Job root terminated (status 12)

How can I do to change the maximum length parameter? Is there another change that I have to do?

Thanks in advance.

Best Regards,

Cezar

Teradata Employee

Re: Large Object Loading with Teradata Parallel Transporter

Please provide me with the contents of the file "0253B2471_parte2.new".

That file must be in the proper format.
-- SteveF

Re: Large Object Loading with Teradata Parallel Transporter

I changed the file format from "formatted" to "delimited" and TPT could load the file into Teradata.

Thanks.
Cezar
Enthusiast

Re: Large Object Loading with Teradata Parallel Transporter

I am looking at the first example since it most closely resembles what I need to do. I am getting a file from Oracle that contains geospatial data that exceeds VARCHAR(64000) and runs into the 64K row-length limit. However, it comes in a single file. From what I am reading, it sounds like I would need to create a data file with the non-LOB data and a separate file for every LOB (the geospatial data in this case) for each row in the data file, correct? If that is the case, how would I modify example one? And when are the TPT operators going to be geospatial aware?