Teradata Parallel Transporter #1 – Basics


This article provides basic, accurate information about the Teradata Parallel Transporter (TPT) product and clears up common misunderstandings about it.

What is it?

Teradata Parallel Transporter is the preferred load/unload tool for the Teradata Database.

What about the legacy load tools?

Parallel Transporter can run all of the bulk and continuous Teradata load/unload protocols in one product. In the past, a user had to run each protocol with a separate tool, each with its own script language. The stand-alone load tools, FastLoad, MultiLoad, TPump, and FastExport, are functionally stabilized: no new features are being added other than those needed to keep them operational on new Teradata Database releases. All new features requested by customers are being added to Parallel Transporter.

The stand-alone load tools are currently supported indefinitely, and no discontinuation notice has been issued. Nevertheless, it is recommended that all new Teradata load applications be implemented with Parallel Transporter.

How do I get it?

You most likely already have it. Parallel Transporter is included in the Teradata Tools & Utilities software bundles that ship with the Teradata Database. In addition, all customers on a Teradata subscription who have the legacy, stand-alone load tools are entitled to equivalent licenses for the Parallel Transporter operators. Contact your Teradata account manager for details.

How do I run it?

Parallel Transporter can be invoked through four interfaces:

  • Application Program Interface (API) – used by leading ETL vendors for tight, parallel, high-performance integration
  • Script – used when a customer doesn’t have an ETL tool
  • Command line (sometimes referred to as the Easy Loader interface) – used to load data from a flat file with a single command line 
  • Wizard – used to generate simple scripts. Use this tool as a way to learn the script language and not as a production load interface.

The four main Parallel Transporter operators are listed below; a minimal sketch of a script that uses the Load Operator follows the list.

  • Load Operator – bulk loading of empty tables (FastLoad protocol)
  • Update Operator – bulk load/update/upsert/delete of tables (MultiLoad protocol)
  • Stream Operator – continuous loading of tables (TPump protocol)
  • Export Operator – bulk unloading of tables (FastExport protocol)
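
To make the script interface concrete, here is a minimal sketch of a complete job that reads a delimited flat file with the Data Connector operator and bulk loads an empty table with the Load Operator. This is a sketch, not a definitive script: the file name, delimiter, system name, credentials, and table names are illustrative placeholders.

DEFINE JOB FILE_LOAD_JOB
DESCRIPTION 'Minimal sketch: delimited flat file into an empty table'
(
  DEFINE SCHEMA SRC_SCHEMA
  (
    col1 VARCHAR(10),
    col2 VARCHAR(50)
  );

  /* Producer: reads the flat file */
  DEFINE OPERATOR FILE_READER
  TYPE DATACONNECTOR PRODUCER
  SCHEMA SRC_SCHEMA
  ATTRIBUTES
  (
    VARCHAR FileName      = 'input.txt',
    VARCHAR OpenMode      = 'Read',
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = '|'
  );

  /* Consumer: bulk loads an empty table (FastLoad protocol) */
  DEFINE OPERATOR LOAD_OPERATOR
  TYPE LOAD
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'mysystem',
    VARCHAR UserName     = 'myuser',
    VARCHAR UserPassword = 'mypassword',
    VARCHAR TargetTable  = 'mydb.target_table',
    VARCHAR LogTable     = 'mydb.target_table_log'
  );

  /* One APPLY statement wires the producer to the consumer; */
  /* data moves between them in memory, with no landed file. */
  APPLY 'INSERT INTO mydb.target_table ( :col1, :col2 );'
  TO OPERATOR ( LOAD_OPERATOR () )
  SELECT * FROM OPERATOR ( FILE_READER () );
);

A job like this is run with the tbuild command (e.g., tbuild -f jobscript.tpt); the command-line (Easy Loader) interface wraps the same kind of job into a single command.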

What is the architecture?

If you run TPT with the script interface, a TPT infrastructure component interprets the script and invokes the proper Operators to read and load the data.

If you use an ETL tool, the ETL tool reads and transforms the data and passes it in memory to the TPT API interface, which invokes the proper operator to load the data.

What advantages are there over the legacy tools?

That’s easy to answer. The three main benefits are performance, ease of use, and better ETL tool integration.

- Performance: As you already know from your use of the Teradata Database, the best performance is scalable performance. The architecture of Parallel Transporter allows the processes running on the client load server to be scaled, and parallel data streams can be created to circumvent performance bottlenecks.

For example, if I/O is a bottleneck when reading a very large input data file, then one can scale Parallel Transporter to create multiple data flows with multiple readers of the same file or multiple files to create more data throughput for the load.
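
As a hedged sketch of one way to do this, reusing the schema and Load Operator names from the sketch above: the Data Connector operator has a MultipleReaders attribute that lets several instances of the operator share the scan of a single file, and the bracketed [n] after an operator reference in the APPLY statement asks for n parallel instances. The file name and instance counts here are illustrative.

DEFINE OPERATOR FILE_READER
TYPE DATACONNECTOR PRODUCER
SCHEMA SRC_SCHEMA
ATTRIBUTES
(
  VARCHAR FileName        = 'big_input.txt',
  VARCHAR Format          = 'Delimited',
  VARCHAR TextDelimiter   = '|',
  VARCHAR MultipleReaders = 'Yes'  /* let several instances read one file */
);

/* Four reader instances feed two consumer instances: */
APPLY 'INSERT INTO mydb.target_table ( :col1, :col2 );'
TO OPERATOR ( LOAD_OPERATOR () [2] )
SELECT * FROM OPERATOR ( FILE_READER () [4] );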

 - Ease of use: The script interface has many features that make writing a load job much easier.

Example 1: One script can extract data from a production Teradata Database and load it into a test database. The data flows in memory among the parallel processes on the client load server. With the stand-alone tools, one would have to write two scripts in two different languages and put a problematic named pipe between the two tools to pass the data.
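
A hedged sketch of the core of such a script (these operator definitions and the APPLY sit inside a DEFINE JOB, as in the earlier sketch; system names, credentials, and tables are placeholders):

/* Producer: exports rows from the production system */
DEFINE OPERATOR EXPORT_PROD
TYPE EXPORT
SCHEMA SRC_SCHEMA
ATTRIBUTES
(
  VARCHAR TdpId        = 'prod_system',
  VARCHAR UserName     = 'prod_user',
  VARCHAR UserPassword = 'prod_password',
  VARCHAR SelectStmt   = 'SELECT col1, col2 FROM prod_db.src_table;'
);

/* Consumer: bulk loads the test system */
DEFINE OPERATOR LOAD_TEST
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
  VARCHAR TdpId        = 'test_system',
  VARCHAR UserName     = 'test_user',
  VARCHAR UserPassword = 'test_password',
  VARCHAR TargetTable  = 'test_db.tgt_table',
  VARCHAR LogTable     = 'test_db.tgt_table_log'
);

/* Rows flow in memory from the Export instances to the Load instances. */
APPLY 'INSERT INTO test_db.tgt_table ( :col1, :col2 );'
TO OPERATOR ( LOAD_TEST () )
SELECT * FROM OPERATOR ( EXPORT_PROD () );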

 

Example 2: One script can load a Teradata Database, and the user can determine which load protocol to use at run time, without the load protocol having been fixed in the script. This makes it easy to switch between load protocols at run time using just one script.
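
One hedged way to get this effect is with TPT job variables, which are substituted into the script text before the script is compiled. The sketch below assumes your TPT release accepts a job variable in the TYPE clause (check the Parallel Transporter reference for the exact mechanics); the variable names are illustrative, and Load and Update are used as the two choices because they share the attributes shown (the Stream operator needs a slightly different attribute set).

/* The consumer's protocol is supplied at run time via @ConsumerType. */
DEFINE OPERATOR CONSUMER_OP
TYPE @ConsumerType   /* LOAD or UPDATE, chosen when the job is run */
SCHEMA *
ATTRIBUTES
(
  VARCHAR TdpId        = @TargetTdpId,
  VARCHAR UserName     = @TargetUser,
  VARCHAR UserPassword = @TargetPassword,
  VARCHAR TargetTable  = 'mydb.target_table'
);

/* Choose the protocol when submitting the job, e.g.:
   tbuild -f job.tpt -u "ConsumerType=LOAD, TargetTdpId='mysystem', ..."
   tbuild -f job.tpt -u "ConsumerType=UPDATE, TargetTdpId='mysystem', ..." */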

 - ETL tool integration: The leading ETL vendors now have more control over the entire load process when they integrate with TPT API. The vendors are urging their customers to use this interface. Contact your ETL vendor for more details.

What do I have to learn to use Parallel Transporter?

If you are using an ETL tool, you don't have much to learn, since the ETL tool works with TPT automatically. Once the user has entered the ETL data flow into the ETL tool's GUI, the tool automatically generates the appropriate calls to Parallel Transporter's API (TPT API). The input data is passed from the ETL tool to TPT API in memory-resident data buffers, without having to land the data or deal with problematic named pipes.

If you are writing your own scripts, most of what you know about the stand-alone load tools still applies: the same basic options and parameters, the same limitations (e.g., the number of concurrent load jobs), the same guidance on when to use each protocol, and so on. Mostly, you have to learn the new script language and the tlogview tool. The tlogview tool lets the user view and make sense of the output generated by the many parallel processes that a Parallel Transporter job executes.

Why haven’t I been using it?

That’s a good question. Many Teradata customers that use Parallel Transporter have asked why more customers haven’t leveraged the advantages of the tool. Every Teradata Partners Conference since 2006 has had a customer present a success story, including the Partners 2010 conference. Contact your Teradata account manager about attending the Teradata Partners Conference.

Summary

Parallel Transporter is the preferred load tool whether using it with an ETL vendor or writing your own scripts.

The stand-alone load tools are frozen and no new major features are being added.

Check with your Teradata account manager since you most likely already have the proper licenses for Parallel Transporter.

Teradata Education Network, www.teradata.com/t/TEN, has a Parallel Transporter technical tips and techniques presentation and a web-based training class. A white paper, Active Data Warehousing with Teradata Parallel Transporter, is available from www.teradata.com.

If you are interested in increased performance, improved ease of use, a better interface to your ETL tool, or new load tool features, then install Parallel Transporter and get started on writing new parallel load applications for the Teradata Database.

11 REPLIES
Teradata Employee

Re: Teradata Parallel Transporter #1 – Basics

Question: Is it possible to create a TPT 'job' that would be able to extract data with one Producer (Export Operator) and load that single data set to two different Consumers (Load Operators) on two different TD instances, e.g., extract data from a single source but load that data into two separate tables on two separate systems?

From what I've reviewed about TPT, I would say yes, but would like confirmation and any references on how to, etc. if there are any.

Thanks in advance - JK, Swift Trans.
Teradata Employee

Re: Teradata Parallel Transporter #1 – Basics

Yes, you can create a TPT job that has one (or many) producer Operators (e.g., Export, Data Connector, etc.) that can pass data to multiple consumer Operators. Those consumer Operators can send data to multiple different locations (e.g., write a flat file for audit, or different Teradata systems, etc.).

Here is an example of a snippet taken from a script that has a flat file producer (Data Connector) and two consumers (Load Operators):

APPLY 'INSERT INTO table1 ( :col1, :col2, …)' TO OPERATOR ( LOAD_OPERATOR () [4] ATTR(....) ),
      'INSERT INTO table2 ( :col1, :col2, …)' TO OPERATOR ( LOAD_OPERATOR () [2] ATTR(…..) )
SELECT * FROM OPERATOR ( DCON_OPERATOR () [3] );

There are limitations that you have to accept. You have a single point of failure: if anything happens to the job (e.g., an error condition), the entire job ends; one system will not keep loading from the data stream while the other has a problem. Also, if one load on the data stream could go very fast and the other very slow, the loads are forced to pace at the slowest rate, since each data buffer must be processed by both APPLY specifications before the next buffer can be processed.

Because of the limitations, this is not one of the recommended approaches for a Teradata Multiple Systems architecture. See these links for more information on multiple systems architecture:

http://www.teradata.com/t/article.aspx?id=1540
http://www.teradata.com/tdmo/v08n03/FactsAndFun/Services/Power.aspx

Teradata Employee

Re: Teradata Parallel Transporter #1 – Basics

Markhay - thanks for getting back. Yes, we've figured this out with our tests pulling data from a single source (AS400) to our Prod and Dev instances of TD. Works like a charm, and we are aware of your 'issues'. At Partners, we tried to find anyone that has done or is doing this, with no luck - but that doesn't mean no one is. We currently do not use TPT, but our investigations/tests have demonstrated that this will be part of our 'dynamic data pull strategy' for our upcoming rollout of dual active systems in 2011. Thanks again for confirming this.

JK
Teradata Employee

Re: Teradata Parallel Transporter #1 – Basics

Keep me informed of your progress. This seems like a good presentation for the 2011 Teradata Partners Conference.
Teradata Employee

Re: Teradata Parallel Transporter #1 – Basics

Thanks Mark. Will keep you posted, and if we have issues, can we get with you? We'll start working on our 'roll out' of the strategy in Jan. The current situation is good for this approach, as we'll be loading two different copies of STAGE data for Production (eventually) on two different systems, and if one fails, we want both to fail - we need to keep them in 'sync', etc.! Also, we plan on submitting an abstract for 2011 Partners, and this will be part of it.

Again - thanks for your time and assistance - much appreciated.

JK - Swift Trans
Teradata Employee

Re: Teradata Parallel Transporter #1 – Basics

Yes, feel free to contact me if you have issues. Filter the communication through your Teradata account team. This looks like a pretty cool application of Parallel Transporter.

Re: Teradata Parallel Transporter #1 – Basics

Not sure if anyone will see this posting after such a long time. I am new to TPT scripting (although I have used the 'old-school' tools for many years). You made a comment about using multiple readers for a very large file:

You said - "For example, if I/O is a bottleneck when reading a very large input data file, then one can scale Parallel Transporter to create multiple data flows with multiple readers of the same file or multiple files to create more data throughput for the load."

Do you have any examples of this? I took a class a while back, but this type of file read was not discussed, and I can find no examples in the manual.

Thanks!

Brad.

Re: Teradata Parallel Transporter #1 – Basics

What does [n] signify in the APPLY section of a TPT job?
OUTLIMIT is apparently ignored by the TPT job.
What is the threshold of data volume (number of records * record length) for using TPT over FEXP?

Re: Teradata Parallel Transporter #1 – Basics

RETLIMIT in BTEQ and OUTLIMIT in FEXP work fine, but they are ignored by a TPT job, which gets all records.
Does anyone know about this issue with Teradata 13.0?