TPT Vs other ETL Utilities


What is the key factor that makes TPT better than other load utilities, given that it uses an SQL-like script language but invokes the various load utility protocols? How is it similar to and different from the other load utilities? Does it really impact performance, and how?


Re: TPT Vs other ETL Utilities

For TPT 12:

some pros:
1. It is the new flagship of the Teradata load tools.
2. It should receive all new Teradata functionality first.
3. It can read files in multiple streams (higher speed).
4. It uses an internal pipe mechanism, so the load does not wait on extraction.
5. It is easy to switch between the Teradata loading protocols.
6. The main component runs as a local service.
7. Multiple ETL steps can be combined in one job.

some cons:
1. The compiled proprietary language offers no code/error interaction, which makes debugging very hard.
2. The local, unmanaged checkpoint directory accumulates thousands of files.
3. It needs a static column schema.
4. Text field sizes in the schema need to be multiplied by 2 or 3 (for multi-byte character sets).
5. Support from third-party ETL tools is limited.
6. The very complex functionality delays a fully functional new-version release by several months.
7. Not all Teradata SQL statements are supported.
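
For readers unfamiliar with the script language being discussed, here is a minimal sketch of what a TPT job script looks like. All object names, file names, and credentials are hypothetical, and the attribute lists are trimmed to the essentials; consult the TPT documentation for the full set of operator attributes.

```
DEFINE JOB load_customers
DESCRIPTION 'Load a flat file into a Teradata table'
(
  DEFINE SCHEMA customer_schema
  (
    cust_id   VARCHAR(10),
    cust_name VARCHAR(120)  /* doubled from 60 for a multi-byte session character set */
  );

  DEFINE OPERATOR file_reader
  TYPE DATACONNECTOR PRODUCER
  SCHEMA customer_schema
  ATTRIBUTES
  (
    VARCHAR FileName      = 'customers.txt',
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = '|'
  );

  DEFINE OPERATOR load_op
  TYPE LOAD
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'mysystem',
    VARCHAR UserName     = 'loaduser',
    VARCHAR UserPassword = 'secret',
    VARCHAR LogTable     = 'work_db.customers_log',
    VARCHAR TargetTable  = 'target_db.customers'
  );

  APPLY ('INSERT INTO target_db.customers (:cust_id, :cust_name);')
  TO OPERATOR (load_op)
  SELECT * FROM OPERATOR (file_reader);
);
```

The producer operator reads the file into the data streams and the consumer operator loads from them, which is how the "load does not wait on extraction" behavior mentioned above is achieved.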

An ideal TPT project is a high-volume, script-based load of large ASCII text files (e.g., in a US-ASCII environment).

The worst TPT project is one driven by a third-party ETL tool using a local character set or Unicode.

TPT 13 has some design changes, but I have not seen it yet.

Re: TPT Vs other ETL Utilities

Thank you, your posting was quite informative.

Here is another question for you. Talking about data streams, what exactly are they? Are they system memory or a portion of the disk drive? How effective would TPT be, performance-wise, if we run it on a client machine with low memory?

Re: TPT Vs other ETL Utilities

I am also wondering what is meant by a data stream (I am new to TPT). It should be some kind of memory, right? Could someone please shed some light on it?
Teradata Employee

Re: TPT Vs other ETL Utilities

TPT "data streams" use internal shared memory.
If your system has low memory, then TPT might have difficulties.

Some corrections to the above pro/con list:

TPT is not an ETL tool; it is a loading tool. Because our script language is SQL-like, we do have the ability to perform some minor transformations and filtering, but TPT should not be thought of as an ETL tool.

We do not use pipes. We use shared memory.

I would not classify our checkpoint/restart capabilities as "local unmanaged checkpoint directory with thousands of files". When a job terminates successfully, we delete the checkpoint files. Therefore, there should only be a checkpoint file in the checkpoint directory for each job that is currently running, or for jobs that may have terminated abnormally. And if the user would like to delete these files, we provide commands to do so.
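
As a point of reference, one of those cleanup commands is the `twbrmcp` utility shipped with TPT. A hedged sketch (verify the utility name, default directory, and argument syntax against your TPT release's documentation; the path below is hypothetical):

```
# Remove leftover checkpoint files from the TPT checkpoint directory
twbrmcp /opt/teradata/client/tpt/checkpoint
```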

In 13.10 and 14.0 we are making great strides in determining the schema dynamically. We have also added enhancements to make the script language much less verbose.

When it comes to "third-party ETL tool support", we actually have a lot. We provide an API interface to TPT, called TPTAPI, that is widely used by the likes of Ab Initio, DataStage, and Informatica.

For "con #6", I would say that we attempt to deliver patches (efixes) with new functionality as soon as we can get the features into the product. We do not always wait for new releases before providing new features.

Lastly, for "con #7", I would like to know which SQL statements you are referring to when you say we do not support all of them, and why you would want or need all SQL statements to be supported. For a loading tool, there are surely some SQL statements that simply do not need to be supported.
-- SteveF

Re: TPT Vs other ETL Utilities

Hello Feinholz,

About DATA STREAM and internal memory considerations.

We load Teradata via overnight batch processing.

The system is configured for 20 simultaneous loads, set through UTILITY LIMITS in Workload Designer.

I understand that the TPT data streams are based on the client's RAM.

How do I size the RAM on my (Linux) client to handle 20 simultaneous Teradata loads?

I have no idea how much RAM my activities will need.

Thanks for your help


Teradata Employee

Re: TPT Vs other ETL Utilities

Each load job is independent of the others, so it depends on your system's memory availability and your virtual memory settings.

Each job, by default, will allocate 10MB of shared memory for the data streams.
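
As a rough illustration of the arithmetic (assuming only the 10 MB-per-job default stated above; real jobs also consume process heap, buffers, and OS overhead that are not counted here):

```python
# Back-of-the-envelope sizing sketch for TPT data-stream memory.
# Assumption from this thread: each TPT job allocates 10 MB of shared
# memory for its data streams by default. Per-process overhead is NOT
# included and varies by configuration.

def datastream_memory_mb(concurrent_jobs: int, mb_per_job: int = 10) -> int:
    """Shared memory consumed by the data streams of N concurrent jobs."""
    return concurrent_jobs * mb_per_job

# 20 simultaneous loads, as configured via UTILITY LIMITS:
print(datastream_memory_mb(20))  # 200 (MB)
```

So the data streams of 20 concurrent jobs need on the order of 200 MB of shared memory, before any per-process overhead.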

-- SteveF

Re: TPT Vs other ETL Utilities

vincent91, please check the "-h" option in TPT.
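
For context on that suggestion: the `-h` option on the tbuild command line sets the size of the shared memory used for the data streams. A hedged sketch (the job script name is hypothetical; verify the option syntax against your TPT release's tbuild help):

```
# Run a TPT job with a 64 MB data-stream shared memory segment
# instead of the 10 MB default
tbuild -f load_job.tpt -h 64M
```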