Session and Checkpoint consideration in TTUs and TPT


Session and Checkpoint consideration in TTUs and TPT

I am trying to explore the usage of SESSION and CHECKPOINT in Teradata FastExport/FastLoad/Mload and TPT-Teradata Parallel Tranporter.
As SESSION parameter if we dont specity in TTUs, it will take default as 1 for each AMP for 100 AMP system it sets as 100 sessions default. which can be expensive for small table.

But my question is what are factors/criteria we need to consider when you use SESSION and CHECKPOINT parameter in TTUs and TPT? Which impacts perforamce of export and load.

If i want to load table with 1Million record and 100Million record what is the best practices/consideration i should use to define #SESSION and #CHECKPOINTs.

Appreciate the valuable input. Especially from DNoeth and Feinholz.
:-) Danke/Thanks

Re: Session and Checkpoint consideration in TTUs and TPT

DNoeth and Feinholz... Please respond..
Teradata Employee

Re: Session and Checkpoint consideration in TTUs and TPT


The basic rule of thumb for checkpointing is this: how much time are you willing to lose in a batch job if the DBS should encounter a restart?

If you have a job that should run for, say, one hour, is it acceptable to checkpoint every 15 minutes? If the DBS should encounter a restart, that is the amount of data that would need to be resent to Teradata when the job resumes after the restart.

Depending on the row size and the utility, 1 million rows can usually be loaded in under a minute. Therefore, it is probably not necessary to enable checkpointing if loading 1 million rows.

And again, depending on row size and utility, 100 million rows may take 10-20 minutes (could be more, could be less). Even checkpointing every 1 million rows might be too often.

A lot of the decisions for checkpointing will need you to map the time you are willing to lose (for resubmission of the rows since the last checkpoint) with the number of rows per minute that can be loaded so that you can determine the checkpoint value.

FastLoad's checkpointing is row-based. MultiLoad supports both row-based checkpoints and time-based checkpoints.

TPT checkpointing is purely time-based (expressed in terms of seconds).

Session Usage:

The number of sessions to use is dependent upon several factors. Network speed between your load server and Teradata, the amount of data to be loaded are a few. I would think that you can get pretty good performance with about 30 sessions. You could try more but at a certain point, some sessions will just not be used that much if the utility can process the rows fast enough and the network speed is not a bottleneck.

Most of our utilities support some type of config file where you can set the "max sessions" so that the jobs do not try to connect 1 per AMP. These config files should be documented in the respective reference manuals.

-- SteveF

Re: Session and Checkpoint consideration in TTUs and TPT

Thanks Feinholz thats really good informative note. And thanks to ratchetandclank for redirection as well.

In one of the forum i have read TPT can be use to load set of files in a directory. As we have limit of 15 files for FEXP and FLoad as these TTUs works well with situation of 15 load at a time.

How can we achieve loading more than 15 files using TPT?
Is there any bottleneck/ or performance issue if we load a directory contains 30 to 40 files using TPT?
If TPT is time-based checkpoint what is the rule of thumb for checkpoint we can set for 100M records on appox 400 AMPs system?
What is the relation between SESSIONS and INSTANCE in TPT? what is the thumb rule we can use for seeting these Attributes in above situation?

I am currently evaluating TPT please provide all your expert advice and thoughts.

Thanks in advance to everyone reading and writing this forums..!!

Re: Session and Checkpoint consideration in TTUs and TPT

Yes, you can use more than 15 files using TPT. You have to use the wildcard in the FileName attribute of DC operator. Like "VARCHAR FileName = '*.dat' ", and the DirectoryPath attribute will point to the directory where the files are stored.

Sessions is the number of sessions connected from the client to the teradata database. INSTANCE is the number of instances of the operator. For example, if you are running against a database having 100 AMPS, and specify 4 instances of LOAD/UPDATE operators, it will spawn 4 LOAD/UPDATE operators, and every INSTANCE of the operator will connect to 25 sessions (Since it is an 100 AMP system).

Re: Session and Checkpoint consideration in TTUs and TPT

Ratchetandclank , it means if i have single Instacne of Export on 100 AMP it will utlize 100 Sessions by default. As TPT claims this is fastest in terms of throughput versus generic utilities.

When i ran a Generic FastExport for 1M records on 100AMP system with 4 session it took 33Sec and for 12session it took roughly a minute. Because it does not required those many session for 1Million records, so with less sessions it works well.

But when i was running the TPT-EXPORT for same of amount of data/amps with default ALL SESSIONs it was executing faster than (MaxSession 4 to10) session takes more execution time. I dont know weather i am missing any other attribute to be initialized(set) in TPT scripts.

Currently my requirement is throughput test in terms of performance and minimum CPU/IO utilization.

1M is small is small amount of data i am currently testing but we will be testing more than plenty of 100M records. when i was going through TPT-forum and white papers i am impressed , but i am not getting required result right now.. Is there any thing i am missing.!!

Pls help..!!

Re: Session and Checkpoint consideration in TTUs and TPT

I did not understand this statement:

it was executing faster than (MaxSession 4 to10) session takes more execution time

Did you find a better performance, time wise ?

The CPU Utilization will be more in case of TPT when compared to the legacy fastexport because there are a lot of other things that TPT needs to do when compared to legacy fastexport.

There are no attributes in export operator which directs TPT to increase performance as such.. Increasing the number of instances would help because there will be two processes sending the records instead of a single process. However, it will have an effect on the CPU utilization. We need to find a balance.. That would depend on the power of the client system we are using.


Re: Session and Checkpoint consideration in TTUs and TPT

I Apologize for my statment..

Well i was not getting the better peformance using TPT-Export for following ..!!

Traditional TTU-FEXP : 1M records / 100 AMPs system / Sessions(4 to 10) = Excecution time 30sec.

TPT-EXPORT : 1M records / 100 AMPs system / 4-10 Sessions = Excecutiong time is More than 3Min. But if specify more session upto 40-50Session execution time is better.

In Traditional TTU if i reduce the # of sessions it gives better performance,
Whereas in TPT if reduce # of session gives worst performance.
I am not getting rule of thumb for defining session in TPT..

Hope my statement is explainatory..!!

As you said if we increase number of INSTANCE will give better performance can share some sample code..!

Re: Session and Checkpoint consideration in TTUs and TPT

No need to apologize..

The legacy Fastexport writes to a file. With TPT, are you using dataconnector operator as consumer ? I still am not able to believe that export operator is taking more time than fastexport to load the records..

Might require Feinholz's assistance..

Re: Session and Checkpoint consideration in TTUs and TPT

I think there is "NO Rule of Thumb" for defining Sessions or Instance in TPT. I came to a conclusion after running set of TPT scripts and i wud like to share my experience with you ALL.

1st Run:
1)Run the EXPORT with ALL available Sessions(1 per AMP) , with 1 Producer and 1 consumer Instance.
2) See the Job execution time.

2nd Run:
1)Run the EXPORT with ALL available Max Sessions(No change in MAXSessions) , with 2 Producer and 1 consumer Instance.
2) See each Producer are balancely exported #rows and compare each Producer Instance CPU time. If it is "Balanced accross both the Instance" then Increase number of Producer Instance.
Note: Try to maintain Number of Instace in mulitple of Available AMPs.
3) Similarly try to increase Producer Instance in multiple so that your job execution will give expected exection time.

3rd Run:
1) once you identified number of Producer instance required for your Export JOB, try to reduce number of MaxSessions because all sessions are not required for any TPT job. So Try reducting number of sessions in multiple of number of available amps and see if your job run faster.

Similar approach for balancing of Instance and Sessions you need to do when you run TPT-LOAD/UPDATE.

with above consideration i got better result in TPT-Export versus FastExport.

Traditional utitlies(FEXP/FLOAD) we have rule of thumb for Sessions and checkpoint based on size of data.

In TPT its bit difficult to get rule of thumb for identifying number of sessions/Instance

Now the question is which tool is better for extract and load one can decide?

Thanks for all viewers and experts who showed there interest in this BLOG.