TPT Instances - How this works?

Hi Teradata gurus,

I have developed a process in Java that reads some files, adds some fields, and writes the result to named pipes that my TPT script reads.

In my tests I'm loading 6 pipes with 6 separate threads.
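For readers who want to picture the feeding side, here is a minimal sketch; it is not the code from this post. The class name `PipeFeeder`, the `enrich` helper and the appended field value are hypothetical. Only the one-thread-per-pipe layout and the `;` delimiter come from the description above. The sinks are plain `OutputStream` objects; in a real job each one would presumably be a stream opened on one of the pipes.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class PipeFeeder {
    /** Appends one extra field to a ';'-delimited record (hypothetical enrichment step). */
    static String enrich(String record, String extraField) {
        return record + ";" + extraField;
    }

    /** Writes every enriched record to the sink, one record per line, then closes the sink. */
    static void feed(List<String> records, String extraField, OutputStream sink) {
        try (BufferedWriter out =
                 new BufferedWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            for (String record : records) {
                out.write(enrich(record, extraField));
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts one writer thread per sink and waits for all of them to finish. */
    static void feedAll(List<List<String>> inputs, String extraField,
                        List<? extends OutputStream> sinks) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < sinks.size(); i++) {
            final List<String> records = inputs.get(i);
            final OutputStream sink = sinks.get(i);
            Thread t = new Thread(() -> feed(records, extraField, sink));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
    }

    public static void main(String[] args) {
        // In-memory sinks stand in for the pipes so the sketch runs anywhere.
        ByteArrayOutputStream first = new ByteArrayOutputStream();
        ByteArrayOutputStream second = new ByteArrayOutputStream();
        feedAll(List.of(List.of("a;1", "b;2"), List.of("c;3")), "X", List.of(first, second));
        System.out.print(new String(first.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Each writer thread is independent, so the overall feed rate is limited by how fast the threads can read and enrich their input, not by the pipes themselves.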

To load this I have these Producer Operators:

DEFINE OPERATOR PIPE_READER1()

DESCRIPTION 'Defines file read options'

TYPE DATACONNECTOR PRODUCER

SCHEMA T811_R2_EXCLUIDOS_DETRAF_2_SCHEMA

ATTRIBUTES

(

        VARCHAR AccessModuleName = 'np_axsmod.dll'

      , VARCHAR AccessModuleInitStr

      , VARCHAR FileName              = '\\.\pipe\EXCLUIDOS_DETRAF1'

      , VARCHAR Format      = 'DELIMITED'

      , VARCHAR TextDelimiter     = ';'

      , VARCHAR IndicatorMode         = 'N'

      , VARCHAR OpenMode              = 'Read'

);

...         /*PIPE READER 2,3,4,5*/

DEFINE OPERATOR PIPE_READER6()

DESCRIPTION 'Defines file read options'

TYPE DATACONNECTOR PRODUCER

SCHEMA T811_R2_EXCLUIDOS_DETRAF_2_SCHEMA

ATTRIBUTES

(

        VARCHAR AccessModuleName = 'np_axsmod.dll'

      , VARCHAR AccessModuleInitStr

      , VARCHAR FileName              = '\\.\pipe\EXCLUIDOS_DETRAF6'

      , VARCHAR Format      = 'DELIMITED'

      , VARCHAR TextDelimiter     = ';'

      , VARCHAR IndicatorMode         = 'N'

      , VARCHAR OpenMode              = 'Read'

);

and an APPLY statement like this:

INSERT INTO MYTABLE ...

VALUES ...

TO OPERATOR (DATA_LOAD () [3]) /*DATA_LOAD is an Update Operator*/

  SELECT *

  FROM OPERATOR

  (

   PIPE_READER1()[1]

  )

UNION ALL

...        /*PIPE READER 2,3,4,5*/

  UNION ALL

  SELECT *

  FROM OPERATOR

  (

   PIPE_READER6()[1]

  );

In other words:

Insert into DATA_LOAD() (3 instances) the union of my 6 PIPE_READER operators (1 instance each).

It works well:

              Rows Inserted: 42782588

              Rows Updated:  0

              Rows Deleted:  0

But I see this on the log:

                        Instance    Rows Sent  

                        ========  =============

                            1        49127971

                            2             290

                            3               0

                        ========  =============

                          Total      49128261

Why on earth does it load 49127971 rows through the first instance and 290 rows through the second, and not use the third instance at all?

(I also tried 6 instances, and the additional instances were not used either.)

Isn't UNION ALL the way to get parallelism?

I built this script following the examples in the Teradata Parallel Transporter User Guide.

Other questions:

- How can I measure how many pipes are best for balancing the load between producer and consumer?

- Could JMS be faster than pipes?

6 REPLIES
Teradata Employee

Re: TPT Instances - How this works?

The UNION ALL is to get the parallelism from the producer operator side.

When sending the rows to the consumer operator (the Update operator in your case), the data is not sent in a round robin fashion to the instances. That would hurt performance due to the context switching.

Instead, we send the data to the first instance. And if that first instance can keep up with the rate at which the data is going through the data streams, it will get all of the work.

When the first instance cannot keep up, we will begin to send data to the 2nd instance.

As you can see, the 3rd instance got no work, meaning you do not need 3 instances for that job.

In fact, you really do not need the 2nd instance either because it did so little. If you take away the 3rd instance, more sessions will be distributed to the other 2 instances and you will probably notice that the 2nd instance will get no rows.

In other words, the bottleneck is still with your pipes. A single instance of the Update operator can keep up with the rate at which those 6 pipes feed data to it.
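One way to picture the behavior described above is a toy model, sketched here in Java. It is only an illustration and not how TPT's data streams are actually implemented: the class name `SpillDispatch`, the tick-based timing and all the numbers are assumptions. Rows go to the first instance until it has taken all it can handle in a tick, and only the overflow reaches the next instance.

```java
import java.util.Arrays;

final class SpillDispatch {
    /**
     * Toy model: every tick, arrivalsPerTick rows show up and each consumer
     * instance can absorb at most capacityPerTick rows. Rows are offered to
     * instance 0 first; only what it cannot take spills to instance 1, and so on.
     * Rows that no instance can take wait as backlog for the next tick.
     */
    static long[] distribute(int instances, int ticks, long arrivalsPerTick, long capacityPerTick) {
        long[] rowsSent = new long[instances];
        long backlog = 0;
        for (int t = 0; t < ticks; t++) {
            long remaining = arrivalsPerTick + backlog;
            for (int i = 0; i < instances && remaining > 0; i++) {
                long taken = Math.min(remaining, capacityPerTick);
                rowsSent[i] += taken;
                remaining -= taken;
            }
            backlog = remaining;
        }
        return rowsSent;
    }

    public static void main(String[] args) {
        // Consumer faster than the producers: instance 0 takes everything.
        System.out.println(Arrays.toString(distribute(3, 10, 100, 150))); // [1000, 0, 0]
        // Consumer slightly slower: a small share spills to instance 1.
        System.out.println(Arrays.toString(distribute(3, 10, 100, 80)));  // [800, 200, 0]
    }
}
```

In this model, idle instances are a sign that the producers, not the consumer, are the limiting factor, which matches the log in the question.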

-- SteveF

Re: TPT Instances - How this works?

Great, feinholz!

Now everything makes sense.

So I could increase the number of pipes (Java threads), if it weren't for the high CPU usage from opening more files. (I have dozens of processes like this running!)

Thanks!

Re: TPT Instances - How this works?

That was a great explanation, feinholz. You made it really clear.

Now, for a Producer Dataconnector Operator, when reading a flat file, can I guarantee that the more instances I use, the faster the file will be read?

Are the rows of the flat file evenly distributed across all instances I set?

Teradata Employee

Re: TPT Instances - How this works?

We do support the use of multiple instances reading from a single file. We have seen some nice performance improvement (due to the operating system caching the I/O reads). However, YMMV.

As far as multiple instances reading from different files, yes you can get good performance improvements by using more instances. However, you have other issues to deal with. Namely, disk head contention. If all files are on the same drive, the disk head contention and other operating system environmental issues might affect performance.

So, guarantee? No, but in a lot of cases, yes.

When using multiple instances to read a single file, yes the rows are distributed evenly across the reader instances.

When using multiple instances to read from multiple files, we load balance the files across the instances according to the file sizes.
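To make "load balance the files across the instances according to the file sizes" concrete, here is a small Java sketch. The exact algorithm TPT uses is not stated in this thread; the greedy largest-first strategy below, the class name `FileBalancer` and the sample sizes are assumptions chosen for illustration.

```java
import java.util.Arrays;

final class FileBalancer {
    /**
     * Greedy largest-first assignment (an assumed strategy, not TPT's documented one):
     * take the files from largest to smallest and give each to the instance
     * that currently has the least data. Returns the total size per instance.
     */
    static long[] assign(long[] fileSizes, int instances) {
        long[] sorted = fileSizes.clone();
        Arrays.sort(sorted); // ascending, so walk it backwards for largest-first
        long[] load = new long[instances];
        for (int f = sorted.length - 1; f >= 0; f--) {
            int lightest = 0;
            for (int i = 1; i < instances; i++) {
                if (load[i] < load[lightest]) {
                    lightest = i;
                }
            }
            load[lightest] += sorted[f];
        }
        return load;
    }

    public static void main(String[] args) {
        long[] sizes = {500, 300, 200, 200, 100};
        System.out.println(Arrays.toString(assign(sizes, 2))); // [700, 600]
    }
}
```

With files of very different sizes, a size-aware assignment like this keeps one instance from ending up with all the large files while another finishes early.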

-- SteveF

Re: TPT Instances - How this works?

Can I use more than one instance of LOAD_OPERATOR if I am using FASTLOAD as the load method?

Teradata Employee

Re: TPT Instances - How this works?

You can specify multiple instances for a LOAD_OPERATOR. Sessions will be divided among the instances as evenly as possible.
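As a quick illustration of "as evenly as possible", the arithmetic can be sketched in Java as follows. The class name `SessionSplit` is hypothetical, and which instances receive the leftover sessions is an assumption; the thread only says the split is as even as possible.

```java
import java.util.Arrays;

final class SessionSplit {
    /**
     * Splits a session count across instances so that no two instances
     * differ by more than one session. Giving the extras to the
     * lowest-numbered instances is an assumption made for this sketch.
     */
    static int[] split(int sessions, int instances) {
        int[] perInstance = new int[instances];
        int base = sessions / instances;
        int extra = sessions % instances;
        for (int i = 0; i < instances; i++) {
            perInstance[i] = base + (i < extra ? 1 : 0);
        }
        return perInstance;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(8, 3))); // [3, 3, 2]
    }
}
```

So with 8 sessions and 3 instances, two instances would get 3 sessions and one would get 2.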