Why is TPT mload taking longer than BTEQ for a huge volume of data

Enthusiast

Re: Why is TPT mload taking longer than BTEQ for a huge volume of data

Hi Steve, there are two queries: one with only a delete, and another with both a delete and an insert. I can add DeleteTask='Y' for the delete-only query, but can I add a delete task for the TPT script that has both insert and delete?
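For the delete-only job, a delete-task version could look roughly like the sketch below (a minimal sketch, reusing the table, log-table, and credential placeholders from the script further down; the date condition is purely illustrative). Note that a MultiLoad-protocol delete task accepts exactly one DELETE statement in its APPLY and no INSERT, so the combined delete-plus-insert script cannot run as a delete task.

```sql
DEFINE OPERATOR Delete_Operator
DESCRIPTION 'TPT Update operator running as a MultiLoad delete task'
TYPE UPDATE
ATTRIBUTES
( VARCHAR TargetTable  = 'TABLEA'
, VARCHAR DeleteTask   = 'Y'        /* application-phase-only delete */
, VARCHAR TdpId        = '{tdp}'
, VARCHAR UserName     = 'username'
, VARCHAR UserPassword = 'password'
, VARCHAR LogTable     = 'LTTABLE'
);

STEP Delete_Only
( APPLY ('DELETE FROM TABLEA WHERE From_Date < DATE ''2019-01-01'';')
  TO OPERATOR (Delete_Operator [1]);
);
```

Because the delete task skips the acquisition phase, it avoids the export/reload round trip entirely; but it only fits broad, single-statement deletes, not the per-row delta delete driven by :Key and :From_Date.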

But my question here is why plain BTEQ takes considerably less time than TPT. Is that because of the extra export step, which exports the data before deleting?

Please help me understand this.

The script with both delete and insert statements is below.



Define Operator Update_Operator
Description 'TPT Update Operator'
Type Update
Schema Schema
Attributes
( integer maxsessions = 12
, integer minsessions = 2
, integer maxdecimaldigits = 38
, varchar targettable = 'TABLEA'
, varchar tdpid = '{tdp}'
, varchar username = 'username'
, varchar userpassword = 'password'
, varchar errortable1 = 'ETTABLE'
, varchar errortable2 = 'UVTABLE'
, varchar logtable = 'LTTABLE'
, varchar worktable = 'WTTABLE'
, varchar workingdatabase = 'LOGDATABASE'
);

 

Define Operator Export_Operator
Description 'TPT Export Operator'
Type Export
Schema Schema
Attributes
( integer maxsessions = 12
, integer minsessions = 2
, integer maxdecimaldigits = 38
, varchar tdpid = '{tdp}'
, varchar username = 'username'
, varchar userpassword = 'password'
, varchar workingdatabase = 'LOGDATABASE'
, varchar SelectStmt = 'Select * From TABLEB Where conditions;'
);

 

 

Step Apply_Delta
(Apply
   Case
     When (Delta_Action_Code = 'D')
       Then ('Delete from TABLEA
               where Key = :Key
                 and From_Date = :From_Date;')
     When (Delta_Action_Code = 'A')
       Then ('Insert into TABLEA
               values (:Key, :From_Date, ...);')
   End
 To Operator (Update_Operator [1])
 Select * From Operator (Export_Operator [1]);
);

Teradata Employee

Re: Why is TPT mload taking longer than BTEQ for a huge volume of data

You haven't described the BTEQ statement you are comparing against, or provided its explain plan, but since you have the list of rows to delete in TableB, I assume you are doing a join delete.

The join delete runs completely in parallel inside Teradata. If both tables have the same primary index/PPI, a spool file is not even required.

The TPT job has to spool TableB (80M rows), prepare it into an export spool (80M), export 80M rows to a client (incurring serialization in the network and client), load the 80M rows back into Teradata into an Update operator work table (again serialized through the client and network), sort the work table (an 80M-row sort), and then use that list to delete from TableA.

If the list of rows to delete is already prepared in Teradata, a direct set-based SQL operation is much more efficient than unloading and then loading/applying.

Teradata Employee

Re: Why is TPT mload taking longer than BTEQ for a huge volume of data

Thanks Todd!

For a pure delete, I was going to steer this user toward the MultiLoad-protocol Delete Task (an application-phase-only operation), which is much faster than the approach currently taken, but I'm still not sure how that would compare to a simple BTEQ DELETE.

Is journaling also a factor?

-- SteveF
Enthusiast

Re: Why is TPT mload taking longer than BTEQ for a huge volume of data

Thanks Steve and Todd, 

My BTEQ script looks like this:

Delete from TableA A
      ,TableB B
Where A.Key = B.Key
  And A.From_Date = B.From_Date
  And B.Action_Code = 'D'
;
.If Errorcode <> 0 Then .Quit 16
Insert into TableA
(ColumnList)
Select ColumnList
  From TableB B
 Where B.Action_Code = 'A'
;

I checked Teradata Viewpoint for my TPT run: 12 sessions were invoked, of which 6 were idle; the one session in responding state was doing the TPT export, and the other 5 active sessions were doing the TPT update. The export session uses the DBC/SQL partition. Is this correct? I expected TPTEXP to use the FEXP partition rather than DBC/SQL, while TPTUPD uses the MLOAD partition.

Is my Teradata configuration wrong for TPTEXP? Please comment on it.

Thanks in advance.

Teradata Employee

Re: Why is TPT mload taking longer than BTEQ for a huge volume of data

Based on your job scripts, the Export operator will connect 12 data sessions (and 1 SQL session for executing the SELECT statement).

The Update operator will also connect 12 data sessions (and 2 additional SQL sessions for submitting the DML statements and for managing the restart log table).

You should check your output log to see if TASM changed the number of sessions that were actually connected.

Is it possible that you saw idle sessions for the Export operator while the spool was being built, and prior to the data sessions actually extracting the data from the DBS?

(Also, you can remove the "maxdecimaldigits=38" from the Update operator definition; it serves no purpose.)

-- SteveF
Junior Contributor

Re: Why is TPT mload taking longer than BTEQ for a huge volume of data

Why do you want to use TPT UPDATE if you know it's faster with SQL DELETE/INSERT?

There's only one advantage of TPT UPDATE:

It needs only a single scan of the table vs. two scans for two SQLs.

The PIs of source and target must match for TPT UPDATE to run successfully, so this is also the best case for SQL.

If you switch to a MERGE instead of the INSERT, it will apply direct inserts without spool (similar to the SQL DELETE Todd explained):

merge into TableA a using TableB b
on A.Key=B.Key -- needed for MERGE
and B.Action_Code = 'A'
and 1=0 -- dummy condition for unconditional inserts
when not matched then
insert (key, ...)
values (b.key, ...)
;

Now put both DELETE/MERGE in a TPT DDL operator and you're still using TPT, just faster :-)
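A rough sketch of that approach, reusing the table and credential placeholders from the scripts above (the DDL operator simply submits the SQL statements in-database; the join delete is expressed here with a subquery, which should be equivalent, and only Key and From_Date are shown in the MERGE column lists as placeholders for the full list):

```sql
DEFINE OPERATOR DDL_Operator
DESCRIPTION 'TPT DDL operator submitting set-based SQL'
TYPE DDL
ATTRIBUTES
( VARCHAR TdpId        = '{tdp}'
, VARCHAR UserName     = 'username'
, VARCHAR UserPassword = 'password'
);

STEP Apply_Delta_InDb
( APPLY
  ('DELETE FROM TableA
     WHERE (Key, From_Date) IN
           (SELECT Key, From_Date FROM TableB WHERE Action_Code = ''D'');')
 ,('MERGE INTO TableA a USING TableB b
      ON a.Key = b.Key AND b.Action_Code = ''A'' AND 1=0
    WHEN NOT MATCHED THEN
      INSERT (Key, From_Date)
      VALUES (b.Key, b.From_Date);')
  TO OPERATOR (DDL_Operator [1]);
);
```

This way no rows ever leave the database: both statements run as ordinary parallel SQL, and the export/reload round trip disappears while the job remains a TPT script.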

Enthusiast

Re: Why is TPT mload taking longer than BTEQ for a huge volume of data

Thanks Dieter and Steve. This helped me a lot.