I want to understand the difference in the usage of a TPT script vs BTEQ script.
Say I am writing a complex SQL query with joins on 15 tables (each table having approximately 1 million rows) and a few ordered analytical functions. I want to load the data produced by this SQL into another table in the same database / same server. Which of the following is better, considering performance and best practice?
1. Create a BTEQ script containing this complex SQL as an INSERT INTO ... SELECT, and run it from the command line using the bteq command.
2. Create a TPT script, use a Stream or Export operator to retrieve the data, and then use an Update operator to load it into the table.
If you want to move data from a set of tables in the database using an INSERT/SELECT with some complex logic, you could do this with either TPT or BTEQ.
However, you would not use TPT Stream/Export/Load operators as those are generally about reading/writing to files (although you can use a named pipe to prevent data from landing). To do the equivalent of BTEQ you would most likely use the TPT SQL Selector Operator and the TPT SQL Inserter Operator using a named pipe so you don't have to land the data.
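As a hedged sketch of that Selector-to-Inserter shape, using TPT's predefined operator templates (the job name is made up, and the exact template names and job variables — $SELECTOR, $INSERTER, SelectStmt, TargetTable — should be checked against your TPT version's documentation):

```
DEFINE JOB SELECTOR_TO_INSERTER
(
  /* $SELECTOR reads rows over plain SQL sessions;
     $INSERTER writes them back over plain SQL sessions.
     Logon details, the SELECT, and the target table come from
     job variables such as SourceTdpId, SelectStmt, TargetTable. */
  APPLY $INSERT TO OPERATOR ($INSERTER)
  SELECT * FROM OPERATOR ($SELECTOR);
);
```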
Just because you could do this in TPT, I'm not sure I would really recommend it. TPT isn't as flexible as BTEQ for what you're trying to do. For example, with BTEQ you can inline the script into a shell script and use shell variables directly in the SQL itself. That isn't really doable in TPT: you would have to write a separate TPT script, and while some variable substitution is possible, it applies to the TPT job variables, not to arbitrary text inside the SQL.
I would, however, recommend looking at the Python module Teradata has released, or the udasql (Java) code that was released over a year ago. Those are actually better than BTEQ in many ways and give you more flexibility than you have today in BTEQ.
Ok, let me clarify some of the information provided, some of which is not accurate.
One of the things you need to consider is time.
How long (actually, how fast) do you want this to take?
You indicated that you may want to have a complex SQL query that will be pulling data out of 15 tables, each table containing about 1 million rows.
If you are implying that the total number of rows to be copied to the target table is about 15 million, and the length of "time" this job can take is important, then BTEQ may not really be the preferred option.
TPT has the ability to choose from a number of different protocols with which to load the data into the target table. The FastLoad protocol is the fastest (although there are restrictions that are well documented). The TPT Load operator executes the FastLoad protocol, and the notion that the Load operator is generally used for "reading/writing files" is absolutely not true.
The Load operator is only responsible for sending data to Teradata. It uses multiple data sessions that connect directly to the AMPs. The Export operator is the high-speed opposite of the Load operator in that it executes the FastExport protocol to quickly and efficiently pull data out of Teradata.
In a single TPT script, you would use the Export operator and the Load operator to move data from one (or more) source table(s) and load into a single target table. Quickly and efficiently. And there is no need to use named pipes to prevent the data from landing. TPT passes the data between operators through shared memory, and that is managed by TPT; the user does not have to do anything. TPT does not use named pipes.
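A sketch of that Export-plus-Load job using the predefined operator templates might look like the following (job and table names are placeholders; the $EXPORT/$LOAD templates and the $INSERT macro are assumed to be available, and logon details would come from a job variables file):

```
DEFINE JOB EXPORT_TO_LOAD
(
  /* $EXPORT runs the FastExport protocol against the source SELECT;
     $LOAD runs the FastLoad protocol into the (empty) target table.
     TPT passes rows between the two operators via shared memory. */
  APPLY $INSERT TO OPERATOR ($LOAD)
  SELECT * FROM OPERATOR ($EXPORT);
);
```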
As to the claim that TPT is not as flexible as BTEQ, for this particular scenario I would disagree. This is exactly a scenario that could benefit from TPT's parallelism and flexibility.
And no, you would not want to use the TPT Selector and TPT Inserter for this scenario. You could, but the data would move at BTEQ speeds.
As for "flexibility", the TPT script language does allow variable and parameter substitution. Anything you can do with a "script", with passing in parameters, can be done with TPT.
If you would like to discuss TPT some more, feel free to contact me. It is a shame that incorrect information about TPT is being posted in this forum.
Back to the original question...
It is much easier, quicker, and more efficient to run an INSERT ... SELECT inside the database on Teradata than to unload the result and load it back. The INSERT ... SELECT can be launched from a number of different interfaces depending on where it needs to run from — interactive or batch, scheduled or ad-hoc: BTEQ, Teradata Studio, Java code calling JDBC, C code calling ODBC, a stored procedure, or several different programming methods using the REST interface if it has been installed in the environment.
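For instance, launched from BTEQ, the whole job can be a short script (the logon values and table names below are placeholders, not a real environment):

```
.LOGON mytdpid/myuser,mypassword

INSERT INTO mytable
SELECT t1.*
FROM   thistable t1
JOIN   thattable t2 ON t1.id = t2.id;

.IF ERRORCODE <> 0 THEN .QUIT 1;
.LOGOFF
.QUIT 0
```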
Ok, so here is a sample script to make TPT do what you would do in BTEQ...
DEFINE JOB BTEQ_INS_SEL
(
  APPLY ('insert into mytable select * from thistable;')
  TO OPERATOR ($DDL);
);
This example invocation uses a file that holds the job variables:
tbuild -f mytptscript.tpt -v myjobvars.txt -j LoadMyData
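A job variables file for this invocation might look like the following (values are placeholders; the exact variable names the $DDL template expects should be checked in the TPT documentation):

```
/* myjobvars.txt -- placeholder logon values */
TdpId        = 'mytdpid',
UserName     = 'myuser',
UserPassword = 'mypassword'
```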
This will let you run the INSERT/SELECT via TPT. I honestly had never tried this before today, as I had never used the DDL operator before (never really had a need to).
Now, this would require some additional scripting around it to make it a repeatable job, or to build mytptscript.tpt on the fly if you wanted to be able to pass different SQL to the script.
Honestly, I would rarely use TPT Export/Load as Feinholz suggested unless you are guaranteed that the table you are inserting into is always empty before you load it (the Load operator only works against empty tables), and unless you have plenty of open utility slots for the job to consume (it would consume two: one for the export and one for the load). But I would also never actually use the Inserter/Selector option I suggested if I'm just moving data between tables in the database.
Here is an example shell script with BTEQ written inline, of the kind we normally use. This lets us combine all of it into a single script. There is no easy way to do this with TPT that I have found; extracting the log from TPT is also rather challenging, because it isn't streamed as part of the job execution and you have to run a different set of commands to get the actual execution log.
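The script itself did not survive here, so the following is only a sketch of the pattern being described: BTEQ fed from a here-document inside a shell script, so that shell variables expand directly into the SQL (all names and logon values are placeholders):

```sh
#!/bin/sh
# Placeholder values -- substitute your own system, credentials, tables.
TDPID="mytdpid"
TARGET="mytable"
SOURCE="thistable"

bteq <<EOF
.LOGON ${TDPID}/myuser,mypassword
INSERT INTO ${TARGET}
SELECT * FROM ${SOURCE};
.IF ERRORCODE <> 0 THEN .QUIT 1;
.LOGOFF
.QUIT 0
EOF

rc=$?
echo "bteq finished with return code ${rc}"
exit ${rc}
```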
Or any of the methods Todd suggested would work, depending on whether this is a one-time execution or needs to be a repeatable production process.
To me, the flexibility of being able to inline BTEQ in a shell script allows me greater control and the ability to write fewer jobs to accomplish my tasks. You really don't have that flexibility with TPT: although it supports some limited variable substitution in the script itself, that substitution is limited to what you hard-code into the jobvars file, and it is not as dynamic as what you can do from a shell script with inline scripting of the utility.
I happen to agree with Todd that sometimes a simple INS-SEL is all that is needed to copy data from one or more source tables to a target table.
And BTEQ is probably used more often than TPT to perform an INS-SEL.
However, if you wanted to write a general purpose TPT script that could then be parameterized, the script would look like this:
DEFINE JOB BTEQ_INS_SEL
(
  APPLY (@my_sql)
  TO OPERATOR ($DDL);
);
And then either use a job variable file for the logon credentials and INS-SEL:
,my_sql='insert into mytable select * from thistable;'
Or, the information can be passed on the command line:
$ tbuild -f mytptscript.tpt -j LoadMyData -u "SourceTdpId='yourtdpid',SourcePassword='yourpassword',...,my_sql='insert into mytable select * from thistable;' "
If both tables are on the same system, then MERGE INTO could also be an option; it allows the target table to have a join index and performs the operation at the block level.
In TPT, that can be implemented using the DDL operator.
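As a hedged illustration (table and column names are made up), a MERGE between two tables on the same system might look like:

```sql
MERGE INTO mytable AS tgt
USING thistable AS src
  ON tgt.id = src.id          -- should cover the target's primary index
WHEN MATCHED THEN
  UPDATE SET val = src.val
WHEN NOT MATCHED THEN
  INSERT (id, val) VALUES (src.id, src.val);
```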