We were exporting a Teradata table with a huge volume of data using FastExport. Now we are planning to migrate the process to TPT.
Our output file is a pipe-delimited text file. According to the TPT documentation, the Export operator cannot offload the data in TEXT format. It was suggested to use the SQL Selector operator instead, but the documentation also notes that the SQL Selector operator is not suitable for high volumes of data. We are using TPT 14.0.
You'll want to use the format of 'Delimited' in the DataConnector operator. As part of that, you can specify what the delimiter is.
You can use either the 'Export' operator or the 'SQL Selector' operator along with that. The 'Export' operator is recommended for large files since you can specify multiple sessions for the retrieval of the data.
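For reference, the consumer-side definition might look something like this. This is a minimal sketch; the operator name and file name are placeholders:

```
DEFINE OPERATOR FILE_WRITER
TYPE DATACONNECTOR CONSUMER
SCHEMA *
ATTRIBUTES
(
    VARCHAR FileName      = 'export.txt',   /* placeholder file name */
    VARCHAR Format        = 'Delimited',    /* delimited text output */
    VARCHAR TextDelimiter = '|',            /* field separator */
    VARCHAR OpenMode      = 'Write'
);
```

One thing to be aware of: with Format = 'Delimited', the schema feeding the DataConnector must consist entirely of VARCHAR columns; the operator inserts the delimiter between the fields itself.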
But I was wondering whether moving from FastExport to TPT will lead to any gains from a performance standpoint, since we are talking about a huge volume of data (TBs). The main idea behind moving from FastExport to TPT is an improvement in performance, correct?
The TPT 'Export' operator is essentially running FastExport under the covers, so the performance within the database should be the same as FastExport. The 'SQL Selector' operator is similar to running a BTEQ export (it is limited to one session).
As Dieter noted, because of TPT's ability to have multiple instances, you may be able to get better performance if you set the job up to utilize multiple instances. I have never had a need to do this, so I can't really speak to any additional performance you might get from it.
As pointed out, the 'Export' operator uses FastExport under the covers but may perform better because of its ability to use multiple instances. But the following aspects are not very clear:
TPT's advantage over FastExport can be demonstrated in the following scenario. Say you are running a large FastExport and you determine during execution that the bottleneck is the disk in your host environment: FastExport is delivering the data to the host faster than the host's disk can write it to the file. One way to resolve this is to create two FastExport jobs, each selecting approximately half the data and writing to files on different volumes. By doing this, you could possibly cut your export time in half. So you get the data exported faster by running two separate FastExport jobs.
With TPT, you can do the same thing within one job. There is no need to create two TPT jobs; by specifying multiple instances, you can achieve the same effect in a single TPT job.
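As a sketch of the syntax (the operator names here are placeholders), the instance count goes in square brackets in the APPLY statement:

```
/* Two producer instances reading in parallel,
   two consumer instances writing in parallel. */
APPLY TO OPERATOR (FILE_WRITER[2])
SELECT * FROM OPERATOR (EXPORT_OPERATOR[2]);
```

Each consumer instance can write its own output file, which is also how TPT produces multiple smaller files in parallel rather than one large file.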
In answer to the second question about the format, it works the same regardless of the format of the file.
In addition to the excellent posts above: when it comes to writing out the data in delimited format, the advantage of TPT over FastExport is simply that TPT supports delimited output and FastExport does not. And since FastExport cannot write the data out in delimited format, comparing performance (for delimited output, at least) may be immaterial. :)
TPT can also write out to multiple files in parallel. Some users may prefer multiple smaller files rather than one large file.
I am having difficulty getting TPT to run. I found a sample that I used as my base script and modified it, but it keeps complaining that a single '|' was encountered where a double pipe was expected. Do you have a good example of a TPT script that extracts a large number of rows to a pipe-delimited file?
My current script, which does not run, uses the SQL Selector operator, but I think your comment about using the Export operator is probably the correct way to go since, as you mentioned, you can utilize multiple sessions. An example with this feature would be very helpful.
Here is the error I keep receiving, but I cannot see where the single-pipe issue is.
tbuild -f tpt-test -j EXPORT_DELIMITED
Teradata Parallel Transporter Version 14.10.00.19
TPT_INFRA: TPT04024: Error: Line 35 of Job Script File 'tpt-test':
Single pipe character ('|') encountered where double pipe expected.
Job script preprocessing failed.
Job terminated with status 8.
Thanks for your help,
Here is the script I'm attempting to run:
DEFINE JOB EXPORT_DELIMITED
DESCRIPTION 'Export TD table to a delimited file'
DEFINE SCHEMA SOURCE_SCHEMA
DEFINE OPERATOR SQL_SELECTOR
VARCHAR PrivateLogName = 'tpt_test_log',
VARCHAR TdpId = 'edwtdpcop1',
VARCHAR UserName = 'tpt_user',
VARCHAR UserPassword = 'TPTpass',
VARCHAR SelectStmt = 'SELECT CAST(
TRIM(CAST(CITY_CD as VARCHAR(3))) || '|' ||
trim(CAST(CITY_NM as VARCHAR(50))) || '|' ||
trim(CAST(CNTRY_CD as VARCHAR(2))) || '|' ||
trim(CAST(JURSD_CD as VARCHAR(3))) || '|' ||
trim(CAST(SUBZONE_ONE_CD as VARCHAR(3))) || '|' ||
trim(CAST(SUBZONE_TWO_CD as VARCHAR(3))) || '|' ||
trim(CAST(ACCRED_ENTITY as VARCHAR(4)))
DEFINE OPERATOR FILE_WRITER
TYPE DATACONNECTOR CONSUMER
VARCHAR PrivateLogName = 'def_log',
VARCHAR DirectoryPath = '/home/user',
VARCHAR FileName = 'tpttest.csv',
VARCHAR Format = 'DELIMITED',
VARCHAR OpenMode = 'Write',
VARCHAR TextDelimiter = '|',
APPLY TO OPERATOR (FILE_WRITER)
SELECT * FROM OPERATOR (SQL_SELECTOR);
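The parser error comes from the unescaped single quotes around each '|' inside the SelectStmt string: each quote terminates the literal, leaving a bare | where the TPT preprocessor expects the || concatenation operator (quotes inside a TPT quoted string must be doubled, e.g. ''|''). But with Format = 'DELIMITED' the DataConnector inserts the delimiter itself, so the concatenation is not needed at all: define an all-VARCHAR schema and select the columns individually. A corrected sketch along those lines is below; the original post omits the FROM clause, so `source_table` here is a placeholder you would replace with the actual table name:

```
DEFINE JOB EXPORT_DELIMITED
DESCRIPTION 'Export TD table to a delimited file'
(
    /* Delimited format requires all-VARCHAR columns. */
    DEFINE SCHEMA SOURCE_SCHEMA
    (
        CITY_CD         VARCHAR(3),
        CITY_NM         VARCHAR(50),
        CNTRY_CD        VARCHAR(2),
        JURSD_CD        VARCHAR(3),
        SUBZONE_ONE_CD  VARCHAR(3),
        SUBZONE_TWO_CD  VARCHAR(3),
        ACCRED_ENTITY   VARCHAR(4)
    );

    DEFINE OPERATOR SQL_SELECTOR
    TYPE SELECTOR
    SCHEMA SOURCE_SCHEMA
    ATTRIBUTES
    (
        VARCHAR PrivateLogName = 'tpt_test_log',
        VARCHAR TdpId          = 'edwtdpcop1',
        VARCHAR UserName       = 'tpt_user',
        VARCHAR UserPassword   = 'TPTpass',
        /* No '|' concatenation: the file writer adds delimiters. */
        VARCHAR SelectStmt     = 'SELECT
            TRIM(CAST(CITY_CD        AS VARCHAR(3))),
            TRIM(CAST(CITY_NM        AS VARCHAR(50))),
            TRIM(CAST(CNTRY_CD       AS VARCHAR(2))),
            TRIM(CAST(JURSD_CD       AS VARCHAR(3))),
            TRIM(CAST(SUBZONE_ONE_CD AS VARCHAR(3))),
            TRIM(CAST(SUBZONE_TWO_CD AS VARCHAR(3))),
            TRIM(CAST(ACCRED_ENTITY  AS VARCHAR(4)))
        FROM source_table;'  /* placeholder table name */
    );

    DEFINE OPERATOR FILE_WRITER
    TYPE DATACONNECTOR CONSUMER
    SCHEMA *
    ATTRIBUTES
    (
        VARCHAR PrivateLogName = 'def_log',
        VARCHAR DirectoryPath  = '/home/user',
        VARCHAR FileName       = 'tpttest.csv',
        VARCHAR Format         = 'DELIMITED',
        VARCHAR OpenMode       = 'Write',
        VARCHAR TextDelimiter  = '|'
    );

    APPLY TO OPERATOR (FILE_WRITER)
    SELECT * FROM OPERATOR (SQL_SELECTOR);
);
```

To switch to the Export operator for higher volumes, the same shape should apply: change TYPE SELECTOR to TYPE EXPORT, add a MaxSessions attribute, and optionally specify instance counts in the APPLY statement.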