I have installed FastExport, but apparently it cannot output CSV files.
So now I'm trying to get hold of Teradata Parallel Transporter (TPT) to export a large table (hundreds of millions of rows).
I've read this post and the TPT quick start guide, but I still don't know how to export a table to a CSV file.
Could someone please provide a sample script that connects to a database with a username/password and saves a table as a .csv file?
I just changed the select statement in the example file to my own table, and I get this error:
ubuntu@home:/opt/teradata/client/15.10/tbuild/sample/quickstart$ tbuild -f qstart2.txt -v jobvars2.txt -j qstart2
Teradata Parallel Transporter Version 15.10.00.04 64-Bit
Job log: /opt/teradata/client/15.10/tbuild/logs/qstart2-1.out
Job id is qstart2-1, running on spark-analytics
Teradata Parallel Transporter Export Operator Version 15.10.00.04
Teradata Parallel Transporter DataConnector Operator Version 15.10.00.04
$EXPORT: private log specified: exportprivate.log
$FILE_WRITER: Instance 1 directing private log report to 'dtacop-ubuntu-14018-1'.
$FILE_WRITER: DataConnector Consumer operator Instances: 1
$FILE_WRITER: ECI operator ID: '$FILE_WRITER-14018'
$FILE_WRITER: Operator instance 1 processing file 'flatfile2.dat'.
$EXPORT: connecting sessions
TPT_INFRA: TPT02639: Error: Conflicting data type for column(4) - "dt". Source column's data type (CHAR) Target column's data type (INTDATE).
$EXPORT: TPT12108: Output Schema does not match data from SELECT statement
$EXPORT: disconnecting sessions
$EXPORT: Total processor time used = '0.04642 Second(s)'
$EXPORT: Start : Sat Apr 9 00:57:08 2016
$EXPORT: End : Sat Apr 9 00:57:11 2016
$FILE_WRITER: Total files processed: 0.
Job step MAIN_STEP terminated (status 12)
Job qstart2 terminated (status 12)
Job start: Sat Apr 9 00:57:06 2016
Job end: Sat Apr 9 00:57:11 2016
This is my qstart2.txt:
DEFINE JOB qstart2
(
  APPLY TO OPERATOR ($FILE_WRITER)
  SELECT * FROM OPERATOR ($EXPORT);
);
This is my jobvars2.txt (with credentials replaced):
SourceTdpId = 'db.acmecorp.com'
,SourceUserName = 'myusername'
,SourceUserPassword = 'mypass'
,DDLPrivateLogName = 'ddlprivate.log'
,ExportPrivateLogName = 'exportprivate.log'
,TargetErrorList = ['3807']
,TargetFileName = 'flatfile2.dat'
,TargetFormat = 'delimited'
,TargetTextDelimiter = ','
,TargetOpenMode = 'write'
,SelectStmt = 'SELECT * FROM rtl.base;'
This is probably related to the session's DateForm setting: AnsiDate returns a 10-character string 'yyyy-mm-dd', while IntegerDate returns the internal 4-byte integer.
Try adding a new job variable:
ExportDateForm = 'IntegerDate'
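With the quick-start layout, this would go in jobvars2.txt next to the other export settings; a sketch (the ExportDateForm line is the addition, the rest mirrors the file above):

```
 SourceTdpId = 'db.acmecorp.com'
,SourceUserName = 'myusername'
,SourceUserPassword = 'mypass'
,ExportDateForm = 'IntegerDate'
,SelectStmt = 'SELECT * FROM rtl.base;'
```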
I was able to get it working by changing the column's schema definition to:
dt = ANSIDATE
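For context on why the schema clashed: INTDATE is Teradata's internal DATE encoding, a 4-byte integer computed as (year - 1900) * 10000 + month * 100 + day, while ANSIDATE is the 10-character 'yyyy-mm-dd' string. A quick sketch of the integer encoding (plain Python, no Teradata client needed, purely to illustrate the format):

```python
def decode_teradata_date(n: int) -> str:
    """Decode Teradata's internal integer DATE: (year-1900)*10000 + month*100 + day."""
    year = n // 10000 + 1900
    month = (n % 10000) // 100
    day = n % 100
    return f"{year:04d}-{month:02d}-{day:02d}"

def encode_teradata_date(year: int, month: int, day: int) -> int:
    """Inverse: pack a calendar date into the internal integer form."""
    return (year - 1900) * 10000 + month * 100 + day

print(decode_teradata_date(1160409))       # 2016-04-09
print(encode_teradata_date(1999, 12, 31))  # 991231
```

So when the session's DateForm is IntegerDate, the exported column matches an INTDATE schema; with AnsiDate it comes back as a 10-byte CHAR, which is why the two sides disagreed.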
It runs, but it's very slow (75,000 rows x 2 columns exported per minute), and I have an 800-million-row table.
I've tried increasing MaxSessions and all the other instance variables, but it has no impact on the speed.
I don't think it's a network issue, as I'm able to download files from the Internet at 65 MB/s on the same network.
Can someone please recommend how I can speed this up? The documentation says somewhere that SQL SELECTs are slow compared to exports. How do I do an export without a SQL SELECT?
You submitted a SQL SELECT, but it used the FastExport protocol, so this is already the fastest possible way.
Changing the number of sessions might not help at all, because this is usually set/adjusted by Workload Management.
You need to check the logs: tlogview -j yourjobid -f "*" -g, where yourjobid is the one returned when you started the job ("Job id is qstart2-1").
There's a lot of information returned; look at the $FILE_WRITER details.
75,000 rows per minute is extremely slow. I just tried the QuickStart script and exported 3,000,000 rows to a 130 MB flat file from a local TD VM to another local Windows VM; the whole TPT job ran in under 10 seconds, with under three seconds for the FileWriter part.
Here's my log: http://pastebin.com/bXatXwKk
What's interesting is that the query itself takes 2-3 seconds, but the time from when the job starts to when it finishes is close to 4 minutes. What could be going on?
Your system is huge (5,850 AMPs), and you requested way too many sessions, one per AMP (I didn't know this was possible for an Export):
**** 15:52:56 Maximum number of special sessions requested: All
**** 15:52:56 Minimum number of special sessions required: 1
Instance Assigned Connected Result
======== ======== ========= ======================
1 2925 299 CLI Error 301
2 2925 300 CLI Error 301
======== ======== ========= ======================
Total 5850 599 Warning
Even when your export is larger, you will still only use a few sessions.
You should ask your DBAs, they should know about the settings for TPT Exports to achieve good performance on your system.
The default for the Export operator is 1 session per available AMP, unlike FastExport, which defaults to 4.
(We could not keep a default of '4' once we added support for multiple instances, and all of the operators are now consistent with each other.)
The job in question specifies 2 instances, and so ExportMaxSessions needs to be at least 2.
But, as Dieter noted, since you only need 1 (or maybe 2) sessions, you only need 1 instance of the Export operator.
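In the quick-start files that would mean adding something like ExportMaxSessions = 2 to jobvars2.txt and requesting a single instance of each operator in the job script; a sketch, assuming the quick-start operator names:

```
DEFINE JOB qstart2
(
  APPLY TO OPERATOR ($FILE_WRITER[1])
  SELECT * FROM OPERATOR ($EXPORT[1]);
);
```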
If you are saying that the entire job took a long time, it looks to me (without seeing the whole log) as if the session connects took up most of the time.