Exporting CSV from TPT

Tools
Enthusiast

Exporting CSV from TPT

Hi,

I have installed FastExport but apparently it cannot output CSV files.

So now I'm trying to get ahold of Teradata parallel transporter, to export a large table (hundreds of millions of rows).

I've read this post and the quickstart guide for TPT, but I still don't know how to export a CSV from a table.

Could someone please provide a sample script that connects to a database with username/password and saves a table as a .CSV file?

thank you,

imran

Tags (1)
7 REPLIES
Enthusiast

Re: Exporting CSV from TPT

I just changed the select statement in the example file to my own table, and I get this error:

 

ubuntu@home:/opt/teradata/client/15.10/tbuild/sample/quickstart$ tbuild -f qstart2.txt -v jobvars2.txt -j qstart2

Teradata Parallel Transporter Version 15.10.00.04 64-Bit

Job log: /opt/teradata/client/15.10/tbuild/logs/qstart2-1.out

Job id is qstart2-1, running on spark-analytics

Teradata Parallel Transporter Export Operator Version 15.10.00.04

Teradata Parallel Transporter DataConnector Operator Version 15.10.00.04

$EXPORT: private log specified: exportprivate.log

$FILE_WRITER[1]: Instance 1 directing private log report to 'dtacop-ubuntu-14018-1'.

$FILE_WRITER[1]: DataConnector Consumer operator Instances: 1

$FILE_WRITER[1]: ECI operator ID: '$FILE_WRITER-14018'

$FILE_WRITER[1]: Operator instance 1 processing file 'flatfile2.dat'.

$EXPORT: connecting sessions

TPT_INFRA: TPT02639: Error: Conflicting data type for column(4) - "dt". Source column's data type (CHAR) Target column's data type (INTDATE).

$EXPORT: TPT12108: Output Schema does not match data from SELECT statement

$EXPORT: disconnecting sessions

$EXPORT: Total processor time used = '0.04642 Second(s)'

$EXPORT: Start : Sat Apr  9 00:57:08 2016

$EXPORT: End   : Sat Apr  9 00:57:11 2016

$FILE_WRITER[1]: Total files processed: 0.

Job step MAIN_STEP terminated (status 12)

Job qstart2 terminated (status 12)

Job start: Sat Apr  9 00:57:06 2016

Job end:   Sat Apr  9 00:57:11 2016

 

THis is my qstart2.txt:

DEFINE JOB qstart2

(

  APPLY TO OPERATOR ($FILE_WRITER)

  SELECT * FROM OPERATOR($EXPORT);

);

 

This is my jobvars2.txt (with credentials replaced):

 SourceTdpId          = 'db.acmecorp.com'

,SourceUserName       = 'myusername'

,SourceUserPassword   = 'mypass'

,DDLPrivateLogName    = 'ddlprivate.log'

,ExportPrivateLogName = 'exportprivate.log'

,TargetErrorList      = ['3807']

,TargetFileName       = 'flatfile2.dat'

,TargetFormat         = 'delimited'

,TargetTextDelimiter  = ','

,TargetOpenMode       = 'write'

,SelectStmt           = 'SELECT * FROM rtl.base;'

Senior Apprentice

Re: Exporting CSV from TPT

This is probably related to the session's DateForm setting, ansiDate returns a 10-character string 'yyyy-mm-dd'  and integerDate returns the internal 4-byte integer.

Try adding a new job variable:

ExportDateForm = 'IntegerDate'

Enthusiast

Re: Exporting CSV from TPT

I was able to get it working by changing it to:

dt = ANSIDATE

It runs, but it's very slow (75,000 rows x 2 columns imported/minute).  I have an 800 million row table.

I've tried increasing the number of MaxSessions, and all the other instance variables, but it has no impact on the speed.

I don't think it's a network issue, as I'm able to download files from the Internet at 65 MB/s on the same network.

Can someone please recommend how I can speed this up?  It says in the documentation somewhere that SQL selects are slow compared to exports?  How do I do an export without a SQL select?

Senior Apprentice

Re: Exporting CSV from TPT

You submitted a SQL Select, but used the FastExport protocol, so this is the fastest possible way.

Changing the number of sessions might not help at all, because this is usually set/adjusted by Workload Management.

You need to check the logs: tlogview -j yourjobid -f  "*"  -g where yourjobid is the one returned when you start the job: "Job id is qstart2-1"

There's lots of information returned, you need to look at the FILE_WRITER details.

75,000 rows per minute is extremely slow, I just tried the QuickStart script & exported 3,000,000 rows to a 130MB flat file from a local TD VM to another local Windows WM and the whole TPT job run in < 10 seconds, < three seconds for the FileWriter part.

Enthusiast

Re: Exporting CSV from TPT

Thanks Dieter

Here's my log: http://pastebin.com/bXatXwKk

What's interesting is that it says the query itself takes 2-3 seconds, but the time from when it starts to when it finishes is close to 4 minutes.  What could be going on?

regards,

imran

Senior Apprentice

Re: Exporting CSV from TPT

I don't have expertise in reading detailed TPT logs, better wait for Steve Feinholz :-)


Sending the data was only 6 seconds:

 

**** 15:53:11 Select execution completed. 5850 data blocks generated.

**** 15:53:17 Finished sending rows

 

 

Your system is huge, 5850 AMPs, and you request way too many sessions, one per AMP (I didn't know this is possible for an Export):

**** 15:52:56 Maximum number of special sessions requested: All

**** 15:52:56 Minimum number of special sessions required: 1

Instance Assigned Connected Result
======== ======== ========= ======================
1 2925 299 CLI Error 301
2 2925 300 CLI Error 301
======== ======== ========= ======================
Total 5850 599 Warning

 

Logging on 599 sessions runs for more than 2 minutes and logoff was another 40 seconds (there's a TASMSessionLimit event, but it doesn't seem to limit the number).

 

 

Try a single instance and a few sessions, for 290,413 rows a single session should be enough: ExportMaxSessions = 1

When your export is larger you will still use only a few sessions.

You should ask your DBAs, they should know about the settings for TPT Exports to achieve good performance on your system.

Teradata Employee

Re: Exporting CSV from TPT

The default for the Export operator is 1 per available AMP.

Unlike FastExport that defaults to 4.

(We could not have a default of '4' when we support multiple instances, and all of the operators are now consistent with each other.)

The job in question specifies 2 instances, and so ExportMaxSessions needs to be at least 2.

But, as Dieter noted, since you only need 1 (or maybe 2) sessions, you only need 1 instance of the Export operator.

If you were claiming that the entire job took a long time, it looks (to me without seeing the whole log) as if the session connects took up most of the time.

-- SteveF