Help Request: Extracting Data with TPT

Tools & Utilities
Highlighted

Help Request: Extracting Data with TPT

Use case: Need to get a large number of columns (1,000 plus) and a potentially large number of rows (in the millions) extracted from Teradata to a delimited flat file.

TPT Version: 16.20

Teradata Cloud Instance Version: 15.10

 

I've gone through the User Guide and Reference and was able to come up with the below scripts and jobvars. One script using SELECTOR and the other using EXPORT. I was getting really poor performance from the EXPORT until I changed ExportInstances to 4. But even with that change, the time difference between the two scripts is negligible. I have a subset of the data I am testing with and it is 1063 colums (primarily integers) and about 380k rows. Extract time is running about 45 min and the resulting dataset is around 850MB.

 

  • I've tried setting ExportSpoolMode to 'NoSpoolOnly' with minimal increase in performance.
  • I tried setting my WriterInstances to 4 thinking each export would right to a different file at the same time thus quadrupling my effort, but the script only wrote to the one file.
  • If I set my ExportInstances higher that 4, I get an error. Is this parameter limited by the number of cores on the machine I am downloading to, is that why I got an error?

Looking for any help/guidance on where to turn next to get some better performance.

 

jobvars.txt
-----------

/**********************************************************/
/* TPT EXPORT Operator attributes                         */
/**********************************************************/
ExportTdpId              = 'MyInstanceIPAddress',
ExportUserName           = 'MyUserName',
ExportUserPassword       = 'MyPassword',
ExportPrivateLogName     = 'EXPORT_OPERATOR_LOG',
ExportTaceLevel          = 'None',

/**********************************************************/
/* TPT SELECTOR Operator attributes                       */
/**********************************************************/
SelectorTdpId              = 'MyInstanceIPAddress',
SelectorUserName           = 'MyUserName',
SelectorUserPassword       = 'MyPassword',

/**********************************************************/
/* TPT FILE_WRITER attributes                             */
/**********************************************************/
FileWriterFormat           = 'Delimited',
FileWriterTextDelimiter    = '~'
FileWriterQuotedData       = 'No',
FileWriterDirectoryPath    = 'C:\TPT',
FileWriterOpenMode         = 'Write',
FileWriterPrivateLogName   = 'FILE_WRITER_LOG',

/**********************************************************/
/* APPLY STATEMENT parameters                             */
/**********************************************************/
ExportInstances = 4,
WriterInstances = 1,
SelectorInstances = 1


SELECTOR script
---------------

DEFINE JOB EXPORT_DELIMITED_FILE
DESCRIPTION 'Export rows from Teradata to a delimited file'
(

  APPLY TO OPERATOR ($FILE_WRITER()[@WriterInstances] ATTR (
      FileName = 'selector.txt'
      )
  )

  SELECT * FROM OPERATOR ($SELECTOR()[@SelectorInstances] ATTR (
      SelectStmt             = 'select * from <tablename>;',
      ReportModeOn           = 'N'
      )
  );
);


EXPORT script
-------------

DEFINE JOB EXPORT_DELIMITED_FILE
DESCRIPTION 'Export rows from Teradata to a delimited file'
(
  DEFINE SCHEMA EDL002_SCHEMA
  (
      <myschema>
  );

  APPLY TO OPERATOR ($FILE_WRITER()[@WriterInstances] ATTR (
      FileName = 'export.txt'
      )
  )

  SELECT * FROM OPERATOR ($EXPORT(EDL002_SCHEMA)[@ExportInstances] ATTR (
      SelectStmt             = 'select * from <tablename>;'
  )

  );
);
2 REPLIES
Teradata Employee

Re: Help Request: Extracting Data with TPT

Hi jglamere,

 

You should try adding MaxSessions = 32 (randomly choosen number) in your selector attributes.

Also don't forget you'll be bandwith limited between your computer and your Teradata instance.

 

Depending on your bandwith and size of the export, it may be better to generate the file on the same network than the TD instance, zip it and then download it.

Teradata Employee

Re: Help Request: Extracting Data with TPT

The reason for the error is probably because you did not connect more than 4 sessions.

There has to be at least 1 session per instance.

If you want performance, you need a lot of sessions (4 will not cut it).

Having each instance control only 1 session will also not cut it because you are waiting on the DBS.

 

If you want to write to multiple files, then use the -C command line option (uppercase). That will send the data to the file writer instances in a round-robin fashion. By default, we send the data blocks to the main consumer instance first, and only involve the others if the data from the producer operator is faster than the consumers are reading it.

 

If you specified 4 writer instances and you only had data being sent to one file, it means the writer instance was reading faster (or "as fast") than the producers were producing data.

 

-- SteveF