FastExport for Really Short Queries in Teradata 13.10

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
Teradata Employee

The FastExport utility is usually seen as a batch-oriented job to use when you need to return large numbers of rows from the database to a user.   It is the ideal tool for efficiently and quickly returning large answer sets, sorted or otherwise, back to the client from the database.   That’s why it was named the way it was:  “FastExport”.  

It might surprise you that FastExport  can also be used against small data volumes.  And when it is, Teradata 13.10 includes an important enhancement that gives you a more efficient way to get those rows back quickly.

Utilities generally come with some small start-up overhead that disappears pretty quickly as the data volume processed goes up.  But this start-up overhead can be annoying, an often unnecessary, when you are processing very small volumes. 

How FastExport works 

FastExport was designed to get big data volumes back to the client quickly.  Here’s how it does that:   After the final spool file has been built, the result rows will be redistributed across all AMPs, so that each AMP will have approximately the same number of data blocks to return.  There is always a horizontal redistribution, which is designed to speed up the export of a large answer set.  In addition, there will be a preceding vertical redistribution if ORDER BY is specified.

During the vertical redistribution process, additional AMP worker tasks are spawned on each AMP to receive rows, rows are sampled, a vertical redistribution map is built, and the rows are relocated based on this map. Horizontal redistributions, which happen whether or not vertical redistributions have taken place, pack rows from the spool file of each AMP into response blocks and ensure they are evenly spread across all AMPs for efficient return to the client. 

The high level execution flow of FastExport includes these steps:  

  1. FastExport client establishes connections to the DBS.
  2. FastExport client executes the SELECT statement(s), and the results are stored in the final spool file.
  3. The  result rows from the SELECT statement are redistributed evenly and stored in spool files on all AMPs.  If the SELECT includes an ORDER BY clause, these rows are redistributed vertically first.  Then, the rows are redistributed horizontally.  At the end, response spool files on all AMPs should have roughly the same number of data blocks.
  4. FastExport client retrieves result data blocks directly from the AMPs through the FastExport sessions.  

When using FastExport, locks are released before the export process begins.

Enhancements to FastExport in Teradata 13.10

There’s a change in the 13.10 release of the Teradata Database for FastExport to help performance against small volumes.  This enhancement includes a “No Spool Mode” option, appropriate when just a few rows are going to be processed from a single table.  Common usage for no spool mode is primary or secondary index access.

The No Spool Mode will prevent final spool files from being redistributed across all AMPs.  In-memory buffers will be used to store the final answer set rows, rather than spool files.  Only the AMPs that actually process rows will be involved in final processing, which will reduce unnecessary AMP-level overhead when the number of AMPs in the configuration grow.

The no spool option is appropriate only for simple selects, that do not contain joins.  There are several benefits to this enhancement:

  • There is no need to build an export spool file
  • Only one pass is made over the data
  • Export of blocks begins immediately

If you wish to take advantage of this enhancement, the TPT Export Operator has a new optional attribute called SpoolMode.   If you don’t chose to use this option, your FastExport job will use the traditional spooling approach described above.   If you specify NoSpool as your spool mode, then no spooling will take place as long as the SELECT statement supports that approach.   If the query includes joins, a GROUP BY or ORDER BY, or a statistical function step, the no spool option is ignored.

You can read more about the No Spool option in the Teradata FastExport Reference manual or the Teradata Parallel Transporter Reference manual.

14 Comments
N/A
Carrie I am a big fan of your blogs'.

In Teradata V2R6 when I was using Fast Export for my queries inorder to generate delimited files there is always junk or garbage values coming in the first field.

There was some work around done using Unix scripts to handle these things.

I came to know there was a limiitation in this utility (Not sure if my source of information is correct or not:) ).

If this is the case then does TD 13 Version handled this shortcoming.
Teradata Employee
Hi Vicks,

Thank you for the kind words.

I'm not entirely sure, but you may be talking about a feature in the older version of FastExport that allowed you to generate VARTEXT records format files, where each record has variable length field values separated by a delimiter. I am told that the current version of FastExport does not support exporting data in VARTEXT format at this time. If you are not planning on trying to use VARTEXT format, you should be able to use FastExport just fine. Please check in the documentation to validate capabilities of FastExport in Teradata 13.0 just to be sure.

Thanks, -Carrie
HI Carrie,

In FastExport with Spool , Is there a way to calculate the spool used by FastExport job based on exported/extracted data size. I heard that it is usually double the size of extracted data (ex: if extracted data is 40GB , the spool used by fastexport is nearly 80GB) ....is it correct ...? If it is ,Could you please explain me how this calculation has done.
Teradata Employee
Hi Sravani,

I’m not aware of any formula for calculating FastExport spool usage. My suggestion is to use the EXPLAIN to get spool estimates for the SELECT which may include intermediate spools. Then, add the response spool of the horizontal redistribution step which is probably the same as the final response spool of the SELECT.

I have not tried this so I don't know if this will be close to the actual spool usage, but it's the only suggestion I can come up with.

Thanks, -Carrie
Enthusiast
Carrie, Thanks for the explanation on how FastExport works. This is the first detailed description of Fast Export that I have been able to find and it certainly helps to clarify things. I do have one question. Given the overhead of the utility startup, additional sessions and the possibility of utility throttles, in what situation would it be better to use the NOSPOOL option with a small answer set, over using a regular SELECT (non-utility), such as Ab Initio API mode?

Thanks, -Jim
Teradata Employee
Hi Jim,

FastExport does indeed have a higher setup overhead than regular SQL. Given a choice, standard SQL is probably a better choice when returning few rows. But if you are caught in a situation where FastExport is mandated for use within a specific application, the NOSPOOL option will be a big help for the small queries.
Teradata Employee
Hi,
Does it benefit a small volume? But not a large volume?
Have you ever recommended a number?
Teradata Employee
FastExport with NoSpool option is only for requests returning a small number of rows. Spooling is a better choice for moderate or large volumes because it can ensure the response spoools are evenly distributed across all the AMPs, providing better efficiency when the answer is being returned.

What number of response rows is suitable for NoSpool is hard to say, as it will depend on the number of AMPs in the configuration. However, it will only be an option if the request is a simple select (no aggregations, no joins, no OLAP functions, no multi-statement requests). If you have a query like that, you could try seeing if it performs better using FastExport with NoSpool or the default spooling.

However, I would expect you would get even better performance for simple queries with small answer sets if you use standard SQL. So consider making SQL your first choice, and only consider using FastExport with NoSpool if you are not able to use standard SQL in that particular situation.

Thanks, -Carrie
Teradata Employee
Thank you. I'll check when testing. And also we'll find it optimal number. Here is 6650H 2 + 1, 144 AMP.
Teradata Employee

Hello Carrie,

Sorry to be asking question on an old post. I am a little unclear about how FastExport works when SPOOL is involved. You mentioned that the final spool is distributed evenly to all AMPs. FastExport then uses fastexport sessions to retrieve data from all AMPs. How does FastExport retrieve data from all AMPs if the number of sessions defined for the job are less than the number of AMPs? Say for an example, if there are 100 AMPs and 4 sessions have been defined for the job then does FastExport retrieve all data from first group of 4 AMPs before it moves on to the next 24 such groups?

Thanks as always!

Teradata Employee

Padhia,

The sessions access data in a round robin fashion across the AMPs, getting one block of rows at a time from each AMP, then moving on to the next AMP.   The client knows how to make sense of the returned data block and unpack the individual rows.  You could use only one session to accomplish this return of data, or several sessions.

The purpose of using more than one session is that then the client can do multi-buffering and can have more than one fetch outstanding at a time.  This usually speeds up the return of large volumes of data.  

If one session were active, it would fetch blocks in round robin manner across all AMPs by itself.  If two or more sessions are active, then those sessions can interleave their fetch activity so more total fetching can be done, usually in a shorter period of time.

Thanks, -Carrie 

Enthusiast

Hi Carrie,

Can we export the output of multiple select statements specified with in .begin export and .end export to multiple outfiles ?

If we create multiple .begin export and .end export blocks with in a single fast export job, are they executed in parallel or sequencially ?

Thanks

ambuj

Teradata Employee

Ambuj,

I am not able to help you with this question as I am not experienced with FastExport coding conventions.  I'd suggest you post your question on Teradata Forum at:  http://forums.teradata.com/forum  or see if there are explanations in the Teradata FastExport Reference Manual that help answer your questions.

Best Regards, -Carrie

Enthusiast

Carrie,

I have read the complete fastexport manuall but could not find answer to my query.

Anyways i will post it in a separate thread in forum.

Thanks,

Ambuj