Sampling implementation

Database
Enthusiast

Sampling implementation

Does a query sampling a certain number of rows from a table, such as the following, really scan all rows? The information from Explain below seems to indicate that all rows are scanned.

Or, does teradata implement the scan in a more efficient manner skipping rows? I appreciate any information on this. The table is not large enough for me to understand the difference by running different queries which force a scan.

explain

select * from test_table_2_81 sample 10;

 *** Help information returned. 13 rows.

 *** Total elapsed time was 1 second.

Explanation

---------------------------------------------------------------------------

  1) First, we lock a distinct TEST_SCHEMA."pseudo table" for

     read on a RowHash to prevent global deadlock for

     TEST_SCHEMA.test_table_2_81.

  2) Next, we lock TEST_SCHEMA.test_table_2_81 for read.

  3) We do an all-AMPs SAMPLING step from

     TEST_SCHEMA.test_table_2_81 by way of an all-rows scan

     with no residual conditions into Spool 1 (group_amps), which is

     built locally on the AMPs.  Samples are specified as a number of

     rows.

  4) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.

2 REPLIES
Senior Apprentice

Re: Sampling implementation

No, it's not actually doing a Full Table Scan, only some "random" blocks are read.

But the speed of execution is dependent on the number of AMPs in the system, more AMPs mean slower execution.

On a system wtih a few hundred AMPs it will run for a few seconds even if the table is small (and this speed hardly changes for larger tables).

If you only want to see some rows, but don't need a statsitical sample, better use TOP syntax.

Dieter

Enthusiast

Re: Sampling implementation

thanks.

-venky