SELECT TOP N - odd behaviour (v2r6)

Database
Enthusiast

SELECT TOP N - odd behaviour (v2r6)

Hello,

"SELECT TOP n from table" can be very fast, even on large tables, and is showing an odd behaviour on other tables,
with a three steps explain .. to get 10 records in one minute !

How to understand the first all-rows scan in step below ?

3) We do an all-AMPs RETRIEVE step from "tablename" by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps)
(compressed columns allowed), which is built locally on the AMPs.
The input table will not be cached in memory, but it is eligible
for synchronized scanning. The result spool file will not be
cached in memory. The size of Spool 2 is estimated with high
confidence to be 204,849,278 rows. The estimated time for this
step is 1 minute and 24 seconds.
4) We do an all-AMPs STAT FUNCTION step from Spool 2 by way of an
all-rows scan into Spool 5, which is built locally on the AMPs.
The result rows are put into Spool 1 (group_amps), which is built
locally on the AMPs. This step is used to retrieve the TOP 10
rows. Single AMP optimization is used. If this step retrieves
less than 10 rows, then execute step 5.
5) We do an all-AMPs STAT FUNCTION step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 5 (Last Use), which is built
locally on the AMPs. The result rows are put into Spool 1
(group_amps), which is built locally on the AMPs. This step is
used to retrieve the TOP 10 rows.
6) Finally, we send out an END TRANSACTION step to all AMPs involved ...
Tags (1)
8 REPLIES
Enthusiast

Re: SELECT TOP N - odd behaviour (v2r6)

I am no expert here, I am just learning the "Teradata way", but I think this is repartitioning your data, doing an all AMP scan of all rows and and building a spool file on each of the amps, then doing a local all rows scan to locate the top ten on each amp then passing back to the PE?
Enthusiast

Re: SELECT TOP N - odd behaviour (v2r6)

It looks like you need to collect stats on the table.

Enthusiast

Re: SELECT TOP N - odd behaviour (v2r6)

@rluebke - Could you expand on this a little? what clue in the Explain do you see that leads you to think it needs more up to date Stats? I saw that it read 209 million rows, so it must have stats of some description?

@lucas could you post the query and any info on table? if it does not contravene confidentiality etc?
Enthusiast

Re: SELECT TOP N - odd behaviour (v2r6)

Well,
i can report hereafter the comparison between "table_1" and "table_2" :
select TOP 10 * on the biggest one ("table_2) returns 10 rows in 1 second !
Statistics are collected and up to date for those two tables.

Table_1 (MULTISET, PI with many duplicates on PI) 205 893 673 rows
Table_2 (MULTISET, PI & PPI & NUSI) 2 988 182 783 rows

Table_1 EXPLAIN
1) First, we lock a distinct DWH_SOCLE."pseudo table" for read on a
RowHash to prevent global deadlock for "table_1".
2) Next, we lock "table_1" for read.
3) We do an all-AMPs RETRIEVE step from "table_1" by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps)
(compressed columns allowed), which is built locally on the AMPs.
The input table will not be cached in memory, but it is eligible
for synchronized scanning. The result spool file will not be
cached in memory. The size of Spool 2 is estimated with high
confidence to be 205,843,087 rows. The estimated time for this
step is 1 minute and 25 seconds.
4) We do an all-AMPs STAT FUNCTION step from Spool 2 by way of an
all-rows scan into Spool 5, which is built locally on the AMPs.
The result rows are put into Spool 1 (group_amps), which is built
locally on the AMPs. This step is used to retrieve the TOP 10
rows. Single AMP optimization is used. If this step retrieves
less than 10 rows, then execute step 5.
5) We do an all-AMPs STAT FUNCTION step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 5 (Last Use), which is built
locally on the AMPs. The result rows are put into Spool 1
(group_amps), which is built locally on the AMPs. This step is
used to retrieve the TOP 10 rows.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

Table_2 EXPLAIN

1) First, we lock a distinct DWH_AG."pseudo table" for read on a
RowHash to prevent global deadlock for "table_2".
2) Next, we lock "table_2" for read.
3) We do an all-AMPs STAT FUNCTION step from "table_2"
by way of an all-rows scan with no residual conditions into Spool 5,
which is built locally on the AMPs. The result rows are put
into Spool 1 (group_amps), which is built locally on the AMPs.
This step is used to retrieve the TOP 10 rows. Single AMP
optimization is used. If this step retrieves less than 10 rows,
then execute step 4.
4) We do an all-AMPs STAT FUNCTION step from "table_2"
by way of an all-rows scan with no residual conditions into Spool 5
(Last Use), which is built locally on the AMPs. The result rows
are put into Spool 1 (group_amps), which is built locally on the
AMPs. This step is used to retrieve the TOP 10 rows.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

Thanks for any suggestion

Pierre
Junior Contributor

Re: SELECT TOP N - odd behaviour (v2r6)

Hi Pierre,
you should open an incident with your Teradata support.
If it's a plain "select top 10 * from table;", without ORDER/JOIN/WHERE that RETRIEVE-step should not happen.

Dieter
Enthusiast

Re: SELECT TOP N - odd behaviour (v2r6)

Hi Dieter,
yes it is a plain "SELECT TOP 10 * FROM table;"
i didn't mentionned that PI of "Table_1" is a 5 columns index (skewfact=0), when PI for "Table_2" is a 2 columns index. But this not a clue ...

Pierre
Teradata Employee

Re: SELECT TOP N - odd behaviour (v2r6)

Seems like a bug if it is a simple select TOP from table.
Enthusiast

Re: SELECT TOP N - odd behaviour (v2r6)

Hi Dieter,

I do face the same problem when run a select on a Big Table with "TOP" clause.

The explain says "2) Next, we do an all-AMPs STAT FUNCTION step from ...."

I have observed that when I do a select like

"SELECT TOP 100 FROM table1;

It is taking a long time too to process and delays the entire process.

when I removed the TOP 100, it started working faster.

Is the TOP 100 is the clause invokes the STAT FUNCTION ?

On huge tables, why it delays?

Thanks,

Mani