Use of order by

Database
Enthusiast

Use of order by

We have created 1 table with following structure

CREATE SET TABLE prakhar2 ,NO FALLBACK ,

NO BEFORE JOURNAL,

NO AFTER JOURNAL,

CHECKSUM = DEFAULT

(

c INTEGER,

f INTEGER,

d DATE FORMAT 'yyyy-mm-dd',

e DATE FORMAT 'yyyy-mm-dd')

PRIMARY INDEX ( c );

But when we are executing below queries result sets are coming different ,we are bit confused by use of ORDER BY clause.

Please help .

1) sel * from prakhar2

c f d e

1 2 01/02/2008 01/01/2008

1 3 01/02/2008 12/31/9999

1 4 01/02/2008 12/31/2007

1 5 01/02/2008 01/22/2008

2) sel * from prakhar2 order by d

c f d e

1 4 01/02/2008 12/31/2007

1 5 01/02/2008 01/22/2008

1 3 01/02/2008 12/31/9999

1 2 01/02/2008 01/01/2008

3) sel * from prakhar2 where f>2 order by d asc.

1 5 01/02/2008 01/22/2008

1 4 01/02/2008 12/31/2007

1 3 01/02/2008 12/31/9999

4) sel * from prakhar2 where f>2

1 3 01/02/2008 12/31/9999

1 4 01/02/2008 12/31/2007

1 5 01/02/2008 01/22/2008

Not able to understand how order by is working in Teradata.
The column on which i am ordering is having all equal values
Please help as this is blocking many of my operations

9 REPLIES
Senior Apprentice

Re: Use of order by

Hi prakhar,
accordign to Relational Data Model/Standard SQL an answer set is an unordered set of rows unless you specify ORDER BY.

Teradata is a parallel DBMS based on hashing, so there's no built-in sequential ordering of any kind as other DBMSes might provide.

"The column on which i am ordering is having all equal values"
Then why do you order by it?

Simply ORDER BY d,e and everything is fine.

Dieter
Enthusiast

Re: Use of order by

Hi
That is the case with me .......if column wud have been different even i know how it is going to order .Plaease provide some valid explanation as this sequence is affectiong my work
Teradata Employee

Re: Use of order by

Hello,

In case of specifying single column in ORDER BY, it orders the column values of a single column, regardless of other column values. If you apply ORDER BY on a column containing 1, it will always be 1 and hence no purpose of using ORDER BY (but only if you expect changed values other than 1).

So, as suggested by Dieter, if you need to order by more than 1 column (or all), please specify them in the ORDER BY clause, otherwise, it will return data just the way you asked for it, not by the way you need it.

Regards,

Adeel
Enthusiast

Re: Use of order by

Hi Adeel,
You have correctly explained your point but all the time i am running this query

2) sel * from rtwit_work.prakhar2 order by d

c f d e
1 4 01/02/2008 12/31/2007
1 5 01/02/2008 01/22/2008
1 3 01/02/2008 12/31/9999
1 2 01/02/2008 01/01/2008

I am getting this result set only....

I was thinking it will go for primary index col if col specified in order by has equal value ...
I am not getting the point y this sequence is chosen by teradata when both order col and primary index col are equal...
I it something to do with rowid???collateral sequence.....
the way i entered rows in table are
1)row with f=2....>f=3...f=4...f=5
but results just are out of scope....
Teradata Employee

Re: Use of order by

Again....how can you expect the result to be ordered on something, without mentioning it!

As per my knowledge, it returns the rows on the AMP basis, it has nothing to do with rowid or the sequence you inserted those rows or anything else, following example may clear my point:

Suppose you have a Table1 with columns Col1, Col2, Col3 and the system is configured with 4 AMPs (AMP1, AMP2, AMP3, AMP4), AMP3 is fastest AMP, and AMP2 is slowest, AMP1 and AMP4 are in middle.

Now at time T1, you issue following query:

SELECT * FROM Table1;

this is what will happen:

T1 + x: AMP3 will start returning/putting rows in spool first
T1 + x + y: AMP1 and AMP4 will start on second
T1 + x + y + z: and AMP2 will start the latest from all AMPs

T2: AMP3 finishes the task
T2 + x: AMP 1 and AMP4 finishes
T2 + x2: AMP2 finishes

Now you will have this everytime you run a query, because without adding/deleting/changing more rows i.e. changing data demegraphics, you can't slow down AMP3 or make AMP2 fast. Which means same rows every time you execute the query. This also depends on the current load of the system. In this scenario we also supposed that nothing is happening in the world but what we are doing.

Now, what happens with a query having ORDER BY? Following may be helpful:

Keep all the same assumptions as above, just change the query as follows:

SELECT * FROM Table1 ORDER BY Col1;

Now if you suppose that Col1 has only 1 distinct value i.e. 1 it will be same as above, logically, but the rows "may or may not" be in same order, but if the column contains more values, it will sort them first in spool.

HTH!

Regards,

Adeel
Enthusiast

Re: Use of order by

Hi Adeel,
Gr8 explanation Sir.......
But the point that confused me is that here all rows have same nupi that means they will be mapped to same AMP so whether question of which AMP is faster and all still exists!!!!! can you clarify even if it is a stupid question.....

Only point i was asking that what factors drive the resulset when all the rows in a column defined in order by clause has same values....
Teradata Employee

Re: Use of order by

This time without going into details, the data read first will be sent first if it satisfies the given condition/order.

Regards,

Adeel
Senior Apprentice

Re: Use of order by

All rows with the same PI-value will be stored on the same AMP sequentially in the order of inserts.
If there's no ORDER BY the rows are returned in that order.

An answer set is returned *after* it's fully created (other DBMSes start returning rows as soon as they're found).
Each AMP sends blocks to the BYNET, which merges them (using round-robin).
You can watch that when you submit:
select hashamp(hashbucket(hashrow(pi_col))) from tab;

This will result in:
1
2
3
4
5
6
...
1
2
3
4
5
6
...

If you ORDER BY each AMP sorts it's part of the answer set, followed by a merge sort via BYNET.
Apparently the optimizer doesn't skip that sort and the sort algorithm used for the AMP-local sort does not preserve to original order (unstable).

If you want the same 'order' for all those 4 queries then don't specify any ORDER BY.
But it's still no guaranteed order :-)

Dieter
rgs
Enthusiast

Re: Use of order by

If there is no ORDER by clause all rows with the same PI will end up in the response spool in the same order as they are in the table. There is no implied order in the table, but of course physically it has to be put in there in some manner, which could be considered as some kind of order. But you can’t rely on that in your design.

When you add the ORDER by clause it will generate a response spool as in the other case with a sort field added in there separate from the data columns. The sort logic knows nothing of the order of the rows. It just looks at the sort key which in your example is the same. So the sort is free to put rows with identical keys in any order. Unless you know the internals of the sort algorithm you can’t expect it to keep rows having identical key values in some sort of order. There is no ‘keep the rows in the same order as they are read if the keys are the same’ rule in the sort logic, therefore it can output them in any order since there is no implicit order in a set in the first place.