distinct vs group by, insert vs create as

UDA
Enthusiast

distinct vs group by, insert vs create as

Hi

From a processing/ efficiency point of view, looking at two questions below, which method in each question is faster?

Consider this for an average sized table(whatever you think that is ha ha!)

CONSIDER ALL OTHER FACTORS, INDEXES ETC THE SAME.

Basically I simply want to know if there is a difference in speed with the way teradata handles each problem.

1) Which is faster - Using DISTINCT or GROUP BY statement for the example below

Sel DISTINCT NAME, DOB, AGE
from table1

or

Sel NAME, DOB, AGE
from table1
group by 1,2,3

Which is faster/ more efficient (for an average sized table)?

===========

2) Which is faster - Using INSERT or CREATE AS statement for the example below

CREATE VOLATILE TABLE table1
,NO FALLBACK
,NO BEFORE JOURNAL
,NO AFTER JOURNAL
,CHECKSUM = DEFAULT
(
NAME VARCHAR(20)
,DOB DATE FORMAT 'YY/MM/DD'
, AGE VARCHAR(3)
) PRIMARY INDEX (NAME) ON COMMIT PRESERVE ROWS
;

INSERT INTO table1
SEL
NAME
, DOB
, AGE
FROM TABLE2

or

CREATE VOLATILE TABLE table1 AS
(
SEL
NAME
, DOB
, AGE
FROM TABLE2
) WITH DATA
PRIMARY INDEX (NAME)
ON COMMIT PRESERVE ROWS
;

Which is faster/ more efficient (for an average sized table)?
6 REPLIES
Enthusiast

Re: distinct vs group by, insert vs create as

Hi,

Just check out the Explain plan for the group by and distinct SQL query.You ill see distinct takes more time then Group by caluse.

We could see if the table size is medium then it doesn't make such diffrence. But if the table size is large then ALWAYS USE GROUP BY CLAUSE INSTEAD OF USING DISTINCT.

Coming to the second one...

I don't think it will take much time while creating the tables because anyways it will take the definition from the data dectionary tables.In creation time the diffrence will not be as much.

But while inserting the data as per your 1st query will take much time then the second one.Reason being you insert statment in create table have to cross verify the defintion only once rather then verifying every time in the first query.

Regards
Rajeev T
Senior Apprentice

Re: distinct vs group by, insert vs create as

1.
As Teradata (before TD13) optimizes GROUP BY and DISTINCT totaly different, there's a rule of thumb:
If the number of rows/value is large, GROUP BY is faster, because duplicates are eliminated locally *before* redistribution.
If the number of rows/value is small, DISTINCT is faster.

e.g. 10,000,000 rows, 100 AMPs

50 values, 200,000 rows/value
DISTINCT: 10,000,000 rows redistributed to 50 AMPs (large spool), then sorted eliminating duplicate rows.
GROUP BY: 100,000 rows/AMP locally aggregated and reduced to 50 rows/AMP, 50*100 = 5000 rows redistributed to 50 AMPs (small spool) and aggregated again.

500,000 values, 20 rows/value
DISTINCT: 10,000,000 rows redistributed to 100 AMPs (large spool), then sorted eliminating duplicate rows.
GROUP BY: 100,000 rows/AMP locally aggregated, but there's probably only 1 row/value on each AMP, so it's almost useless. 100,000*100 = 10,000,000 rows redistributed to all AMPs (large spool) and aggregated again.

10,000 values, 1,000 rows/value
DISTINCT: 10,000,000 rows redistributed to 100 AMPs (large spool), then sorted eliminating duplicate rows.
GROUP BY: 100,000 rows/AMP locally aggregated and reduced to 10,000 rows/AMP, but aggregate cache (~100KB) will overflow and there will be a second local aggregation, 10,000*100 = 1,000,000 rows redistributed to all AMPs (medium spool) and aggregated again.

This is for even distribution, if it's skewed, this may be different...

Btw. if you use a subquery there's always a distinct by default, but you can specify GROUP BY to override it.

TD13 will automatically rewrite DISTINCT to GROUP BY and apply some new algorithms to get faster response.

2.
You'll notice no difference in speed, just test it.

Dieter
Enthusiast

Re: distinct vs group by, insert vs create as

Thanks for the response. I have one further question regarding the CREATE TABLE question and the use of UNION.

Should I use two insert statements or use UNION like below? Is there any difference in speed in this scenario?

Create table2 as
(
sel * from table1
union
sel * from table3
) with data on commit preserve rows
;

Thanks Chris
Enthusiast

Re: distinct vs group by, insert vs create as

Dieter,

              Thanks for your explantion. It is very Articulate. I have a question here. What do you mean by locally aggregated in your explanation ?  It still be the perm space. is nt it ?

Thanks 

Jana

Enthusiast

Re: distinct vs group by, insert vs create as

Locally Aggregated : Basically without data movement or inside the amp.You need collect all data in spool file then eliminitate duplicates.

Dieter -Thanks for the update on TD13 ,wasn't aware of that.

Enthusiast

Re: distinct vs group by, insert vs create as

@barcoo - Have you compared the EXPLAIN plans of the UNION in a CREATE TABLE AS vs CREATE TABLE and INSERT INTO? Have you tested it yet?