Which is better GROUP BY, DISTINCT or SET table

General
Enthusiast

Which is better GROUP BY, DISTINCT or SET table

I want to know which one is better to handle duplicate records ; is it using GROUP BY or DISTINCT while selecting data from Source table or Creating Target table as SET table ?

Thanks in advance.

Nitin 'Raj' Srivastava

7 REPLIES
Senior Supporter

Re: Which is better GROUP BY, DISTINCT or SET table

classical answer for an unspefic question ;-> - it depends.

Can you share more information on the process. How many rows, how many dups etc. Where do the dups come from?

But even if you give this info - the answer it will be difficult to judge as the system configuration can also make a difference. So running some tests on your environment will give you best answers.

Dedup can become an expensive operation should be done only once and as early as possible. Lassy design can result in a lot of group by and distinct usage and will consume huge amount of resources...

Enthusiast

Re: Which is better GROUP BY, DISTINCT or SET table

It again depends on the number of rows per value present in the table.

GROUPBY can be used if the table contains more duplicate rows per value and DISTINCT incase of less duplicate rows per value.

Stalin  

Enthusiast

Re: Which is better GROUP BY, DISTINCT or SET table

Creating a SET table in the target makes the duplicate check for the entire row. Using GROUP BY or DISTINCT in the selecting data for your source tables does the check for the columns specified in your GROUP BY or DISTINCT clause. Both the plans for GROUP BY and DISTINCT are similar but it does make a difference based on the data set.

Re: Which is better GROUP BY, DISTINCT or SET table

I think that SET table are useful to garantee the integrity when you know that you gonna have a poor ETL development. Considering only the performance objective, it depends. You should test your case.

Re: Which is better GROUP BY, DISTINCT or SET table

Is it? Well, I might as well try...

Teradata Employee

Re: Which is better GROUP BY, DISTINCT or SET table

Performance varies depending on the data.

DISTINCT

is better when the data is nearly unique. actually when distinct is used; the intermediate spool is sorted and discards the duplicates.

GROUP BY

is better when data has relatively few unique values and works by performing an AMP local grouping operation and then merging the partial result sets for final processing.

Senior Supporter

Re: Which is better GROUP BY, DISTINCT or SET table

also check 13.0 release summary (see attachement).

Optimzer is getting smarter and sometimes distinct and group by result in the same plan.