Collecting Statistics on multiple Columns

Database
Enthusiast

Collecting Statistics on multiple Columns

Hi all,
Is the orde of columns in the collect statistics statemen important or no?
e.g. COLLECT STATISTICS ON table_1 COLUMN (column_2, column_1);
or
COLLECT STATISTICS ON table_1 COLUMN (column_1, column_2);

Thanks in advance
6 REPLIES
Enthusiast

Re: Collecting Statistics on multiple Columns

I do not think it makes any difference. When different column combinations are analyzed, the combined stats are analyzed. Based on this analysis, optimizer estimates the combined cost.
Enthusiast

Re: Collecting Statistics on multiple Columns

Hi Phany
Thanks for the reply.
Does exist a place in the teradata documentation, where it is mentioned?

greetings stami
Enthusiast

Re: Collecting Statistics on multiple Columns

It may be important for covered indexes.
For normal columns, i am not that sure.

Vinay
Junior Contributor

Re: Collecting Statistics on multiple Columns

Hi stami27,
there's no difference, if you collect stats on (col1, col2) or (col2, col1), because both stats are calculated exactly the same.
Simple try it and then use a help stats:
The order is defined by the column position within the create table, this behaviour is similar to a "create index(col1, col2)" or "create index(col2, col1)", both are the same in Teradata (unless you use a order by).

Dieter
Enthusiast

Re: Collecting Statistics on multiple Columns

Hi,

    Can anybody explain, when the collect stats on multi-columns is use??

Teradata Employee

Re: Collecting Statistics on multiple Columns

Optimizer uses multiple columns statistics if WHERE condition references all columns.

If your query looks like:

...

WHERE col1 = 1 AND col2 = 2

you can collect stats this way:

1. COLLECT STATISTICS ON table_name COLUMN (col1,col2);

or

2. COLLECT STATISTICS ON table_name COLUMN (col2,col1);

 

Sometimes other queries (based on the same table) may reference only col2 in WHERE condition, then option 2 is more advaisable.