COLLECT STATISTICS variations

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.
Highlighted
Enthusiast

COLLECT STATISTICS variations

How are the following COLLECT STATISTICS statements processed?  Which is more efficient?

 

"COLLECT STATISTICS ON TABLEA;"

VS.

"COLLECT STATISTICS ON TABLEA COLUMN (COLUMNA);

 COLLECT STATISTICS ON TABLEA COLUMN (COLUMNB);

 COLLECT STATISTICS ON TABLEA COLUMN (COLUMNC):"

VS.

"COLLECT STATISTICS

     COLUMN (COLUMNA),

     COLUMN (COLUMNB),

     COLUMN (COLUMNC)

        ON TABLEA;"

Tags (1)

Accepted Solutions
Senior Apprentice

Re: COLLECT STATISTICS variations

Hi,

 

I wouldn't describe it as parallel (although it is a performance improvement).

 

What may happen with the third example is that the optimizer does a single scan of the data, aggregating values into a spool file. This spool file is then read again to build the final, individual stats that you have requested. The explain plan will show this - look for how many steps access the base table. This optimization is often referred to as 'statistics roll up'.

 

As with a lot of things in Teradata, whether or not this happens is an optimizer decision.

 

With your second example, it will not happen.

With your third example, it may happen.

 

Once you have stats on a table, the first example will (usually) refresh using the same approach as done originally.

So assuming that initially you ran example 3 and the optimizer did a single pass of the data (i.e. you got 'roll up').

If you then ran your example 1 code, you'd expect to also get 'roll up'.

 

Cheers,

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
1 ACCEPTED SOLUTION
4 REPLIES
Senior Apprentice

Re: COLLECT STATISTICS variations

Hi,

 

The first example is only used for refreshing statistics. If there are no stats on the table then this will do nothing (it may fail, I can't remember).

 

The second one will collect or refresh stats and will do three scans of the data.

 

The third one will collect or refresh stats and may do one, two or three scans of the data. This last part is an optimizer decision. This example is likely to be the most efficient - if fewer than three scans of the data are required.

 

HTH

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: COLLECT STATISTICS variations

Thanks Dave.

 

So it's possible, in the third example, that different stats can be collected in parallel?

 

I understand the limitations on the first example... that stats must already be present.  But will that statement collect in parallel?

Senior Apprentice

Re: COLLECT STATISTICS variations

Hi,

 

I wouldn't describe it as parallel (although it is a performance improvement).

 

What may happen with the third example is that the optimizer does a single scan of the data, aggregating values into a spool file. This spool file is then read again to build the final, individual stats that you have requested. The explain plan will show this - look for how many steps access the base table. This optimization is often referred to as 'statistics roll up'.

 

As with a lot of things in Teradata, whether or not this happens is an optimizer decision.

 

With your second example, it will not happen.

With your third example, it may happen.

 

Once you have stats on a table, the first example will (usually) refresh using the same approach as done originally.

So assuming that initially you ran example 3 and the optimizer did a single pass of the data (i.e. you got 'roll up').

If you then ran your example 1 code, you'd expect to also get 'roll up'.

 

Cheers,

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: COLLECT STATISTICS variations

Thank you Dave!