Collect Stats n Refresh Stats

UDA
Enthusiast

Collect Stats n Refresh Stats

What are factors to decide how much time it should take for "collect statistic" and "refresh statistic"?
My question is related to time it takes for "collect statistic" and "refresh statistic"?
15 REPLIES
Enthusiast

Re: Collect Stats n Refresh Stats

It is how frequently your data in the tables is added/modified/deleted. If the data changes very frequently, you might need to collect /refresh statistics often to get best performance for your queries.Ofcourse,remember that collect stats is resource intensive !
Enthusiast

Re: Collect Stats n Refresh Stats

WHEN EVER THERE IS EXTENSIVE INSERTs, DELETEs, UPDATEs, OR ANY OTHER MAINTANENCE TO ANY COLUMNS WHICH IS CONTAINING STATISTICS.WE SHOULD REFRESH THE STATS.
THE CHANGE OF DATA SHOULD OF 5 TO 10 %

Enthusiast

Re: Collect Stats n Refresh Stats

Another parameter to check is the confidence level in the explain plan.

Vinay
Enthusiast

Re: Collect Stats n Refresh Stats

one more hint. use "diagnostic helpstats" to get recommendations on collect stats on particular columns. This will be displayed at the end of explain text
Enthusiast

Re: Collect Stats n Refresh Stats

Hi all,

Could u pls tell me something more about "diagnostic helpstats" .....

Thanks,
abc
Enthusiast

Re: Collect Stats n Refresh Stats

Hi all,

Is the same procedure followed to do the collect stats when it is done for the first time and when done for any successive times on the same table (i,e when 5% - 10% of data is being altered)

Thanks,
abc
Enthusiast

Re: Collect Stats n Refresh Stats

diagnostic helpstats is used to understand the optimizer recommendations of what stats it thinks might be useful in making a (probably) better query execution plan.

you can do it by typing the following command in sql assistant and doing the explain on your query after wards.

diagnostic helpstats on for session;

explain
select a, b, c
from t1, t2
where t1.a = t2.b
....

When you recollect stats on a table, teradata internally does all the same operations it did the first time you collected stats on it. so depending on whether it was a full stats or a sampled stats it would either scan the full table (exceptions are if there are index subtables which would give a faster results) or scan a percentage of the table to collect stats information (this is across the whole table and not just the additional 5% or 10% of the records that changed)
Enthusiast

Re: Collect Stats n Refresh Stats

Hi all,

Thanks for that...

Can Diagnostic statistics be activated at a still higher level? so that when an explain is done we get the optimiser recommendations?

Also in most of the explain statements we get something like.....

We do an all-AMPs RETRIEVE step from "CS".employee1 by way of an
all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs.

what is meant by .....
no residual conditions into Spool 1

and
(group_amps)???

In another explain statement...... which is partially listed out here.....

The size of Spool 2 is estimated with high confidence to be 54,555,839 rows.
The estimated time for this step is 8.41 seconds.

The size of Spool 3 is estimated with high confidence to be 1,852,824,064 rows.
The estimated time for this step is 7 minutes and 56 seconds.

We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to Spool 3 (Last Use) by way of a RowHash match scan.
Spool 2 and Spool 3 are joined using a merge join, with a join condition of
((T1.col1 > t2.col1) and ((t1.col2 < tab2.col2) and (tab1.col3 = tab2.col3)))
The result goes into Spool (group_amps), which is built locally on the AMPs.
The result spool file will not be cached in memory.
The size of Spool 1 is estimated with no confidence to be 3,868,443,675 rows.
The estimated time for this step is 3 minutes and 36 seconds.

Note : collect stats is done on all the columns used in the join condition then why in the second last statement above states as NO CONFIDENCE ???

Kindly explain
Thanks,
abc

Enthusiast

Re: Collect Stats n Refresh Stats

Q. what is meant by .....
no residual conditions into Spool 1

Ans: All applicable conditions are applied.

Q. and
(group_amps)???

Ans: A subset of AMPs will be used instead of All AMPs.

Q. Note : collect stats is done on all the columns used in the join condition then why in the second last statement above states as NO CONFIDENCE ???

Ans: Your join has inequality conditions. ( I am not 100% sure of this answer).