It is how frequently your data in the tables is added/modified/deleted. If the data changes very frequently, you might need to collect /refresh statistics often to get best performance for your queries.Ofcourse,remember that collect stats is resource intensive !
diagnostic helpstats is used to understand the optimizer recommendations of what stats it thinks might be useful in making a (probably) better query execution plan.
you can do it by typing the following command in sql assistant and doing the explain on your query after wards.
diagnostic helpstats on for session;
explain select a, b, c from t1, t2 where t1.a = t2.b ....
When you recollect stats on a table, teradata internally does all the same operations it did the first time you collected stats on it. so depending on whether it was a full stats or a sampled stats it would either scan the full table (exceptions are if there are index subtables which would give a faster results) or scan a percentage of the table to collect stats information (this is across the whole table and not just the additional 5% or 10% of the records that changed)
Can Diagnostic statistics be activated at a still higher level? so that when an explain is done we get the optimiser recommendations?
Also in most of the explain statements we get something like.....
We do an all-AMPs RETRIEVE step from "CS".employee1 by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs.
what is meant by ..... no residual conditions into Spool 1
In another explain statement...... which is partially listed out here.....
The size of Spool 2 is estimated with high confidence to be 54,555,839 rows. The estimated time for this step is 8.41 seconds.
The size of Spool 3 is estimated with high confidence to be 1,852,824,064 rows. The estimated time for this step is 7 minutes and 56 seconds.
We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to Spool 3 (Last Use) by way of a RowHash match scan. Spool 2 and Spool 3 are joined using a merge join, with a join condition of ((T1.col1 > t2.col1) and ((t1.col2 < tab2.col2) and (tab1.col3 = tab2.col3))) The result goes into Spool (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 3,868,443,675 rows. The estimated time for this step is 3 minutes and 36 seconds.
Note : collect stats is done on all the columns used in the join condition then why in the second last statement above states as NO CONFIDENCE ???