The biased values alone don't imply it's a really big table, so I assume that the "5-6 hours" is for recollecting all stats on that table.
You should check if you can use SAMPLE STATS instead of full stats and recollect only specific columns.
What's your release, 14 or 14.10+?
Do a SHOW STATS VALUES and keep the output.
Run COLLECT STATS USING SAMPLE 2 PERCENT on the column.
Do SHOW STATS VALUES again and check whether the result is still acceptable (caution: the number of rows per value must be multiplied by 50, because it's based on the 2 percent sample).
If it's not good enough, try increasing the sample to 5/10/whatever percent.
If it's still not OK, simply re-apply the old stats by running the output of the first SHOW STATS VALUES.
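The save/sample/compare/restore procedure above could look roughly like this (the database, table, and column names are made up for illustration; adjust to your environment and check the exact syntax for your release):

```sql
-- 1. Save the current full stats so they can be restored later.
--    Keep the output: it is itself a runnable COLLECT STATISTICS ... VALUES statement.
SHOW STATISTICS VALUES COLUMN event_dt ON MyDB.MyTable;

-- 2. Recollect on the column with a 2 percent sample.
COLLECT STATISTICS USING SAMPLE 2 PERCENT
    COLUMN event_dt ON MyDB.MyTable;

-- 3. Inspect the sampled histogram. Row counts per value are based on the
--    sample, so scale them by 100/2 = 50 before comparing to the full stats.
SHOW STATISTICS VALUES COLUMN event_dt ON MyDB.MyTable;

-- 4. If the sampled stats are not good enough, rerun the saved output of
--    step 1 to restore the original full stats.
```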
In most cases sample stats are ok, but check this: When sample statistics go wild
In TD14.10 the optimizer might ignore a USING SYSTEM SAMPLE, that's why I specified a percentage.
Perhaps you could clarify your next-to-last sentence: "If a new event_dt (new records) is inserted in the table, it should be available in the system, without actually capturing the full stats."
The original post above shows a list of the biased values, which should not be expected to include every value, nor does it need to in order to allow the optimizer to optimize queries. The biased values are just the most frequent ones in their part of the histogram.
If the newly added rows are a small percentage addition to a large table, it is not required to update the statistics; the optimizer will still make a plan using the older statistics and a concept called extrapolation.
The above statement appears to be suggesting that the statistics need to be current to allow the new values to be used and that is not the case. The optimizer does not use statistics to answer queries, only to advise the planning process. And the planning process can deal with mildly to moderately out of date stats.
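As a side note on keeping extrapolation accurate: Teradata 14.x also offers summary statistics, which refresh the table-level demographics (row count, row size) cheaply without recollecting any column histograms. A minimal sketch, assuming a hypothetical table name (verify availability and syntax in your release's manual):

```sql
-- Refresh only table-level demographics (row count etc.) so the optimizer's
-- extrapolation works from a current total, without the cost of full or
-- sampled column stats.
COLLECT SUMMARY STATISTICS ON MyDB.MyTable;
```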
Thanks Dieter! Collecting sample stats and multiplying the sampling percentage by the number of AMPs on the system will help. Is there any other way? Is it possible to use Object Usage Count (in TD 14.10) to get more accurate extrapolated values?
@Todd, sorry for the confusion. Actually I meant to say: if new records for a date range, say 10 days, are inserted since the last stats collection, how do we capture those changes and help the optimizer? The number of such records may vary from a few thousand to millions.
I will check and confirm on this. However, when I was checking DBQL, I saw that many other applications also collect sample stats, so why is the issue coming up only for one table and one column? But I might be wrong... will see.
I checked for this column on PROD, and it's = 0 (Enabled; default value).
Is it due to this? Many columns are set at 100 or various low numbers and set as the global default. It's normally happening only for one or a few columns in the environment; everything else is normal.
I think it's because of this parameter. Like I mentioned, if you change the value of SysSampleOption to 2, this feature will be disabled and your full stats will never be automatically converted (downgraded) to sample stats by the system. Try this and let us know if it resolves your issue.
Just to be complete in this conversation...
Disabling this option will disable one of the features of AutoStats. The purpose of this feature is to significantly reduce the cost of stats collection by using sample stats where they should be sufficient, rather than collecting full stats. Disabling the option will disable the feature for all the other cases that are working well.
Instead, use the USING NO SAMPLE option on the collect for the problematic stats collection (see the manual for the syntax and options). This will force full stats collection for the one case where sample stats are not resulting in a good plan, rather than shutting off a chunk of the AutoStats functionality for all users/tables.
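A sketch of that targeted override, again with hypothetical object names (see the COLLECT STATISTICS documentation for your release for the full USING option list):

```sql
-- Pin full stats for just the problematic column. The NO SAMPLE option
-- tells the system not to downgrade this particular collection to sample
-- stats, while AutoStats keeps using sampling everywhere else it works well.
COLLECT STATISTICS USING NO SAMPLE
    COLUMN event_dt ON MyDB.MyTable;
```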
It is unfortunate that one plan gets worse with the sample stats. I think that Teradata support would like to hear about that case so that they can improve the algorithm for choosing sample stats.