Advantage of not Collecting the stats


Advantage of not Collecting the stats

Hi All,

I want some info on stats collection.
I am aware about below fact :

1) how stats are collected and how stats helps optimizer in chossing better paln.
2) what are the interval concept in stats.
3) At frequency it should be refreshed.

But I am working on a automation process in which I am trying to calculate the the % data change in table.
If this is more then 10% then only I want to collect stats.

Using this I can give more windows to my loads and my CPU will be free for other activities .
Other then these what else benefits can I have by not refreshing my table stats if % data change is not more then 10%.

A little weird question But please reply as per your understanding.


Re: Advantage of not Collecting the stats

As I understand the stats collection provides the rows per value for the intervals in the table as well as the min and max values of the column. This helps the optimizer determine the cardinality that results from joining to tables together. Without the stats the optimizer must rely on random amp sampling to obtain these values. This can result in a less than optimal query plan.

Regarding the frequency of stats collection I tend to use the growth in the table size as well as the age of the statistics. Right now I compare the last access date and the date of the oldest stats collected on the table to refresh the stats on the table.

Hope this helps.

Re: Advantage of not Collecting the stats

Yes I am completely agree what u quoted.

But lets have a example of big big EDW where we are having a thousands of objects and billions of data, which is continuously hit by multiple user concurrently.

In such environment if I collect stats on scheduled basis without checking the data growth of that object then it will be highly resource consuming process.

And if my data growth is less than 10% then there is no point to collect stats.

So in EDW I want to make it in such a way so that if data growth is more then 10% then only TD collect stats otherwise leave it.

So other then resource saving is there any other advantage.

I hope I explained my scenario clearly.

Re: Advantage of not Collecting the stats


if its a mid sized table then u got to collect stats completely . If its a BIG BIG EdW then for those tables in GB's( example 50 GB Table) which u think data has changed more than 10% its better to go for Random Amp Sampling. If u try to collect stats on those big tables it would definitely be time as well as CPU consuming.

Since u r automating the process u can do this.
1. u should know when ur ETL starts or ETL ends. Based on the NON ETL window time u can give 2 HRS to collect stats on all the tables which u think is necessary.
2. Shell script should be in such a way that if collect stats over shoots the 2 hour window time the job should abort.


Re: Advantage of not Collecting the stats

For what it is worth, in my current environment I am only running a collect stats processing once a week on the weekends. Sure, there are environments where you may not have such a luxury. And as teradata_dba mentioned, on extremely large tables you can rely on either random amp sampling or consider using a sample percentage of the overall table when collecting stats. Using a sample on your stats collection will reduce the overhead associated.