Optimizing Stats Collection

Database
Enthusiast

Optimizing Stats Collection

Hi,

I have a huge table.

I want to collect stats on a few important date fields in the table, which are used by business for reporting purpose.

Even collecting full stats on these columns is quite resource intensive and taking 5-6 hours.

Is there any way I could capture the difference of the columns changes , and apply the delta , without collecting full stats.

I am using the following statement to see the  stats values:

SHOW STATISTICS VALUES COLUMN (event_dt) ON DB_Name.TableName;

A part of the output:

 /** Biased: Value, Frequency **/ 

 /*   1 */   DATE '2015-01-10', 20960, 

 /*   2 */   DATE '2015-01-15', 91234, 

 /*   3 */   DATE '2015-01-20', 89754, 

 /*   4 */   DATE '2015-02-15', 10345, 

 /*   5 */   DATE '2015-02-25', 65432, 

 /** HistoryOfSummaryInfo **/ 

If a new event_dt(new records) is inserted in the table, it should be available in the system, without actually capturing the full stats .

I would like to capture  the delta and apply on the target table, which will not consume so much resources.

Any help in this regard is highly appreciated!

9 REPLIES
Junior Contributor

Re: Optimizing Stats Collection

The biased number don't imply it's a really big table, so I assume that "5-6 hours" is for recollecting all stats on that table.

You should check if you can use SAMPLE STATS instead of full stats and recollect only specific columns.

What's your release 14 or 14.10+?

Do a SHOW STATS VALUES and keep the output.

Run COLLECT STATS USING SAMPLE 2 PERCENT on the column.

Do SHOW STATS VALUES again and check if it's too bad (caution, the number of rows per value must be multiplied by 50, because it's based on the 2 percent sample).

If it's not good enough, try increasing the sample to 5/10/whatever percent.

If it's still not ok, simply re-apply the old stats by running the output of the first SHOW STATS VALUES.

In most cases sample stats are ok, but check this: When sample statistics go wild

In TD14.10 the optimizer might ignore a USING SYSTEM SAMPLE, that's why I specified a percentage.

Teradata Employee

Re: Optimizing Stats Collection

Perhaps you could clarify your next to last sentence: If a new event_dt(new records) is inserted in the table, it should be available in the system, without actually capturing the full stats .

The original post above shows a list of the bias values which should not be expected to show every value, nor does it need to in order to allow the optimizer to optimize queries. The baised values are just the most frequent ones in their part of the histogram.

If the newly added rows are a small percentage addition to a large table, it is not required to update the statistics, the optimizer will still make a plan using the older statistics and a concept called extrapolation.

The above statement appears to be suggesting that the statistics need to be current to allow the new values to be used and that is not the case. The optimizer does not use statistics to answer queries, only to advise the planning process. And the planning process can deal with mildly to moderately out of date stats.

Enthusiast

Re: Optimizing Stats Collection

Thanks Dieter! Collecting sample stats and multiplying the sampling percent with AMP numbers on the system willhelp.Is there any other way?Is it possible to use Object Usage Count(in TD 14.10) to get more accurate extrapolated values?

@Todd, sorry for the confusion.Actually I meant to say if new records for a date range ,say for 10 days are inserted, since the last stats collected, how do we capture those changes and help optimizer?The number of such records may vary from few thousands to millions.

Enthusiast

Re: Optimizing Stats Collection

Hi Guys,

I have  a situation , in my Teradata env,  there is a column on a table which some time does not gets refreshed after stats collection resulting into spool out  sometimes. when we checked , the unique demographic values for that col  through help stats does not match the select distinct(col) from db.tablename, hence the stats does not gets refreshed even after collecting  . digginng further , we checked the dbc.statsv for that column and Samplesizepcntg  value for that col was 10 instead of 100, which we changed it to to 100 and refreshed the stats again, which worked.. I was having understanding that if we set this value manually,it should not change, but after three weeks again i encountered the similar issue and the value now was again to 10.98 instead of 100.

Looking closely, the sample signature column is mentioned as USP0100.00 instead of Global default , can this be cause oif something  which is reverting the value to some lower number instead of even setting it to 100 manually?

I also checked the DBQL logs to see whats happening on the collect stats on that table, if some process defining statistics with the percent level and overriding the system setting as suggested by one of my seniors, but still could get any clue being run through application side excepts those statements run by me.  Any body any clue..

Regards,

Junior Supporter

Re: Optimizing Stats Collection

Amit,

Check the value of SysSampleOption  (DBS Control Record - Optimizer Statistics Fields)

I believe the issue is due to DBS control value SysSampleOption = 0 (Enabled; Default value).

Value 0 and 1 have same behaviour.

If you change this value to 2 then system will stop automatic conversion of full stats to sample stats.


Abhishek Jadhav
Enthusiast

Re: Optimizing Stats Collection

Thanks Abhishek,

I will check and conirm on this. However , from dbql when i was checking, i see many other applications also collect sample stats , but why issue coming only for one table and one column. But I might be wrong .. will see 

Enthusiast

Re: Optimizing Stats Collection

Abhishek,

I checked for this column on PROD, and its  = 0 (Enabled; Default value).

Is its due to this? as many clumns are set at 100 and varied low numbers and set as Global default. Its hapenning nmormally only for one or few columns in the env , rest things are normal..

Junior Supporter

Re: Optimizing Stats Collection

Amit,

I think its because of this parameter .Like i mentioned , If you change the value of SysSampleOption to 2 , this feature will be disabled and all of your full stats will never be converted (downgraded) to sample stats automatically by system.Try this and let us know if it resolves your issue.


Abhishek Jadhav
Teradata Employee

Re: Optimizing Stats Collection

Just to be complete in this conversation...

Disabling this option will disable one of the features of AutoStats. The purpose of this feature is to significantly reduce the cost of stats where sample stats should be sufficient rather than collecting full stats. Disabling the option will disable the value for all the other cases that are working well.

Instead, use the USING NO SAMPLE option on the collect for the problematic stats collection (see the manual for the syntax, options). This will forcce full stats collection for the one case where sample stats are not resulting in a good plan rather than shutting off a chunk of the autostats functionality for all users/tables.

It is unfortunate that one plan gets worse with the sample stats. I think that Teradata support would like to hear about that case so that they can improve the algorithm for choosing sample stats.