Our goal is to minimize statistics collection time.
We currently choose between FULL and SAMPLE stats based on table row count:
- SAMPLE 50% for all of a table's statistics combinations when the table has > 5 billion rows
- FULL for all of a table's statistics combinations when the table has < 5 billion rows
This still takes a long time, so we came up with this idea to reduce it further: instead of choosing FULL or SAMPLE by the table's total row count, choose per statistics combination based on the number of values in that combination.
- If the table has 2 billion rows, but the number of values for the stats combination (col1,col2) is 10 billion
--> then we use sampled stats for this column combination specifically.
- If the table has 10 billion rows, but the number of values for the stats combination (col3,col4) is 2 billion
--> then we do not use sampled stats for this column combination.
Is it logical to use this approach, or should we stick to the total number of rows in the table?
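The per-combination rule described above can be sketched as a small decision function. This is only an illustration of the proposed logic, not Teradata behavior: the function name, the threshold value, and the idea of reusing the same 5-billion cutoff are all assumptions.

```python
# Hypothetical sketch of the proposed per-combination rule: pick SAMPLE when
# the number of values in the stats combination exceeds a fixed threshold,
# FULL otherwise. The threshold (reusing the 5B row-count cutoff) is an
# assumption for illustration only.

BILLION = 1_000_000_000
VALUE_THRESHOLD = 5 * BILLION

def stats_mode(num_values: int) -> str:
    """Return 'SAMPLE' or 'FULL' for one column combination."""
    return "SAMPLE" if num_values > VALUE_THRESHOLD else "FULL"

# The two cases from the question:
print(stats_mode(10 * BILLION))  # (col1,col2) on a 2B-row table -> SAMPLE
print(stats_mode(2 * BILLION))   # (col3,col4) on a 10B-row table -> FULL
```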
Have you read Carrie Ballinger's blogs on "Statistics Recommendations" for your release (13/13.10/14)?
For some really large tables there's no other way than sample stats.
You just have to be careful when a column is correlated to a partitioning column, e.g. a table partitioned by date that also has a year_month column. Never collect sample stats on year_month without also including the date, because sampling simply starts reading from the first blocks, and a partitioned table is sorted by partition number...
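A quick simulation makes the danger concrete. The data layout below (24 months of daily rows stored in date order, a "sample" taken as the leading rows) is purely illustrative, but it shows the mechanism: because a date-partitioned table is stored in partition order, the first blocks cover only the earliest dates, so a leading-block sample sees almost none of the year_month values.

```python
# Illustrative simulation: a table partitioned by date stores rows in date
# order, so sampling the leading blocks sees only the earliest dates and
# drastically underestimates the number of distinct year_month values.

# 24 months of daily rows (28 days each for simplicity), in date order.
rows = [(f"2023-{m:02d}" if m <= 12 else f"2024-{m - 12:02d}") + f"-{d:02d}"
        for m in range(1, 25) for d in range(1, 29)]
year_month = [r[:7] for r in rows]

full_distinct = len(set(year_month))          # true number of months
sample = year_month[: len(year_month) // 50]  # "sample" = leading ~2% of rows
sample_distinct = len(set(sample))            # what a leading-block sample sees

print(full_distinct)    # 24
print(sample_distinct)  # 1 -> the sample only ever sees the first month
```

Including the date column in the stats combination (or using FULL stats on such columns) avoids basing the estimate on a slice of the table that covers only one partition range.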