Thanks in advance for your time.
I would like your thoughts on collecting stats.
I would like your thoughts on collecting stats. I have a very big table, around 24 B rows. The fields are all accurate in length, no VARCHAR/CHAR, mainly INTEGER, and all nullable columns have compression. We use the new way of collecting stats on TD 14, i.e. as a single transaction, but since the majority of the stats are single-column stats, we gain nothing from forcing TD to reuse the spool already built for the multi-column stats. If we add more values to MVC, will it help?
Any other suggestions to reduce the collection time (CPU and IO)?
MVC has no direct effect on collecting stats (other than the table might be smaller, thus less data to read).
But I would try to use mainly SAMPLE stats; in my experience this can be done on almost all columns. In TD14 the optimizer might decide to start with full stats and then dynamically reduce the sample size, so for the first collection there will be no reduction. And you must not drop/recreate the stats, as that would reset this process.
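In the TD 14 USING-clause syntax, a sampled collection might look like the sketch below; the database, table, and column names are placeholders, not from the original post:

```sql
-- Let the system pick and adjust the sample size (TD 14+ USING clause)
COLLECT STATISTICS
        USING SYSTEM SAMPLE
        COLUMN (trans_id)          -- placeholder column
ON mydb.big_tab;                   -- placeholder table

-- Or request an explicit sample percentage instead
COLLECT STATISTICS
        USING SAMPLE 10 PERCENT
        COLUMN (trans_id)
ON mydb.big_tab;
```

With SYSTEM SAMPLE the optimizer decides when it is safe to downgrade from a full scan, which is the dynamic behavior described above.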
Hello Dieter,
Thanks for the reply. I made a simple test on a 300 M row table: I just added the 255 max MVC values per column and noticed a 30% decrease in Total IO, but no CPU/spool gain.
As you mentioned, I agree: sample stats are the only solution.
I will reduce the sample percentage by 10% each time, checking the histogram and the CPU/IO consumption at every step in order to arrive at the correct number (and then run some tests on queries in which these tables participate, to check whether the plans have changed). Is this a good approach?
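That iteration could be sketched as follows; the names and percentages are illustrative only:

```sql
-- Round 1: collect with an explicit sample, then inspect the histogram
COLLECT STATISTICS USING SAMPLE 50 PERCENT
        COLUMN (trans_id) ON mydb.big_tab;      -- placeholder names

SHOW STATISTICS VALUES COLUMN (trans_id) ON mydb.big_tab;

-- Next rounds: repeat with 40, 30, 20 ... PERCENT, stopping when the
-- histogram starts to diverge from the full-stats version.
```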
Furthermore, one last question: if our partitioning column is part of the PI, then on TD 14 should we collect full stats on this column or sampled ones?
Thank you once more for your time.
Did you check the table size after adding MVC? If resource usage went down 30%, I would expect a much smaller table.
I made a mistake: TD 14.10 automatically tries to find the best sample size; before that release you have to do it manually. You can do it as you described; just keep the output of the previous SHOW STATISTICS VALUES, so in case the sample size gets too low you can simply re-apply the previous stats using that COLLECT STATISTICS ... VALUES.
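A sketch of that safety net (table and column names are placeholders):

```sql
-- Capture the current histogram as a re-playable statement
SHOW STATISTICS VALUES COLUMN (trans_id) ON mydb.big_tab;

-- The output is itself a statement of the form
--   COLLECT STATISTICS COLUMN (trans_id) ON mydb.big_tab
--   VALUES ( /* interval and summary detail */ );
-- Saving and re-running that output restores the previous stats
-- without re-scanning the table.
```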
And instead of starting with a high sample size, you can do it the other way round: start with a low sample and increase it until the histogram stabilizes.
Caution: for sampled stats the HighModeFreq, NumOfRows and Frequency values are based on the sample, i.e. you have to scale them up to 100%. E.g. with SamplePercent = 2.00 and Frequency = 500, the actual Frequency = 500 / 0.02 = 25,000.
And in most cases you don't have to check Explains, as long as full and sampled stats show similar results.
Thanks once more .
The table already had MVC on some columns, so probably the change was not big.
I will follow your plan & the hint about HighModeFreq.
So on 14.10 they have fixed this issue. I was hoping to go to 14.10 because of the partial redistribution / partial duplication feature and to re-check the plans, but now I understand that we should upgrade as soon as possible.
Thank you once more.
Can I ask one more thing about sample stats?
I can directly identify the columns where I can use sampling via three approaches:
1) Comparing UniqueValueCount to RowCount from dbc.stats: the column qualifies if the ratio is more than 95%.
2) Columns which have UniqueValueCount = 1 (i.e. the same value in all records).
3) All the columns where NullCount is close to RowCount.
Is the second and third approach correct?
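The first approach could be sketched as a dictionary query; this assumes the TD 14.10 `dbc.StatsV` view with its `RowCount` and `UniqueValueCount` columns, and the database name is a placeholder:

```sql
-- Candidate columns for sampling: near-unique values (approach 1)
SELECT DatabaseName, TableName, StatsName,
       UniqueValueCount, RowCount
FROM   dbc.StatsV
WHERE  DatabaseName = 'mydb'          -- placeholder database
AND    RowCount > 0
AND    CAST(UniqueValueCount AS FLOAT) / RowCount > 0.95;
```

Approaches 2 and 3 would be analogous filters on `UniqueValueCount = 1` and on `NullCount` close to `RowCount`.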
What could be the reason for a COLLECT STATISTICS statement failing because of spool space?
Also, during execution of a COLLECT STATISTICS statement, what does the Teradata DB do internally on the VPROCs & VDISKs?
Please try to post different questions to new posts.
Out of spool means that, at a specific point in time, the sum of the spool used across all of a user's concurrent sessions exceeded the spool assigned to that user. If it is only one query, ask the DBAs to provide you more spool.
If there were more concurrent queries, run it alone.
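The DBA-side fix is a standard MODIFY USER statement; the user name and spool size below are placeholders:

```sql
-- Raise the per-user spool allocation (DBA privilege required)
MODIFY USER stats_user AS SPOOL = 200e9 BYTES;
```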