collect statistics using system sample - always 100%

Database

collect statistics using system sample - always 100%

Hello,

 

I have trouble achieving any other sample size than 100% writing 

COLLECT STATS USING SYSTEM SAMPLE ON ...

DBS control is set to:

4.  SysSampleOption             = 0

dbc.constantdefs is set to:

13115; StatsSysSampleOption; I; 0

as per manuals and the comment in constantdefs 0 should mean it defers to DBS control where 0 is enabled / default value

Nevertheless all tables under various scenarios never downsample, it always collects stats on full table and unless I manually specify NO THRESHOLD it does not even insert the sample value into statstable (100), its just null

Could someone please tell me if I am supposed to either set some other value in DBS, or rather disable this setting in DBS and change the value in constantdefs, or something else entirely?

 TD version 15.10.07.07

Thank you


Accepted Solutions
Teradata Employee

Re: collect statistics using system sample - always 100%

If you specify (or default) both SYSTEM SAMPLE and SYSTEM THRESHOLD, then THRESHOLD takes precedence. It's cheaper to skip stats collection entirely than to sample.

 

With SYSTEM SAMPLE and a user-specified THRESHOLD (including NO THRESHOLD), you first need multiple stats collections over a period of time (30 days). Once you have that history, if the database determines that SYSTEM THRESHOLD would have skipped stats collection (but the user THRESHOLD did not), then the database will evaluate whether detailed stats interval information is regularly being used (which argues for continuing to collect full stats) and if not, how likely is it that sampling & extrapolation would provide sufficiently accurate estimates.

1 ACCEPTED SOLUTION
3 REPLIES
Teradata Employee

Re: collect statistics using system sample - always 100%

If you specify (or default) both SYSTEM SAMPLE and SYSTEM THRESHOLD, then THRESHOLD takes precedence. It's cheaper to skip stats collection entirely than to sample.

 

With SYSTEM SAMPLE and a user-specified THRESHOLD (including NO THRESHOLD), you first need multiple stats collections over a period of time (30 days). Once you have that history, if the database determines that SYSTEM THRESHOLD would have skipped stats collection (but the user THRESHOLD did not), then the database will evaluate whether detailed stats interval information is regularly being used (which argues for continuing to collect full stats) and if not, how likely is it that sampling & extrapolation would provide sufficiently accurate estimates.

Re: collect statistics using system sample - always 100%

Hello Fred, thank you very much for explanation

I was not able to find out about system threshold and system sample interaction, as you described here, in the manuals.

This also explains why is there null in statstable when only system sample is explicit and system threshold is implicit.

 

Please correct me if I am wrong about that

Since both system options are enabled, if I would just specify threshold explicitly (in a way that is different to system setting), then I would get both funcionalities? I mean, user threshold does NOT skip stats collection so implicit system sample considers sampling, after 30 days.

My thinking is, that for big tables is still inefficient to collect stats on all rows or none, when sample would be enough (agreements table and so on which are pretty stable in data demography)

I am looking for system wide solution, as while I can set sample on my own, I would need all other developers to manage sampled stats while they do not know how to, e.g. system sample (there are many, many developers, changing all the time)

Also is that 30 days period hardcoded? I would like to try it on testing rack, without changing dates on system.

 

Thank you

Highlighted
Teradata Employee

Re: collect statistics using system sample - always 100%

To be clear, it's only when both THRESHOLD DAYS and THRESHOLD PERCENT are set to "SYSTEM" that SYSTEM SAMPLE is effectively ignored. Any THRESHOLD is always checked first, before sampling, of course. 

 

Note that the SYSTEM THRESHOLD DAYS is "unlimited", so one of the simplest ways to "switch on" SYSTEM SAMPLE is just to specify a DAYS threshold. And by default the system will rely on DBQL UseCount information to determine PERCENT change (so you will want to enable USECOUNT logging). Typically you need at least four full collections to establish a "reliable pattern" and see stats being skipped or sampled. I'm not aware of any way to control the history requirement for SYSTEM SAMPLE. 

 

The interaction between SYSTEM THRESHOLD and SYSTEM SAMPLE is not mentioned in the SQL Request and Transaction Processing manual, which seems to have the most detail regarding behavior of SYSTEM SAMPLE. There is an "Orange Book" for TD14.10 Statistics Enhancements that might be available from the Teradata Support portal with a Teradata At Your Service account, or from your TD account team; I'm not sure.