Database
Enthusiast

## Collect statistics question

ALL,
We have a job that collect statistics on a huge table and it takes aroung 6 hours.
I heard that we can take collect statistics on a few amps instead of all the data, How can we do this? and is this feature available in V2R5.
I'd appreciate if someone can provide some info on collecting statistics on either a few amps or on a sample data.
3 REPLIES
Enthusiast

## Re: Collect statistics question

I beleive you are concerned about sampled statistics.

Random Amp samples can be collected on tables. This is less aggressive than collecting stats across entire table. But, the accuracy of these stats depends upon the even distrubution of data. If the data is skewed , you might hit an over or under estimation problems.

The logic is simple. As such, a particular AMP will be picked for gathering samples(restrictions apply) .These stats are used for estimating costs.

The following is the sample syntax for collecting Random amp samples.
COLLECT STATISTICS USING SAMPLE INDEX (EmpNo, Name) ON Employee

Enthusiast

## Re: Collect statistics question

Leo thanks for the information.
I tried it out and it worked fine, But there is a catch. When we use sample the stats are collected on one AMP and then the value is multiplied by the number of amps.
Assume I have 10 departments, and i am collecting stats on the deptno field in employee table
COLLECT STATISTICS column (deptno) ON Employee;
says there are 10 unique_values for deptno

COLLECT STATISTICS USING SAMPLE INDEX (deptno) ON Employee;
says there are 100 unique_values for deptno (Assuming I have 10 amps).

That is a huge difference.

When we collect stats on employee number the sample works fine, because it is more unique. The concept of collecting stats on one amp and multiplying with number of amps will provide close results in this case.

I am not sure if there is a way to sample on 50% of the amps.
By doing this we collect stats on half the number of amps and multiply the number by 2
(Taking our deptno example this might say there are 20 unique_values for deptno.)
which is not good but acceptable.