Database

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-04-2006
01:12 PM

10-04-2006
01:12 PM

ALL,

We have a job that collect statistics on a huge table and it takes aroung 6 hours.

I heard that we can take collect statistics on a few amps instead of all the data, How can we do this? and is this feature available in V2R5.

I'd appreciate if someone can provide some info on collecting statistics on either a few amps or on a sample data.

We have a job that collect statistics on a huge table and it takes aroung 6 hours.

I heard that we can take collect statistics on a few amps instead of all the data, How can we do this? and is this feature available in V2R5.

I'd appreciate if someone can provide some info on collecting statistics on either a few amps or on a sample data.

3 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-05-2006
01:09 AM

10-05-2006
01:09 AM

I beleive you are concerned about sampled statistics.

Random Amp samples can be collected on tables. This is less aggressive than collecting stats across entire table. But, the accuracy of these stats depends upon the even distrubution of data. If the data is skewed , you might hit an over or under estimation problems.

The logic is simple. As such, a particular AMP will be picked for gathering samples(restrictions apply) .These stats are used for estimating costs.

The following is the sample syntax for collecting Random amp samples.

COLLECT STATISTICS USING SAMPLE INDEX (EmpNo, Name) ON Employee

For more information please RTFM ;-)

Random Amp samples can be collected on tables. This is less aggressive than collecting stats across entire table. But, the accuracy of these stats depends upon the even distrubution of data. If the data is skewed , you might hit an over or under estimation problems.

The logic is simple. As such, a particular AMP will be picked for gathering samples(restrictions apply) .These stats are used for estimating costs.

The following is the sample syntax for collecting Random amp samples.

COLLECT STATISTICS USING SAMPLE INDEX (EmpNo, Name) ON Employee

For more information please RTFM ;-)

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-05-2006
01:43 PM

10-05-2006
01:43 PM

Leo thanks for the information.

I tried it out and it worked fine, But there is a catch. When we use sample the stats are collected on one AMP and then the value is multiplied by the number of amps.

Assume I have 10 departments, and i am collecting stats on the deptno field in employee table

COLLECT STATISTICS column (deptno) ON Employee;

says there are 10 unique_values for deptno

COLLECT STATISTICS USING SAMPLE INDEX (deptno) ON Employee;

says there are 100 unique_values for deptno (Assuming I have 10 amps).

That is a huge difference.

When we collect stats on employee number the sample works fine, because it is more unique. The concept of collecting stats on one amp and multiplying with number of amps will provide close results in this case.

I am not sure if there is a way to sample on 50% of the amps.

By doing this we collect stats on half the number of amps and multiply the number by 2

(Taking our deptno example this might say there are 20 unique_values for deptno.)

which is not good but acceptable.

-Anyways Thank you for your reply and it helped us.

I tried it out and it worked fine, But there is a catch. When we use sample the stats are collected on one AMP and then the value is multiplied by the number of amps.

Assume I have 10 departments, and i am collecting stats on the deptno field in employee table

COLLECT STATISTICS column (deptno) ON Employee;

says there are 10 unique_values for deptno

COLLECT STATISTICS USING SAMPLE INDEX (deptno) ON Employee;

says there are 100 unique_values for deptno (Assuming I have 10 amps).

That is a huge difference.

When we collect stats on employee number the sample works fine, because it is more unique. The concept of collecting stats on one amp and multiplying with number of amps will provide close results in this case.

I am not sure if there is a way to sample on 50% of the amps.

By doing this we collect stats on half the number of amps and multiply the number by 2

(Taking our deptno example this might say there are 20 unique_values for deptno.)

which is not good but acceptable.

-Anyways Thank you for your reply and it helped us.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-26-2006
06:54 AM

10-26-2006
06:54 AM

Hi

You're correct. Try collecting stats only on the columns that are referenced in WHERE clauses or on your indexes.

You're correct. Try collecting stats only on the columns that are referenced in WHERE clauses or on your indexes.