Analytics

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-11-2006
11:22 AM

04-11-2006
11:22 AM

Randomied sample

Can someone please help

I have a table with

account_number

first name

last name

present balance

outstanding balance

ASR

This table has lets say 2 million records.

i have to take a sample from this table for 1000 accounts.

but the requirment is that the avarage of the ASR of the base table must be very close

to the avarage of ASR of the sample data.

thanks a lot for the help

I have a table with

account_number

first name

last name

present balance

outstanding balance

ASR

This table has lets say 2 million records.

i have to take a sample from this table for 1000 accounts.

but the requirment is that the avarage of the ASR of the base table must be very close

to the avarage of ASR of the sample data.

thanks a lot for the help

4 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-20-2006
05:21 AM

04-20-2006
05:21 AM

Re: Randomied sample

First you need to decide what you mean by "close to". For the example code I'll assume that close is within 1/2 a standard deviation of the average.

One way of doing it is:

SELECT * FROM big_table

WHERE asr < (SELECT AVG(asr)+STDDEV_SAMP(asr)/2 FROM big_table)

AND asr > (SELECT AVG(asr)-STDEV_SAMP(asr)/2 FROM big_table)

SAMPLE 0.1;

This will select a 10% of the rows that have an asr within 1/2 a standard deviation of the mean asr.

Of course you could just decide that (say) 5000 is close to the average and then replace the STDDEV_SAMP(asr)/2 with the number 5000 in the where clause. Your choice dependent upon how you want to "define close to the average".

Hope this helps

Paul

One way of doing it is:

SELECT * FROM big_table

WHERE asr < (SELECT AVG(asr)+STDDEV_SAMP(asr)/2 FROM big_table)

AND asr > (SELECT AVG(asr)-STDEV_SAMP(asr)/2 FROM big_table)

SAMPLE 0.1;

This will select a 10% of the rows that have an asr within 1/2 a standard deviation of the mean asr.

Of course you could just decide that (say) 5000 is close to the average and then replace the STDDEV_SAMP(asr)/2 with the number 5000 in the where clause. Your choice dependent upon how you want to "define close to the average".

Hope this helps

Paul

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-20-2006
08:25 AM

04-20-2006
08:25 AM

Re: Randomied sample

Thanks a lot paul.

but the problem i have at hand is by the query above i will be picking accounts with ASR that is near to the avarage ASR of the table.

But all i want is

i would like to table accounts from all different ranges of ASR but when i

calculate the avarage ASR to the sample it should be with in 10% of the avarage ASR of the base table.

thanks

but the problem i have at hand is by the query above i will be picking accounts with ASR that is near to the avarage ASR of the table.

But all i want is

i would like to table accounts from all different ranges of ASR but when i

calculate the avarage ASR to the sample it should be with in 10% of the avarage ASR of the base table.

thanks

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-20-2006
09:34 PM

04-20-2006
09:34 PM

Re: Randomied sample

Well if you treat the table as the population, then as the sample you draw from that population increases, the closer that the mean of the sample will be to the mean of the population.

So if you want the sample mean to be within 10 units of the population mean, you need to work out how big a sample will give you that accuracy.

Now assuming that the distribution of asr is approximately normal, we need to know the the standard deviation of asr. Once we have this we then need to decide how accurate do we want to be, eg do we want to be 95% sure that we have a sample that captures the population mean of asr by 10 units, or 99% sure?

Once we have decided on the accuracy level, we need to consult a normal table to obtain the critical values for our level of confidence, some of the more common ones are below:

90% - 1.645

95% - 1.96

99% - 2.576

Now armed with our knowledge of the standard deviation of asr (stddev_pop(asr)] we can use the following formulae to calculate the required sample size

n approx. ((critical value)*(stddev_pop(asr)/(margin of error))^2 and round up.

Plugging some numbers in for a worked example,

population is 2 000 000

margin of error 10

stdev(asr) 2 500

want 95% confidence.

so

(1.96*(2 500/10))^2

(1.96*250)^2

(490)^2

240 100

so with a sample of 240 100 we can be 95% sure we have captured the mean of the asr within 10 units.

Paul

So if you want the sample mean to be within 10 units of the population mean, you need to work out how big a sample will give you that accuracy.

Now assuming that the distribution of asr is approximately normal, we need to know the the standard deviation of asr. Once we have this we then need to decide how accurate do we want to be, eg do we want to be 95% sure that we have a sample that captures the population mean of asr by 10 units, or 99% sure?

Once we have decided on the accuracy level, we need to consult a normal table to obtain the critical values for our level of confidence, some of the more common ones are below:

90% - 1.645

95% - 1.96

99% - 2.576

Now armed with our knowledge of the standard deviation of asr (stddev_pop(asr)] we can use the following formulae to calculate the required sample size

n approx. ((critical value)*(stddev_pop(asr)/(margin of error))^2 and round up.

Plugging some numbers in for a worked example,

population is 2 000 000

margin of error 10

stdev(asr) 2 500

want 95% confidence.

so

(1.96*(2 500/10))^2

(1.96*250)^2

(490)^2

240 100

so with a sample of 240 100 we can be 95% sure we have captured the mean of the asr within 10 units.

Paul

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-21-2006
08:58 AM

04-21-2006
08:58 AM

Re: Randomied sample

Paul,

Thanks a lot.

I will have to try it.

Your logic should work ...

thanks

teja

Thanks a lot.

I will have to try it.

Your logic should work ...

thanks

teja