Analytics
Enthusiast

## Randomied sample

I have a table with

account_number
first name
last name
present balance
outstanding balance
ASR

This table has lets say 2 million records.

i have to take a sample from this table for 1000 accounts.
but the requirment is that the avarage of the ASR of the base table must be very close
to the avarage of ASR of the sample data.

thanks a lot for the help
4 REPLIES
Enthusiast

## Re: Randomied sample

First you need to decide what you mean by "close to". For the example code I'll assume that close is within 1/2 a standard deviation of the average.
One way of doing it is:

SELECT * FROM big_table
WHERE asr < (SELECT AVG(asr)+STDDEV_SAMP(asr)/2 FROM big_table)
AND asr > (SELECT AVG(asr)-STDEV_SAMP(asr)/2 FROM big_table)
SAMPLE 0.1;

This will select a 10% of the rows that have an asr within 1/2 a standard deviation of the mean asr.
Of course you could just decide that (say) 5000 is close to the average and then replace the STDDEV_SAMP(asr)/2 with the number 5000 in the where clause. Your choice dependent upon how you want to "define close to the average".

Hope this helps

Paul

Enthusiast

## Re: Randomied sample

Thanks a lot paul.

but the problem i have at hand is by the query above i will be picking accounts with ASR that is near to the avarage ASR of the table.
But all i want is
i would like to table accounts from all different ranges of ASR but when i
calculate the avarage ASR to the sample it should be with in 10% of the avarage ASR of the base table.

thanks
Enthusiast

## Re: Randomied sample

Well if you treat the table as the population, then as the sample you draw from that population increases, the closer that the mean of the sample will be to the mean of the population.
So if you want the sample mean to be within 10 units of the population mean, you need to work out how big a sample will give you that accuracy.
Now assuming that the distribution of asr is approximately normal, we need to know the the standard deviation of asr. Once we have this we then need to decide how accurate do we want to be, eg do we want to be 95% sure that we have a sample that captures the population mean of asr by 10 units, or 99% sure?

Once we have decided on the accuracy level, we need to consult a normal table to obtain the critical values for our level of confidence, some of the more common ones are below:
90% - 1.645
95% - 1.96
99% - 2.576

Now armed with our knowledge of the standard deviation of asr (stddev_pop(asr)] we can use the following formulae to calculate the required sample size

n approx. ((critical value)*(stddev_pop(asr)/(margin of error))^2 and round up.

Plugging some numbers in for a worked example,
population is 2 000 000
margin of error 10
stdev(asr) 2 500
want 95% confidence.

so
(1.96*(2 500/10))^2

(1.96*250)^2
(490)^2

240 100

so with a sample of 240 100 we can be 95% sure we have captured the mean of the asr within 10 units.

Paul

Enthusiast

## Re: Randomied sample

Paul,

Thanks a lot.
I will have to try it.