I need to validate the distribution of values in a specific field to confirm that it conforms to expectations. For example, I have a table with a customer status column that has three values: new buyer, existing buyer, and new to department. How do I validate the customer status field using mean and standard deviation to confirm the distribution of the data? If there are 0 existing buyers, chances are the data is wrong.
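For the specific failure described (a category with 0 rows), a simpler first check is just to count each category. Below is a minimal sketch, assuming the column has been loaded as a plain list of status strings. The function name `check_status_counts` and the `min_share` threshold are hypothetical, introduced only for illustration.

```python
from collections import Counter

# The three expected values of the customer status column (from the question)
EXPECTED_STATUSES = ["new buyer", "existing buyer", "new to department"]


def check_status_counts(statuses, min_share=0.0):
    """Return the expected statuses whose share of rows is at or below min_share.

    min_share is a hypothetical threshold; with the default of 0.0, only
    categories that never appear in the data are flagged.
    """
    counts = Counter(statuses)  # missing keys count as 0
    total = len(statuses)
    flagged = []
    for status in EXPECTED_STATUSES:
        share = counts[status] / total if total else 0.0
        if share <= min_share:
            flagged.append(status)
    return flagged


rows = ["new buyer", "new buyer", "new to department", "new buyer"]
print(check_status_counts(rows))  # ['existing buyer']
```

Raising `min_share` (for example to 0.05) would also flag categories that are present but suspiciously rare. The right value depends on what "expected" means for this data.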
The short answer is that you probably can't. Mean and standard deviation both apply to numeric measures, and standard deviation is generally useful only for a range of values (not just three discrete values).
For example, if you were dividing customers into a fairly large number of classes, you could compute the mean and standard deviation of the per-class counts to test how well the class division matches the expectation.
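One possible reading of the suggestion above, sketched under assumptions: the per-class counts are already available as a dictionary, roughly equal counts are expected, and a class is treated as suspicious when its count lies more than `k` standard deviations from the mean count (a z-score check). The function name, the class labels, the counts, and `k = 2` are all made up for illustration.

```python
import statistics


def flag_outlier_classes(counts, k=2.0):
    """Flag classes whose count is more than k standard deviations from the mean count.

    counts: mapping of class name -> number of customers in that class.
    k: hypothetical cut-off, in standard deviations.
    Returns a mapping of flagged class name -> its z-score.
    """
    values = list(counts.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation; needs at least 2 classes
    if sd == 0:
        return {}  # every class has the same count, so nothing stands out
    return {
        name: (n - mean) / sd
        for name, n in counts.items()
        if abs(n - mean) > k * sd
    }


# Ten made-up classes; class "I" is unexpectedly empty.
counts = {"A": 100, "B": 98, "C": 103, "D": 97, "E": 101,
          "F": 99, "G": 102, "H": 100, "I": 0, "J": 100}
print(sorted(flag_outlier_classes(counts)))  # ['I']
```

This check breaks down when there are only a few classes. With n classes and the sample standard deviation, no count can sit more than (n - 1)/√n standard deviations from the mean. For three classes that limit is about 1.15, so a `k = 2` rule can never flag anything, even when one class is completely empty.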
Thank you, Fred. That helps!
For the example you gave, I can compute the mean and standard deviation for each class, but how do I then test whether the distribution is as expected? Could you please explain?
Also, what is the approximate minimum range of values for which mean and standard deviation become usable, given that they can't be applied to just three discrete values?