I have a uniformly collected sample of 10000 data points from a population of about 200000. I'd like to find out what the distribution of the population is. How can I do this rigorously?
Determining the distribution of a population from a sample
2
statistics
-
0The question as posed is not clear. What do you mean by "distribution of the population"? – 2011-04-07
-
0Ah, sorry, stats isn't my first language, so please excuse me. The elements of the population may be, say, normally distributed, or uniformly distributed, etc. I would like to ascertain whether there is a well-known distribution that models the distribution of the points in my population. Is there a more succinct way to articulate this in statsy lingo? – 2011-04-07
-
0Are these "elements" numbers on the real number line? – 2011-04-07
-
0Yes, they are natural numbers. – 2011-04-07
-
0Typically the analysis is done the other way around, where you already know the distribution. This is an interesting question! – 2011-04-07
-
0Can you sample it only once, or can you draw many independent samples from this data set? – 2011-04-07
-
0I'm at liberty to draw as many samples of size < 10000 as I like. Although I haven't proven it, I suspect that my samples are independent. – 2011-04-07
-
0Yes - I do not know if you can prove it, but let us make such an assumption. – 2011-04-07
-
0@Undercover You will get helpful answers by migrating this question to the [stats site](http://stats.stackexchange.com/questions). – 2011-04-07
1 Answer
2
There are several tests of the hypothesis that the sample was drawn from a specified distribution. The simplest approach is to bin the data and compare the observed frequency counts with the counts expected under a candidate distribution (normal, uniform, Poisson, etc.) using a chi-square goodness-of-fit test. Alternatively, you can use the Kolmogorov-Smirnov test, which is often more powerful and does not require binning.
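To make the binning approach concrete, here is a minimal sketch in plain Python. Everything specific in it is an assumption for illustration, not something from the question: the data are simulated integers 0 to 9, the candidate distribution is the discrete uniform on those ten values, and `chi_square_statistic` is a hypothetical helper name. The 5% critical value 16.919 is the standard table value for 9 degrees of freedom.

```python
import random
from collections import Counter

def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over bins of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data (assumption): 10000 draws that really are uniform on 0..9.
rng = random.Random(0)
sample = [rng.randrange(10) for _ in range(10_000)]

# One bin per value; a missing value simply counts as 0.
counts = Counter(sample)
observed = [counts[k] for k in range(10)]

# Expected counts under the candidate (uniform) distribution.
expected = [len(sample) / 10] * 10

stat = chi_square_statistic(observed, expected)

# 10 bins and no parameters estimated from the data -> 9 degrees of freedom.
CRITICAL_5PCT_DF9 = 16.919
reject_uniform = stat > CRITICAL_5PCT_DF9

print(f"chi-square = {stat:.2f}, reject uniform at the 5% level: {reject_uniform}")
```

To try another candidate (say Poisson), only `expected` changes: multiply the sample size by the candidate's probability for each bin. If the candidate's parameters are estimated from the same sample, the degrees of freedom drop by one per estimated parameter, so the critical value has to be looked up again.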