8

Does that mean if I draw samples from the population that 90% of the time I'll get a number between 1 and 9?

Added: assume a normal distribution for the population.

  • 0
    Also posted at http://mathoverflow.net/questions/106629/what-does-it-mean-when-a-statistician-says-im-90-confident-that-the-mean-of-the 2012-09-08

4 Answers

8

No. In a typical setting such a statement is based on the mean of a random sample. Each possible sample of that size from the given population has a certain sample mean. Some of these are very close to the population mean, and some are quite far away. (Imagine, for instance, trying to estimate the average height of an adult male by taking a random sample of adult males and just happening to get a sample consisting entirely of pro basketball players!) However, most of the possible samples will have sample means quite close to the population mean.

What the statement means, then, is that if the population mean is not between $1$ and $9$, the statistician must have drawn one of the very unrepresentative samples: one that’s so unrepresentative that only $10$% of the possible samples are equally unrepresentative (or worse).

Added: Let’s say that there are $N$ possible samples of a given size from that population. The means of those samples will cover a range, from the smallest possible sample mean to the largest. The actual population mean will be somewhere in the middle. Now draw two lines, the first cutting off the $5$% of the samples with the smallest means, the second cutting off the $5$% with the largest means. Here’s a rough sketch of the situation, with $S$ for the smallest and $L$ for the largest possible sample means:

                  first cut                    second cut
    x-----------------|-----------------------------|------------------x
    S<-------5%------>C<------------90%------------>D<-------5%------->L

The percentages are the percentages of all possible samples having means in the indicated ranges. If you draw a sample at random, on average you’ll get a sample with a mean between $C$ and $D$ $90$% of the time, because $90$% of all possible samples have means between $C$ and $D$, and the samples are all equally likely to be picked when you pick at random.

Similarly, on average you’ll get a sample with a mean between $S$ and $C$ about $5$% of the time, and one with a mean between $D$ and $L$ about $5$% of the time.

The statistician is saying that if the population mean is not between $1$ and $9$, then his sample was either below the first cut, $C$, or above the second cut, $D$. In other words, either he got one of the $10$% of samples that are least like the population, or the population mean is between $1$ and $9$. ‘I’m $90$% confident that the population mean is between $1$ and $9$’ is verbal shorthand for all of that explanation.
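The two cuts can be illustrated with a small simulation. This is only a sketch under assumed values: the population is taken to be normal with a made-up mean of 5 and standard deviation of 10, the sample size is fixed at 20, and the function name `sample_mean_cuts` is hypothetical. It draws many samples, sorts their means, and reads off $C$ and $D$ as the points that cut off the lowest and highest $5$% of sample means.

```python
import random
import statistics


def sample_mean_cuts(pop_mean=5.0, pop_sd=10.0, n=20, n_samples=10000, seed=0):
    """Return (C, D, share of simulated sample means lying in [C, D]).

    pop_mean, pop_sd and n are illustrative assumptions, not values
    taken from the question.
    """
    rng = random.Random(seed)
    # One sample mean per simulated sample of fixed size n.
    means = sorted(
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(n_samples)
    )
    tail = n_samples // 20            # 5% of the simulated samples
    c = means[tail]                   # first cut: lowest 5% lie below this
    d = means[n_samples - tail - 1]   # second cut: highest 5% lie above this
    inside = sum(c <= m <= d for m in means) / n_samples
    return c, d, inside


c, d, inside = sample_mean_cuts()
print(f"C = {c:.2f}, D = {d:.2f}, share of sample means in [C, D] = {inside:.3f}")
```

By construction about $90$% of the simulated sample means land between the two cuts, and the assumed population mean sits between them; a sample whose mean falls outside $[C, D]$ is one of the unrepresentative $10$%.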

  • 0
    Is the sample size here fixed? Or is it allowed to vary in the set of possible samples (which would mean it would be every possible combination of the population, ranging from size 0 to the entire population size)? That diagram seems to imply that 90% of all possible samples would result in a sample mean that is between $1$ and $9$, and 10% of all sample means would not. 2018-04-26
3

When a statistician makes such a statement, it usually means they are confused.

Most statisticians are frequentists, and the frequentist paradigm disallows such statements, though frequentists say things like this all the time.

To a Bayesian statistician, the statement means that the uncertainty in the population mean has been modeled as a probability distribution. Starting from a prior distribution that expresses what is known/believed before acquiring data, the distribution is updated in light of data according to Bayes' theorem. The updated distribution contains 90% of its mass between 1 and 9.

  • 0
    @Henry, the question title does not refer to probability, only confidence, so the frequentist who utters that phrase is not placed in the position of imputing a meaningless (from his point of view) random nature to a model parameter. 2012-09-14
2

When you draw a sample, you don't get a number; you get a list of numbers. Or more precisely, a sample is a list of numbers.

No, it is not true that 90% of those numbers in the list are between $1$ and $9$, nor is it true that $90\%$ of the time, when you take a sample, anything in particular will be between $1$ and $9$.

A confidence interval depends on the list of numbers that you get in a sample. Say you take a sample of $20$ numbers, and the resulting confidence interval for the population mean is the interval from $1$ to $9$. Typically a much smaller proportion than $90\%$ of the numbers in the sample are between $1$ and $9$, and not infrequently, none of them are.

Now say you take another random sample of $20$ numbers from the same population, and the resulting confidence interval is from $2$ to $8.5$. And then you take another random sample of $20$ from the same population, and the confidence interval you get is from $1.5$ to $11$. And so on. Then $90\%$ of the time, the interval you get will include the population mean. That is what it means.
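A quick way to see this is to repeat the experiment many times on a computer. The following is a minimal sketch, not a definitive recipe: it assumes a normal population with a made-up mean of 5 and a known standard deviation of 10, so that the simple interval "sample mean $\pm\, z\,\sigma/\sqrt{n}$" applies, and the function name `coverage` is hypothetical. It reports how often the interval contains the population mean, and also what share of the individual sample values fall inside their own interval.

```python
import random
from statistics import NormalDist, fmean


def coverage(pop_mean=5.0, pop_sd=10.0, n=20, level=0.90, trials=5000, seed=1):
    """Return (share of intervals containing pop_mean,
               share of individual sample values inside their own interval).

    Assumes a normal population with KNOWN standard deviation pop_sd;
    all numeric defaults are illustrative.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for a 90% interval
    half_width = z * pop_sd / n ** 0.5
    hits = 0
    values_inside = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        m = fmean(sample)
        lo, hi = m - half_width, m + half_width  # this sample's interval
        hits += lo <= pop_mean <= hi             # did it catch the true mean?
        values_inside += sum(lo <= x <= hi for x in sample)
    return hits / trials, values_inside / (trials * n)


interval_share, value_share = coverage()
print(f"intervals containing the population mean: {interval_share:.3f}")
print(f"sample values inside their own interval:  {value_share:.3f}")
```

Under these assumptions the first share comes out close to $90\%$, while the second is far smaller, which matches the point that the interval is a statement about the population mean and not about where the individual numbers in the sample fall.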

  • 0
    Having, say, $N$ estimates of that parameter from samples that we have drawn (each of length $K$), we can estimate the distribution of $\Theta$, our estimator of $\theta$. Now the confidence interval is $P(\gamma_1<\Theta<\gamma_2)=0.9$. Is this correct or not? 2012-09-08
0

Assuming a normal distribution for the population would only add a specific formula for obtaining a confidence interval. If you truly want the Bayesian posterior probability for the interval [1, 9], then the normal distribution is needed for the likelihood piece, but you also need to specify a prior distribution for the mean.
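As an illustration of what specifying a prior involves, here is a minimal sketch using the standard conjugate setup: a normal likelihood with known standard deviation and a normal prior on the mean, which gives a normal posterior. The data values, `sigma`, and the prior parameters below are invented for the example, and the function names are hypothetical.

```python
from statistics import NormalDist, fmean


def posterior_for_mean(data, sigma, prior_mean, prior_sd):
    """Posterior of the population mean under a normal likelihood with
    known sd `sigma` and a normal prior N(prior_mean, prior_sd**2)."""
    n = len(data)
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1 / (prior_precision + data_precision)
    # Precision-weighted average of the prior mean and the sample mean.
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * fmean(data))
    return NormalDist(post_mean, post_var ** 0.5)


def posterior_mass(posterior, lo, hi):
    """Posterior probability that the mean lies in [lo, hi]."""
    return posterior.cdf(hi) - posterior.cdf(lo)


# Hypothetical data and prior, chosen only for illustration.
data = [2.1, 7.4, 5.0, 9.3, 1.2, 6.6]
post = posterior_for_mean(data, sigma=4.0, prior_mean=0.0, prior_sd=10.0)
print(f"posterior mean = {post.mean:.2f}, posterior sd = {post.stdev:.2f}")
print(f"P(1 <= mean <= 9 | data) = {posterior_mass(post, 1, 9):.3f}")
```

Changing `prior_mean` or `prior_sd` changes the resulting probability, which is why the prior has to be stated before a number like "90%" can be read as a probability for the interval.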