7

Does that mean that if I draw samples from the population, 90% of the time I'll get a number between 1 and 9?

Added: assume a normal distribution for the population.

  • 0
    When most people say "I'm 90% confident that..." I don't believe the number has any content. It means "I'm pretty sure that...". I would have gotten rich betting against these at 9 to 1 odds over my lifetime. – 2012-09-07
  • 1
    It means that the statistician has used a calculational procedure with the following property: if the procedure is carried out many, many times, about $90\%$ of the time the statistician will turn out to be right about the interval she announces, and $10\%$ of the time she will turn out to be wrong. – 2012-09-07
  • 1
    @AndréNicolas You have given the frequentist definition of a confidence interval, which is probably what the OP is driving at. But if he is serious about fixing the endpoints at 1 and 9, then I think the only statistical interpretation comes from the Bayesian framework, where the mean is a parameter that is given a prior distribution and the statement is actually about a credible region for the mean based on the posterior distribution, because there you can calculate the probability content of the interval [1,9]. – 2012-09-08
  • 0
    Also posted at http://mathoverflow.net/questions/106629/what-does-it-mean-when-a-statistician-says-im-90-confident-that-the-mean-of-the – 2012-09-08

4 Answers

7

No. In a typical setting such a statement is based on the mean of a random sample. Each possible sample of that size from the given population has a certain sample mean. Some of these are very close to the population mean, and some are quite far away. (Imagine, for instance, trying to estimate the average height of an adult male by taking a random sample of adult males and just happening to get a sample consisting entirely of pro basketball players!) However, most of the possible samples will have sample means quite close to the population mean.

What the statement means, then, is that if the population mean is not between $1$ and $9$, the statistician must have drawn one of the very unrepresentative samples: one so unrepresentative that only $10$% of the possible samples are equally unrepresentative (or worse).

Added: Let’s say that there are $N$ possible samples of a given size from that population. The means of those samples will cover a range, from the smallest possible sample mean to the largest. The actual population mean will be somewhere in the middle. Now draw two lines, the first cutting off the $5$% of the samples with the smallest means, the second cutting off the $5$% with the largest means. Here’s a rough sketch of the situation, with $S$ for the smallest and $L$ for the largest possible sample means:

                 first cut                     second cut
   x-----------------|-----------------------------|------------------x  
   S<-------5%------>C<------------90%------------>D<-------5%------->L

The percentages are the percentages of all possible samples having means in the indicated ranges. If you draw a sample at random, on average you’ll get a sample with a mean between $C$ and $D$ $90$% of the time, because $90$% of all possible samples have means between $C$ and $D$, and the samples are all equally likely to be picked when you pick at random.

Similarly, on average you’ll get a sample with a mean between $S$ and $C$ about $5$% of the time, and one with a mean between $D$ and $L$ about $5$% of the time.
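A small simulation can make the picture concrete. This is only a sketch under assumptions that are not in the answer: a normal population with hypothetical mean `MU = 5` and standard deviation `SIGMA = 4`, samples of size `N = 20`, and cuts $C$ and $D$ taken as the 5% and 95% points of the sampling distribution of the mean (which, for a normal population, is normal with standard deviation $\sigma/\sqrt{n}$), rather than by literally listing every possible sample. It then counts how often a randomly drawn sample has its mean between $C$ and $D$.

```python
import math
import random
from statistics import NormalDist, fmean

# Hypothetical population parameters, chosen only for illustration.
MU, SIGMA, N = 5.0, 4.0, 20

# For a normal population the sample mean is normal with sd SIGMA / sqrt(N),
# so the 5% and 95% points of that distribution play the roles of C and D.
se = SIGMA / math.sqrt(N)
sampling_dist = NormalDist(MU, se)
C = sampling_dist.inv_cdf(0.05)
D = sampling_dist.inv_cdf(0.95)

def fraction_between_cuts(trials: int, seed: int = 0) -> float:
    """Draw `trials` random samples of size N; return the share whose mean lies in [C, D]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
        if C <= xbar <= D:
            hits += 1
    return hits / trials

print(f"C = {C:.3f}, D = {D:.3f}")
print(f"share of sample means between C and D: {fraction_between_cuts(20_000):.3f}")
```

The share printed should hover around $0.9$, which is the "$90$% of all possible samples" in the sketch, estimated by random draws instead of enumeration.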

The statistician is saying that if the population mean is not between $1$ and $9$, then his sample was either below the first cut, $C$, or above the second cut, $D$. In other words, either he got one of the $10$% of samples that are least like the population, or the population mean is between $1$ and $9$. ‘I’m $90$% confident that the population mean is between $1$ and $9$’ is verbal shorthand for all of that explanation.
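In the simplest setting, the step "if the population mean is not between $1$ and $9$, the sample fell below $C$ or above $D$" is an exact equivalence. A sketch, assuming (beyond what the answer says) a normal population with known standard deviation, so that the $90$% interval is $\bar x \pm h$ with $h = 1.645\,\sigma/\sqrt{n}$ and the cuts are $C = \mu - h$, $D = \mu + h$. The code checks, on many simulated samples, that "the interval misses $\mu$" and "the sample mean lies outside $[C, D]$" always happen together; `MU`, `SIGMA` and `N` are hypothetical values.

```python
import math
import random
from statistics import NormalDist, fmean

MU, SIGMA, N = 5.0, 4.0, 20          # hypothetical population and sample size
z = NormalDist().inv_cdf(0.95)        # about 1.645 for a two-sided 90% interval
h = z * SIGMA / math.sqrt(N)          # half-width of the interval
C, D = MU - h, MU + h                 # the two cuts on the sample-mean scale

def events_agree(trials: int, seed: int = 1) -> bool:
    """Check that 'interval misses MU' and 'sample mean outside [C, D]' always coincide."""
    rng = random.Random(seed)
    for _ in range(trials):
        xbar = fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
        interval_misses = not (xbar - h <= MU <= xbar + h)
        mean_outside_cuts = not (C <= xbar <= D)
        if interval_misses != mean_outside_cuts:
            return False
    return True

print(events_agree(10_000))
```

The point of the check is that the statistician never needs to know $C$ and $D$: the interval is built around $\bar x$, yet it fails exactly on the $10$% of samples that fall outside the cuts.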

  • 0
    Wow... your assumption about the setting of such a statement is correct. Thank you for answering, but your final statement explaining this makes it a little more confusing X_X – 2012-09-07
  • 0
    @user133466: I’ll expand the explanation a bit. – 2012-09-07
  • 0
    Can you turn around and say that 90% of the samples of size n have a mean between 1 and 9? – 2012-09-07
  • 1
    @user133466: Not legitimately. The problem is that we don’t actually know what the population mean is, so we can’t make statements about what percentage of samples have means in a given range. What we **can** say is that if the population mean is **not** in a certain range, then our sample is a very unlikely one, and therefore we’re pretty confident that the population mean **is** in that range. – 2012-09-07
  • 0
    How is this interval useful to us? I can only make the claim that I'm pretty sure that the population mean is between 1 and 9. – 2012-09-08
  • 0
    @user133466: No, you can make a stronger claim: you can say that if the population mean is not between $1$ and $9$, then your sample is at best among the $10$% of samples that are least representative of the population. This is a much more specific statement than ‘I’m pretty sure that the population mean is between $1$ and $9$’. And it might be among the least representative $1$%, or even worse. The thing to remember is that it’s the samples that vary; the population mean is a fixed quantity. – 2012-09-08
  • 0
    How about this: it means that if repeated samples were taken from the population and a mean computed for each sample, 90% of the samples would include the unknown mean between 1 and 9. – 2012-09-08
  • 0
    This is similar to the fact that in hypothesis testing, you either reject the null hypothesis or you don't. You never accept the null hypothesis. – 2012-09-16
  • 0
    Is the sample size here fixed? Or is it allowed to vary in the set of possible samples (which would mean the set contains every possible combination of the population, ranging from size 0 to the entire population size)? That diagram seems to imply that 90% of all possible samples would result in a sample mean between 1 and 9, and 10% of all sample means would not. – 2018-04-26
3

When a statistician makes such a statement, it usually means they are confused.

Most statisticians are frequentists, and the frequentist paradigm disallows such statements, though frequentists say things like this all the time.

To a Bayesian statistician, the statement means that the uncertainty in the population mean has been modeled as a probability distribution. Starting from a prior distribution that expresses what is known or believed before acquiring data, the distribution is updated in light of the data according to Bayes' theorem. The updated (posterior) distribution contains 90% of its mass between 1 and 9.
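A minimal sketch of that Bayesian calculation, under assumptions the answer does not specify: the data are normal with known standard deviation `sigma`, and the prior on the mean is normal with hypothetical mean `m0` and standard deviation `s0`. With this conjugate pair the posterior for the mean is again normal, with precision equal to the prior precision plus $n/\sigma^2$ and mean equal to the precision-weighted average of `m0` and $\bar x$, so the mass it puts on $[1, 9]$ is a difference of two CDF values. The data in the usage line are made up.

```python
import math
from statistics import NormalDist, fmean

def posterior_mass(data, sigma, m0, s0, lo=1.0, hi=9.0):
    """Posterior probability that the mean lies in [lo, hi].

    Assumes normal data with known sd `sigma` and a Normal(m0, s0**2) prior
    on the mean (a conjugate pair, so the posterior is normal too).
    """
    n = len(data)
    xbar = fmean(data)
    prec = 1.0 / s0**2 + n / sigma**2                  # posterior precision
    mean = (m0 / s0**2 + n * xbar / sigma**2) / prec    # precision-weighted average
    post = NormalDist(mean, math.sqrt(1.0 / prec))
    return post.cdf(hi) - post.cdf(lo)

# Made-up data; a prior centred at 0 with a wide spread.
sample = [4.1, 6.3, 2.8, 7.5, 5.0, 3.9, 6.8, 4.4]
print(round(posterior_mass(sample, sigma=4.0, m0=0.0, s0=10.0), 3))
```

Unlike the frequentist $90$%, this number is a probability statement about the mean itself, which is why it requires a prior.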

  • 1
    It’s a perfectly acceptable informal statement that $(1,9)$ is a $90$% confidence interval, a statement that is certainly not disallowed by the frequentist view. – 2012-09-07
  • 0
    @Brian: But your restatement is not the original title question, which treats the mean as a random variable. – 2012-09-07
  • 2
    @Henry: I take the original question to be the one in the title, and the one in the body to be the common misinterpretation. – 2012-09-07
  • 0
    @Henry & John, why does the frequentist paradigm forbid the title statement? It is the statement a frequentist would make while leaving implicit some of the details (the statistical model and CI procedure that were used), and there are equivalent omissions when a Bayesian utters the same sentence. – 2012-09-14
  • 0
    @zyx: A Bayesian would regard 9-1 as fair odds on the mean being in the 90% credible interval, i.e. 90% is a probability. A frequentist would regard that as meaningless, as the confidence interval presupposes the hypothesis. – 2012-09-14
  • 0
    @Henry, the question title does not refer to probability, only confidence, so the frequentist who utters that phrase is not placed in the position of imputing a meaningless (from his point of view) random nature to a model parameter. – 2012-09-14
1

When you draw a sample, you don't get a number; you get a list of numbers. Or more precisely, a sample is a list of numbers.

No, it is not true that 90% of those numbers in the list are between $1$ and $9$, nor is it true that $90\%$ of the time, when you take a sample, anything in particular will be between $1$ and $9$.

A confidence interval depends on the list of numbers that you get in a sample. Say you take a sample of $20$ numbers, and the resulting confidence interval for the population mean is the interval from $1$ to $9$. Typically a much smaller proportion than $90\%$ of the numbers in the sample are between $1$ and $9$, and not infrequently, none of them are.
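To see why so few of the observations land inside the interval, here is a sketch under hypothetical assumptions: a normal population with known standard deviation and the interval $\bar x \pm 1.645\,\sigma/\sqrt{20}$, which is not necessarily how the interval in the answer was computed. That interval extends only about $0.37\sigma$ on each side of $\bar x$, so on average it captures roughly 30% of the individual values, well below 90%.

```python
import math
import random
from statistics import NormalDist, fmean

MU, SIGMA, N = 5.0, 4.0, 20                    # hypothetical population and sample size
h = NormalDist().inv_cdf(0.95) * SIGMA / math.sqrt(N)

def avg_share_inside(trials: int, seed: int = 2) -> float:
    """Average share of a sample's own values that fall inside its 90% interval for the mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(MU, SIGMA) for _ in range(N)]
        xbar = fmean(xs)
        inside = sum(xbar - h <= x <= xbar + h for x in xs)   # True counts as 1
        total += inside / N
    return total / trials

print(f"average share of observations inside the interval: {avg_share_inside(5_000):.3f}")
```

The interval is about where the *mean* is, so it shrinks like $1/\sqrt{n}$, while the spread of individual values does not shrink at all.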

Now say you take another random sample of $20$ numbers from the same population, and the resulting confidence interval is from $2$ to $8.5$. And then you take another random sample of $20$ from the same population, and the confidence interval you get is from $1.5$ to $11$. And so on. Then $90\%$ of the time, the interval you get will include the population mean. That is what it means.
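The repeated-sampling description translates directly into a simulation. A sketch, assuming a normal population with known standard deviation so that the $90\%$ interval is $\bar x \pm 1.645\,\sigma/\sqrt{20}$ (with unknown $\sigma$ one would use the sample standard deviation and a $t$ critical value instead); `MU` and `SIGMA` are hypothetical. Each pass draws a fresh sample of $20$, builds its own interval, and records whether that interval contains the population mean.

```python
import math
import random
from statistics import NormalDist, fmean

MU, SIGMA, N = 5.0, 4.0, 20   # hypothetical population; in practice MU is unknown
Z90 = NormalDist().inv_cdf(0.95)

def confidence_interval(xs, sigma=SIGMA, z=Z90):
    """90% interval for the mean when the population sd is known."""
    xbar = fmean(xs)
    h = z * sigma / math.sqrt(len(xs))
    return xbar - h, xbar + h

def coverage(trials: int, seed: int = 3) -> float:
    """Share of simulated intervals that contain the population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = confidence_interval([rng.gauss(MU, SIGMA) for _ in range(N)])
        hits += lo <= MU <= hi
    return hits / trials

rng = random.Random(0)
for _ in range(3):                      # each sample gives a different interval
    lo, hi = confidence_interval([rng.gauss(MU, SIGMA) for _ in range(N)])
    print(f"interval: ({lo:.2f}, {hi:.2f})")
print(f"coverage over 20,000 samples: {coverage(20_000):.3f}")
```

The printed intervals differ from sample to sample, while the long-run share that contains `MU` stays near $0.9$; the $90\%$ describes the procedure, not any single interval.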

  • 0
    Thank you for the explanation. My mistake, sorry. – 2012-09-08
  • 0
    Just a question: if you draw a sample, it can also be a single sample. It doesn't make sense to do estimation with one sample, but it does make sense for detection, for example. Meanwhile I am confused: whenever we draw 20 samples, how do we calculate a new confidence interval? As far as I know, a confidence interval is fixed and should not depend on the samples you have. – 2012-09-08
  • 0
    @SeyhmusGüngören : Read the account of confidence intervals in any textbook. You will find that a confidence interval always depends on the sample. Your confusion is merely the condition of someone who's never done that. – 2012-09-08
  • 0
    I agree. I have already read it. It depends on the sample size; no doubt on that. However, according to my understanding, whenever the sample size is fixed, so is the confidence interval. As a result, given 20 samples, we have one confidence interval; for example, for 90% we have $1-9$. Whenever we draw another 20 samples, this interval does not change. However, according to your answer it does change. – 2012-09-08
  • 0
    One more thing. What I am saying is based on the assumption that the population mean is known. I think you assume that it is unknown and that whenever you draw a sample you refine your estimate of the population mean. Therefore the confidence intervals also approach the true intervals. That's why, whenever a sample is received, the confidence interval also changes, in the direction of the true values. – 2012-09-08
  • 0
    @SeyhmusGüngören : I've rarely if ever seen anyone more confused about a topic, when all the confusion would be resolved by simply reading a basic account in a beginning textbook. Confidence intervals depend on data, and not only on sample sizes, and if they did depend only on sample sizes, there'd be no reason to consider confidence intervals in the first place. You're just demonstrating that you're completely clueless. There is no "true interval" that confidence intervals converge to as the sample size increases. And you're misusing the word "sample" even after I corrected you. – 2012-09-08
  • 0
    @SeyhmusGüngören : A well-behaved confidence interval for a population mean converges to an interval of length $0$, containing only the population mean, as the sample size grows. Please stop making a fool of yourself. Read a basic textbook account of what confidence intervals are. – 2012-09-08
  • 0
    I suggest you be kind. I remember another discussion between you and another person who is at least as knowledgeable as you. You behaved the same way toward him as well. – 2012-09-08
  • 0
    @SeyhmusGüngören : Your posted incorrect answer and your other incorrect and confused statement cause me to suspect something. Consider a normal distribution with (population) mean $\mu$ and variance $\sigma^2$. Say this distribution puts probability $0.9$ in the interval $\mu\pm a\sigma$. My suspicion is that you think either that that's what a confidence interval is, or that that is what a confidence interval is supposed to estimate. Both of those ideas are completely wrong. – 2012-09-08
  • 0
    Yes, sure, my post was incorrect; it was my mistake, as I already told you. In my post I wrote "(on average)", meaning the sample average. It was still incorrect. Additionally, $P(1$ … $\theta$. This can be either the mean $\mu$ or another unknown. Whenever a sample is drawn, this gives one estimate of that parameter. – 2012-09-08
  • 0
    Having, say, $N$ estimates of that parameter from samples that we have drawn (say each of length $K$), we can now estimate the distribution of $\Theta$, our estimator of $\theta$. Now the confidence interval is $P(\gamma_1<\Theta<\gamma_2)=0.9$. Is this correct or not? – 2012-09-08
0

Assuming a normal distribution for the population would only add a specific formula for obtaining a confidence interval. If you truly want the Bayesian posterior probability for the interval [1, 9], then the normal distribution is needed for the likelihood piece, but you also need to specify a prior distribution for the mean.
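A small sketch of why the prior matters, under hypothetical assumptions: normal data with known $\sigma$, a normal prior on the mean, and a made-up sample of ten values. The same data give very different posterior probabilities for $[1, 9]$ depending on which prior is chosen, whereas the confidence interval itself needs no prior at all.

```python
import math
from statistics import NormalDist, fmean

def posterior_prob_interval(data, sigma, prior_mean, prior_sd, lo=1.0, hi=9.0):
    """P(lo < mean < hi | data) for normal data with known sigma and a normal prior."""
    n = len(data)
    prec = 1 / prior_sd**2 + n / sigma**2                       # posterior precision
    post_mean = (prior_mean / prior_sd**2 + n * fmean(data) / sigma**2) / prec
    post = NormalDist(post_mean, math.sqrt(1 / prec))
    return post.cdf(hi) - post.cdf(lo)

data = [6.2, 3.1, 8.4, 4.7, 5.9, 2.6, 7.3, 4.4, 5.5, 3.8]   # made-up sample
for prior_mean, prior_sd in [(5.0, 100.0), (5.0, 1.0), (20.0, 2.0)]:
    p = posterior_prob_interval(data, sigma=6.0, prior_mean=prior_mean, prior_sd=prior_sd)
    print(f"prior N({prior_mean}, {prior_sd}^2): P(1 < mean < 9 | data) = {p:.3f}")
```

A very flat prior gives a posterior that mostly reflects the data, a tight prior near the data pushes the probability toward $1$, and a confident prior far from the data can drive it close to $0$.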