2

Say I want to find the probability that someone passes an exam. I could condition on how much preparation they have done for the exam. Let $E$ be the event that someone passes the exam and $F$ the event that they did sufficient preparation. Then I can write $P(E) = P(E|F)P(F) + P(E|F^c)P(F^c)$.

However, I think you could also condition on numerous other factors: whether they were ill, whether their dog died, bereavement, financial difficulties, other commitments... So essentially, I could extend the above equation and get an infinite number of terms.

Is this correct and, if so, how would I know which event to condition on in a given problem? If it is not correct, why not? Many thanks.

  • 2
    I would use $F$ for insufficient preparation :) (2012-12-06)

2 Answers

1

You are absolutely correct. The whole point of the law of total probability is that you will get the same result no matter what event you decide to condition on. If you conditioned on whether the student's dog had died that day instead of whether they prepared sufficiently, your conditional probabilities would be different, but the sum would end up being exactly the same.

If you conditioned on whether they had prepared sufficiently and whether their dog had died, you would have four terms in the sum, and again each of the conditional probabilities would be different, but the sum would still end up being the same.
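If it helps to see this with numbers, here is a minimal sketch. The joint probabilities below are made up purely for illustration, and the names `joint`, `prob` and `total_probability` are my own, not anything standard. It computes $P(E)$ directly, then by conditioning on preparation, on the dog, and on both at once.

```python
# Hypothetical joint distribution over (prepared, dog_died, passed).
# The numbers are invented for illustration; they only need to sum to 1.
joint = {
    (True, True, True): 0.02,
    (True, True, False): 0.03,
    (True, False, True): 0.45,
    (True, False, False): 0.10,
    (False, True, True): 0.01,
    (False, True, False): 0.04,
    (False, False, True): 0.10,
    (False, False, False): 0.25,
}


def prob(pred):
    """Probability of the event described by pred(prepared, dog_died, passed)."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))


def total_probability(key):
    """Sum P(E | block) * P(block) over the partition induced by key(prepared, dog_died)."""
    blocks = {key(f, d) for f, d, _ in joint}
    total = 0.0
    for b in blocks:
        p_block = prob(lambda f, d, e: key(f, d) == b)
        p_pass_and_block = prob(lambda f, d, e: e and key(f, d) == b)
        p_pass_given_block = p_pass_and_block / p_block
        total += p_pass_given_block * p_block
    return total


p_direct = prob(lambda f, d, e: e)                 # P(E) read off the joint table
by_prep = total_probability(lambda f, d: f)        # two terms: F, F^c
by_dog = total_probability(lambda f, d: d)         # two terms: dog died or not
by_both = total_probability(lambda f, d: (f, d))   # four terms

print(p_direct, by_prep, by_dog, by_both)
```

The individual conditional probabilities differ from one partition to the next, but all four results agree up to floating-point rounding.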

Suppose you could measure exactly how old someone was (so it's a continuous random variable with infinitely many possible values, e.g. 5,376,234.186525... seconds). Then you could condition on that. You can't write it down as a sum, because there are uncountably many possible values, so you write it down as an integral instead. But the result will still be exactly the same.
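Written out, and assuming the age $A$ has a probability density $f_A$ (both symbols are my own notation, not from the question), the continuous version would look something like this:

$$P(E) = \int_0^\infty P(E \mid A = a)\, f_A(a)\, da$$

This is the same formula as before with the sum over events replaced by an integral and $P(F)$ replaced by the density $f_A(a)\,da$.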

And that's the magic of the law of total probability.

  • 0
    Basically, you can condition on whatever other event makes it easiest to calculate the probability you are interested in. (2012-12-06)
0

If your question is "which explanatory variables explain the response variable", then you should probably use some form of regression model: a GLM, linear regression, etc. Then, by looking at the p-values of the coefficients of the explanatory variables, you can judge whether each coefficient differs from 0 at a given significance level, and hence whether that variable should be kept in the model.

  • 0
    That's fine. You can upvote as many answers as you want, except those you wrote yourself. (2012-12-06)