
Need some help with this exercise (Exercise 4.1 of *Probability Theory: The Logic of Science*, E. T. Jaynes):

Suppose that we have vectors of events $\{H_1,...,H_n\}$ and $\{D_1,...,D_m\}$ which satisfy:

(1) $P(H_i H_j)=0$ for any $i\neq j$ and $\sum_iP(H_i)=1$

(2) $P(D_1D_2...D_m|H_i)=\prod_jP(D_j|H_i)$, for all $1\leq i\leq n$

(3) $P(D_1D_2...D_m|\overline{H_i})=\prod_jP(D_j|\overline{H_i})$, for all $1\leq i\leq n$

where $\overline{X}$ means the negation of $X$.

Claim: If $n>2$, then for each $1\leq i\leq n$ at most one of the fractions

$$\frac{P(D_1|H_i)}{P(D_1|\overline{H_i})},\ \frac{P(D_2|H_i)}{P(D_2|\overline{H_i})},\ \dots,\ \frac{P(D_m|H_i)}{P(D_m|\overline{H_i})}$$

can differ from unity.


Are conditions (1)–(3) sufficient to establish the claim? If so, how? And is there an intuitive explanation of why it has to be true?
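For concreteness, here is a small numerical sanity check, a sketch rather than a proof. The model and all numbers (the uniform prior, the $0.9/0.5/0.1$ values) are my own illustrative choices: three equiprobable hypotheses, a datum $D_1$ whose distribution depends on the hypothesis, and $D_2, D_3$ fair coin flips independent of everything else. Conditions (1)–(3) all hold, and at most one likelihood ratio per hypothesis differs from unity, so the claim's bound is attained:

```python
from itertools import product

# Toy joint distribution consistent with (1)-(3): three equiprobable
# hypotheses, D1 depends on which hypothesis is true, and D2, D3 are
# fair coin flips independent of everything else.  All numbers are
# arbitrary illustrative choices.
prior = {1: 1/3, 2: 1/3, 3: 1/3}
p_d1 = {1: 0.9, 2: 0.5, 3: 0.1}   # P(D1=1 | H_i)

def joint(h, d1, d2, d3):
    """P(H_h, D1=d1, D2=d2, D3=d3)."""
    p1 = p_d1[h] if d1 else 1 - p_d1[h]
    return prior[h] * p1 * 0.5 * 0.5

def cond(d, hs):
    """P(D1=d[0], D2=d[1], D3=d[2] | H in hs)."""
    return sum(joint(h, *d) for h in hs) / sum(prior[h] for h in hs)

def marg(j, dj, hs):
    """P(D_j = dj | H in hs)."""
    num = sum(joint(h, *d) for h in hs
              for d in product((0, 1), repeat=3) if d[j] == dj)
    return num / sum(prior[h] for h in hs)

for i in (1, 2, 3):
    Hi, notHi = {i}, {1, 2, 3} - {i}
    for d in product((0, 1), repeat=3):
        # Conditions (2) and (3): conditional joint = product of marginals.
        for hs in (Hi, notHi):
            prod_m = marg(0, d[0], hs) * marg(1, d[1], hs) * marg(2, d[2], hs)
            assert abs(cond(d, hs) - prod_m) < 1e-12
    ratios = [marg(j, 1, Hi) / marg(j, 1, notHi) for j in range(3)]
    print(f"H_{i}: likelihood ratios =", [round(r, 3) for r in ratios])
# Only the D1 ratio ever differs from 1, as the claim allows.
```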


Edit: It may be helpful to explain the motivation a bit. Think of $\{H_1,...,H_n\}$ as a set of exhaustive and mutually exclusive (which is what (1) says) candidate hypotheses that we want to test by some experiment generating data $\{D_1,...,D_m\}$.

Define $O(H_i|D_1D_2...D_m)\equiv \frac{P(H_i|D_1D_2...D_m)}{P(\bar{H_i}|D_1D_2...D_m)}$ as the odds that $H_i$ is true versus false, given data $D_1$ through $D_m$.

By this definition, $O(H_i|D_1D_2...D_m)=O(H_i)\frac{P(D_1D_2...D_m|H_i)}{P(D_1D_2...D_m|\bar{H_i})}$.
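(To spell out the step: this is just Bayes' theorem applied to the numerator and the denominator, with the normalizer $P(D_1D_2...D_m)$ cancelling:

$$O(H_i|D_1D_2...D_m)=\frac{P(H_i)\,P(D_1D_2...D_m|H_i)}{P(\bar{H_i})\,P(D_1D_2...D_m|\bar{H_i})}=O(H_i)\,\frac{P(D_1D_2...D_m|H_i)}{P(D_1D_2...D_m|\bar{H_i})}$$

where $O(H_i)\equiv P(H_i)/P(\bar{H_i})$ is the prior odds.)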

Now in such tests it is common that you can design your experiment so that the data $D_j$ are mutually independent given $H_i$, so (2) holds. (To be technically precise, (2) is a weaker condition: it is implied by, but does not imply, mutual independence.)

If the claim is true and you have more than two hypotheses to test, the experiment will serve its purpose only if (3) is false. To see why, suppose instead that (3) holds, so the data are also independent given the negation of $H_i$. Then we have

$O(H_i|D_1D_2...D_m)=O(H_i) \prod_j\frac{P(D_j|H_i)}{P(D_j|\bar{H_i})}$ (*)

But by the claim, at most one of the fractions in the product can differ from $1$, which means at most one datum can be useful for improving on the prior odds $O(H_i)$ of a hypothesis.

The lesson is that, given (1) and (2), even if the $D_j$'s are physically or causally independent, (3) remains a strong ad hoc assumption that either reduces the information in the additional data to triviality (if true) or yields incorrect results via (*) (if false).
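To see the second failure mode numerically, here is a sketch in which (1) and (2) hold but (3) fails; again the model and numbers are my own illustrative choices, not anything from the exercise. The exact posterior odds computed from the full joint distribution disagree with what formula (*) produces:

```python
# Toy model where (1) and (2) hold but (3) fails: three equiprobable
# hypotheses and two data bits, conditionally independent given each
# H_i.  The numbers are arbitrary illustrative choices.
prior = [1/3, 1/3, 1/3]
p = [[0.9, 0.9],   # P(D_j=1 | H_1)
     [0.5, 0.5],   # P(D_j=1 | H_2)
     [0.1, 0.1]]   # P(D_j=1 | H_3)

def lik(i, d):
    """P(D_1=d[0], D_2=d[1] | H_i), using condition (2)."""
    out = 1.0
    for j, dj in enumerate(d):
        out *= p[i][j] if dj else 1 - p[i][j]
    return out

d = (1, 1)  # suppose both data bits came out true

# Exact posterior odds O(H_1 | D) from the full joint distribution.
post = [prior[i] * lik(i, d) for i in range(3)]
exact = post[0] / (post[1] + post[2])

# Formula (*), which silently assumes (3); P(D_j | not-H_1) is the
# prior-weighted mixture over the other two hypotheses.
naive = prior[0] / (prior[1] + prior[2])   # prior odds O(H_1)
for j, dj in enumerate(d):
    p_not = (prior[1] * p[1][j] + prior[2] * p[2][j]) / (prior[1] + prior[2])
    naive *= (p[0][j] if dj else 1 - p[0][j]) / (p_not if dj else 1 - p_not)

print(f"exact odds  O(H_1|D) = {exact:.3f}")   # ~3.115
print(f"formula (*) estimate = {naive:.3f}")   # 4.500, noticeably off
```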

  • If I am understanding the notation correctly, then in order for those fractions to be unity we must have $P(D_{j}|H_{i}) = P(D_{j}|\bar{H_{i}}) = 1/2$, because $P(D_{j}|H_{i}) + P(D_{j}|\bar{H_{i}}) = 1$. So given that it's this specific, I'm guessing you can derive a system of equations for determining the different probabilities by assuming each fraction equals some value $f_{ij}$, possibly not unity. Then that system probably will only have non-trivial solutions under the stated result. At least, this is the line of thinking I'd look down first. (2012-03-18)
  • @EMS: Why would $P(D_{j}|H_{i}) + P(D_{j}|\bar{H_{i}}) = 1$ be true? (2012-03-18)
  • If I am understanding the notation (which maybe I am not), then $\bar{H_{i}}$ is the negation of $H_{i}$, basically like the complement if you want to think of $H_{i}$ as a set/event. Then the law of total probability gives my statement. That may not be what the symbol $\bar{H_{i}}$ is actually meant for, though... Does it explain that in any more detail? (2012-03-18)
  • Yes, $\bar{H_{i}}$ is the negation of $H_i$. But why would the law of total probability imply $P(D_{j}|H_{i}) + P(D_{j}|\bar{H_{i}}) = 1$? (2012-03-18)
  • Because the whole sample space $\Omega_{i} = H_{i}\cup\bar{H_{i}}$ by definition in that case, and because $H_{i}$ is disjoint from its negation... again, unless you're using 'negation' in some non-obvious way, in which case my interpretation is off. (2012-03-18)
  • @EMS: Why? The LTP only gives $P(H_{i}|D_{j}) + P(\bar{H_{i}}|D_{j}) = 1$, not the other way around, don't you agree? (2012-03-18)
  • Ah, yes, you're right. I was misreading it, but it just results in a typo. Where I had written $1/2$ before, it should be $P(D_{j})/2$, because the LTP means the terms add up to $P(D_{j})$, not to $1$ as I mistakenly claimed. But I think this still could be fruitful in terms of yielding a system of equations in the $D_{j}$, then applying the given conditions as constraints. (2012-03-18)
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/2815/discussion-between-eric-and-ems). (2012-03-18)
  • Ok, but I'm heading to bed for tonight. I'll pick it up tomorrow. (2012-03-18)

0 Answers