
I am currently taking the free online AI class offered by Stanford (ai-class.com). It is the first time I have been exposed to Bayes networks and probability, and I am having a little trouble with the following quiz problem:

       C
      / \
     /   \
    T1   T2

T1 and T2 are conditionally independent given C.

P(C) = 0.01
P(+ | C) = 0.9
P(- | not C) = 0.8
P(C | T1=+) = 0.043
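For reference, the given value of P(C | T1=+) can be recovered from the other numbers with Bayes' rule. This takes P(T1=+ | not C) = 1 − 0.8 = 0.2, the complement of P(- | not C), and writes ¬C for "not C":

$$
P(C \mid T_1{=}+) = \frac{P(T_1{=}+ \mid C)\,P(C)}{P(T_1{=}+ \mid C)\,P(C) + P(T_1{=}+ \mid \neg C)\,P(\neg C)}
= \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.2 \times 0.99}
= \frac{0.009}{0.207} \approx 0.0435
$$

The quiz rounds this to 0.043.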

P(T2=+ | T1=+) = ?

The solution to the problem is given below:

P(T2=+ | T1=+) = P(T2=+ | T1=+, C) * P(C | T1=+) + P(T2=+ | T1=+, not C) * P(not C | T1=+)
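Here is a minimal Python sketch that evaluates this formula with the quiz numbers. It relies on the problem's statement that T1 and T2 are conditionally independent given C. Under that assumption, P(T2=+ | T1=+, C) reduces to P(T2=+ | C) = 0.9, and P(T2=+ | T1=+, not C) reduces to P(T2=+ | not C) = 1 − 0.8 = 0.2. The variable names are my own.

```python
# Quiz numbers; the same test characteristics apply to T1 and T2.
p_c = 0.01                     # P(C)
p_pos_given_c = 0.9            # P(T=+ | C)
p_pos_given_not_c = 1 - 0.8    # P(T=+ | not C) = 1 - P(T=- | not C)

# Bayes' rule for P(C | T1=+), kept unrounded (about 0.0435, quoted as 0.043).
p_t1_pos = p_pos_given_c * p_c + p_pos_given_not_c * (1 - p_c)
p_c_given_t1 = p_pos_given_c * p_c / p_t1_pos

# Total probability over C / not C, with everything conditioned on T1=+.
# Conditional independence lets T1=+ drop out of the first factor of each term.
p_t2_given_t1 = (p_pos_given_c * p_c_given_t1
                 + p_pos_given_not_c * (1 - p_c_given_t1))

print(round(p_t1_pos, 3), round(p_c_given_t1, 4), round(p_t2_given_t1, 4))
# prints: 0.207 0.0435 0.2304
```

If you plug in the rounded 0.043 instead, you get 0.9 * 0.043 + 0.2 * 0.957 = 0.2301. The small difference comes only from rounding.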

My questions about the solution above are:

1) Why do we use the law of total probability to solve the problem?

2) During the expansion, why do we add "C" and "| T1=+" to each term of the equation?

The quiz question can be found here: http://www.youtube.com/watch?feature=player_embedded&v=EmLvORqH-Dg

The answer to the quiz is here: http://www.youtube.com/watch?feature=player_embedded&v=6d2lH9JP6kw

I would really appreciate it if someone could help me understand the solution.

Thanks,
Lee


Imagine you start with a big population (100 000). Some have cancer (1 000) and the rest don't (99 000). For each test, those with cancer test + with probability 0.9 (900 true positives) and those without test + with probability 0.2 (19 800 false positives). The total number of people with T1=+ is 19 800 + 900 = 20 700.

As the video explains, T1 and T2 are conditionally independent. This means that given C (that is, within the subpopulation with cancer), the probability of T2=+ does not depend on whether you got + or - on T1. So of the 900 true positives for T1, 0.9 * 900 = 810 will also test positive on T2 (out of the 20 700 people with T1=+). This is the P(T2=+ | T1=+, C) * P(C | T1=+) term.

This is the key point. Conditional independence tells you how many of the people with C and T1=+ also have T2=+. It tells you the same for the people with not C and T1=+ (see the next paragraph). Without restricting to the subpopulation with C, you cannot immediately say how many of the T1=+ people will also have T2=+. That is because T1 and T2 are not independent; they are only conditionally independent (conditional on C).
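The difference between conditional and plain independence can be checked numerically. Below is a small Monte Carlo sketch; the simulation itself, the seed, and the sample size are my own choices, not part of the quiz. It draws C first, then draws T1 and T2 independently given C. It then compares the estimate of P(T2=+ | T1=+) with P(T2=+). If the tests were plainly independent, the two estimates would match.

```python
import random


def simulate(n=200_000, seed=0):
    """Sample C, then T1 and T2 independently given C.

    Returns the estimated P(T2=+) and P(T2=+ | T1=+).
    """
    rng = random.Random(seed)
    t1_pos = t2_pos = both_pos = 0
    for _ in range(n):
        c = rng.random() < 0.01          # P(C) = 0.01
        p_pos = 0.9 if c else 0.2        # P(T=+ | C) or P(T=+ | not C)
        # Given C, each test is drawn independently of the other.
        t1 = rng.random() < p_pos
        t2 = rng.random() < p_pos
        t1_pos += t1
        t2_pos += t2
        both_pos += t1 and t2
    return t2_pos / n, both_pos / t1_pos


p_t2, p_t2_given_t1 = simulate()
print(f"P(T2=+)        ~ {p_t2:.4f}  (exact: 0.207)")
print(f"P(T2=+ | T1=+) ~ {p_t2_given_t1:.4f}  (exact: 53/230, about 0.2304)")
```

The first estimate should land near 0.207 and the second near 0.2304. Learning that T1=+ raises the chance of T2=+, so the tests are dependent overall, even though they are independent within each subpopulation.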

Similarly, given not C, the probability of T2=+ does not depend on whether you got + or - on T1. So the number of false positives on both T1 and T2 is 0.2 * 19 800 = 3 960 (out of 20 700). This is the second term, P(T2=+ | T1=+, not C) * P(not C | T1=+).
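The two terms can be written as one derivation. This is a sketch in my own notation, with ¬C for "not C". It is the law of total probability applied inside the T1=+ subpopulation, which is why every term keeps "| T1=+". The chain rule then gives the products, and conditional independence drops T1=+ from the first factor of each product:

$$
\begin{aligned}
P(T_2{=}+ \mid T_1{=}+)
&= P(T_2{=}+, C \mid T_1{=}+) + P(T_2{=}+, \neg C \mid T_1{=}+) \\
&= P(T_2{=}+ \mid T_1{=}+, C)\,P(C \mid T_1{=}+) + P(T_2{=}+ \mid T_1{=}+, \neg C)\,P(\neg C \mid T_1{=}+) \\
&= P(T_2{=}+ \mid C)\,P(C \mid T_1{=}+) + P(T_2{=}+ \mid \neg C)\,P(\neg C \mid T_1{=}+) \\
&= 0.9 \cdot \frac{900}{20\,700} + 0.2 \cdot \frac{19\,800}{20\,700}
= \frac{810 + 3\,960}{20\,700}
\end{aligned}
$$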

The total is (3 960 + 810) / 20 700 ≈ 0.23043.
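The counting argument can be replayed with exact fractions. Here is a minimal sketch using Python's `fractions` module and the hypothetical population of 100 000 from this answer. Any population size gives the same ratio.

```python
from fractions import Fraction

population = 100_000
sick = population * Fraction(1, 100)             # 1 000 with C
healthy = population - sick                      # 99 000 without C

# First test: true and false positives.
t1_true_pos = sick * Fraction(9, 10)             # 900
t1_false_pos = healthy * Fraction(2, 10)         # 19 800
t1_pos = t1_true_pos + t1_false_pos              # 20 700

# Second test on the T1=+ people: by conditional independence, the same
# rates apply within each subpopulation, whatever T1 said.
both_pos_c = t1_true_pos * Fraction(9, 10)       # 810
both_pos_not_c = t1_false_pos * Fraction(2, 10)  # 3 960

answer = (both_pos_c + both_pos_not_c) / t1_pos
print(answer, round(float(answer), 4))           # prints: 53/230 0.2304
```

Changing `population` leaves `answer` unchanged, because every count scales with it.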

In addition to Max's great answer, here is the response from my AI class professor: http://www.aiqus.com/questions/5833/additional-explanation-on-conditional-independence-2-quizze-by-professor-thrun2011-10-24