1

I'm having a hard time trying to understand this.

I'm doing an exercise where I have to formalise 'your test has a false-positive rating of 5%'. If $B$ means that the test is positive, and $A$ means that a given person has the disease, then is $P(A^{C}|B)=5\%$, or $P(B|A^{C})=5\%$?

Intuitively, I would have picked the second option. A false positive is when I'm testing a person who doesn't have the disease, but still get a positive result. However, this website, http://vassarstats.net/bayes.html, says otherwise: according to it, a false-positive rating would mean the first.

I don't get this. Could someone please elaborate.

Edit: I thought that after reading Bram28's answer I understood it, but actually I'm no less confused than before. For reference, I am referring to exercise 3.6 on page 19 in this book: http://www.karlin.mff.cuni.cz/~lachout/Vyuka/O-Sem/JacodProtter2004.pdf. According to Bram28's answer, false positive means $P(A^{C}|B)$. But this fixes $P(A|B)=1-P(A^{C}|B)$, which is what is asked to be determined in the question, without needing the 'accuracy' which is also given in the exercise. Does somebody understand what is happening there?

2 Answers

2

Quasar gave you the correct way to calculate what you wanted to calculate, but I understand you are looking to clear up some of your confusion regarding terminology, so I hope the following helps (but I'll give you that it is all very confusing though!!):

A 'false positive' occurs when the disease is not present even though the test is positive. So, this is the event $B \land A^C$.

So, if you want to know the probability of getting a false positive, that is $P(B \land A^C)$, and that we can work out in two different ways:

$P(B \land A^C) = P(B) \cdot P(A^C|B)$ or

$P(B \land A^C) = P(A^C) \cdot P(B|A^C)$

Here:

$P(A^C|B)$ is the probability of the test being wrong (i.e. being a false positive) when the test comes out positive. So: out of all the cases where we test someone and they test positive, how often is that test result the wrong result?

$P(B|A^C)$ is the probability of the test getting the wrong result (and again be a false positive) given that the person tested does not have the disease.

Now, the confusing thing is that one can reasonably use the phrase 'false positive rate' to mean any of these three different probabilities: $P(B \land A^C)$ (how often do false positives occur?), $P(A^C|B)$ (how often is a positive test false?), and $P(B|A^C)$ (how often do we get a false positive for a healthy person?).
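
To see that these three readings really are different numbers, here is a small sketch in Python. The inputs are hypothetical values picked only for illustration (a 2% base rate, $P(B|A)=0.95$ and $P(B|A^C)=0.05$); they are not taken from the exercise.

```python
# Hypothetical inputs, chosen only for illustration (not from the exercise).
p_A = 0.02            # P(A): fraction of the population with the disease
p_B_given_A = 0.95    # P(B|A): positive test given disease
p_B_given_Ac = 0.05   # P(B|A^C): positive test given no disease

p_Ac = 1 - p_A

# Reading 1: P(B and A^C) = P(A^C) * P(B|A^C)
p_B_and_Ac = p_Ac * p_B_given_Ac

# P(B) by the law of total probability
p_B = p_A * p_B_given_A + p_B_and_Ac

# Reading 2: P(A^C|B) = P(B and A^C) / P(B)
p_Ac_given_B = p_B_and_Ac / p_B

# Reading 3: P(B|A^C) is one of the inputs.
print(f"P(B and A^C) = {p_B_and_Ac:.4f}")
print(f"P(A^C|B)     = {p_Ac_given_B:.4f}")
print(f"P(B|A^C)     = {p_B_given_Ac:.4f}")
```

With these assumed inputs the three quantities come out quite far apart, so a bare 'false positive rate of 5%' only pins down the problem once you know which of them is meant.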

Something similar goes for the term 'accuracy'. That is, we may say that some test is '95% accurate', but what does that mean? Here, we could reasonably mean (or at least: the 'person on the street' could reasonably interpret this as):

$P(B|A)$: How often does the test accurately diagnose someone with the disease?

$P(A|B)$: if I test positive, what is the chance that I actually have that disease? (this could be called a kind of 'predictive accuracy')

Now, professionals will typically mean the first one (but it never hurts to ask them and be clear about what they really mean when they use this phrase!!!), because $P(B|A)$ is much more 'stable' than $P(A|B)$. Compare a situation where a large percentage of the population has the disease with one where only a small percentage does (in other words, let $P(A)$ change over time, which can of course happen just fine). Then $P(A|B)$ can change quite a bit as well, while this is far less true for $P(B|A)$: the chance for a test to get the correct result when we test someone who has the disease will be pretty much the same over time (unless we suppose that our biophysical states change significantly over time, which is far less likely).

So, if we say that a test is "95% accurate", we are probably referring to $P(B|A) = 0.95$.
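
A rough Python sketch of this 'stability' point: hold $P(B|A)$ fixed and vary $P(A)$, then watch $P(A|B)$ move. The function name `posterior`, the value $P(B|A^C)=0.05$ and the list of base rates are my own assumptions for illustration.

```python
def posterior(prevalence, sensitivity, false_positive_rate):
    """Return P(A|B) via Bayes' rule.

    prevalence          = P(A)
    sensitivity         = P(B|A)
    false_positive_rate = P(B|A^C)
    """
    true_pos = sensitivity * prevalence                  # P(B and A)
    false_pos = false_positive_rate * (1 - prevalence)   # P(B and A^C)
    return true_pos / (true_pos + false_pos)


sensitivity = 0.95   # P(B|A), held fixed
fpr = 0.05           # P(B|A^C), assumed for this sketch

for prevalence in (0.001, 0.02, 0.2, 0.5):
    p_A_given_B = posterior(prevalence, sensitivity, fpr)
    print(f"P(A) = {prevalence:<5}  P(B|A) = {sensitivity}  P(A|B) = {p_A_given_B:.3f}")
```

The test itself is the same in every row; only the base rate changes, and $P(A|B)$ changes with it.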

However, here is one more wrinkle:

If 'the test is 95% accurate' means $P(B|A) = 0.95$, then do we know $P(B^C|A^C)$, i.e. the 'accuracy' of correctly diagnosing (i.e. the test coming out negative) a person that does not have the disease? No, we don't. Maybe the test is on the 'conservative' side, and will more likely come out positive for a person without the disease, than that it will come out negative for a person with the disease (indeed, in this context, a false negative could have far more dire consequences than a false positive!). So, in that case, $P(B^C|A^C)$ will be smaller than 95%. In fact, if you think about it, one could try and define an 'accuracy' rating that takes into account both of these probabilities, i.e. one could also reasonably interpret an 'accuracy rating of 95%' as $P(\text{test correct})=0.95$ where:

$P(\text{test correct}) = P(B|A) \cdot P(A) + P(B^C|A^C) \cdot P(A^C)$
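
As a quick illustration of this combined reading, the sketch below evaluates the formula for a hypothetical 'conservative' test with $P(B|A)=0.95$ and $P(B^C|A^C)=0.90$. These numbers and the function name `overall_accuracy` are assumptions of mine, not part of the exercise.

```python
def overall_accuracy(prevalence, p_pos_given_disease, p_neg_given_healthy):
    """P(test correct) = P(B|A) * P(A) + P(B^C|A^C) * P(A^C)."""
    return (p_pos_given_disease * prevalence
            + p_neg_given_healthy * (1 - prevalence))


# Hypothetical 'conservative' test: better at catching the disease
# than at clearing healthy people.
p_B_given_A = 0.95     # P(B|A)
p_Bc_given_Ac = 0.90   # P(B^C|A^C)

for prevalence in (0.01, 0.2, 0.5):
    acc = overall_accuracy(prevalence, p_B_given_A, p_Bc_given_Ac)
    print(f"P(A) = {prevalence:<4}  P(test correct) = {acc:.4f}")
```

Note that this kind of 'accuracy' depends on $P(A)$ again whenever the two conditional probabilities differ, so it is yet another number that someone could have in mind.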

In the exercise in the book you refer to, though, they do specify a 'false positive rating' in addition to the accuracy rating, and by the false positive rating they mean $P(B|A^C)$. Which, by an analogous argument to the one given above, makes sense: $P(B|A^C)$ is likely to be more stable over time than $P(A^C|B)$.

The take-home message is this though: the terminology is confusing, since it can be interpreted in many reasonable, but different ways. Indeed, I assume the professionals themselves have a hard time getting all these distinctions straight. Therefore, it never hurts to ask what exactly they mean: statistical confusion has probably hurt more people than any one particular disease itself!

  • 0
    This was a typo on my side. But why is test-accuracy defined as $P(B|A)$ then?2017-01-26
  • 0
    @see That's an excellent question! Now, I don't see accuracy defined on that website, but I do know that the 'accuracy' of a test can mean different things: Sometimes they mean $P(A|B)$ (called 'predictive accuracy', i.e. given that I test positive, what is the chance I actually have the disease?), but it is (more commonly) used to mean $P(B|A)$, i.e. out of all the cases where the disease is present, how often does the test detect that, i.e. how well does the test classify the diseases? So yes, it's all rather confusing!!!2017-01-26
  • 0
    @see there is an old puzzle that goes like this: 'suppose I test positive for a disease, and that the test is 95% accurate. What is the chance I have the disease?'. Our immediate reaction is '95% of course!'. And it would indeed be 95% if by 'accuracy' we mean $P(A|B)$. But again, we typically mean by 'test-accuracy' $P(B|A)$. ... And now it turns out that the chance of me having the disease depends on (not surprisingly) what percentage of the population has the disease, i.e. $P(A)$: if that is only a very small percentage, then there is a good chance that I am one of those false positives!2017-01-26
  • 0
    @see How does your exercise define A and B? Are they maybe swapped from the website? Anyway, the answer to the puzzle is not the 95% that many people think it is. It really depends on the 'base rate' of the disease. Example: suppose there are 1000 people and 2%, i.e. 20 people, have the disease, and so 980 do not. Now, when we then say that the test is 95% accurate, we mean that 95%, i.e. 19 of the 20 diseased people, test positive, and that 95% of the 980, i.e. 931 non-diseased people, correctly test negative. But this means that 49 non-diseased people test positive. ...(continued)2017-01-26
  • 0
    Yeah this is exactly the exercise I am doing, and it got me indeed puzzled which is why I looked up what false-positive actually means, leading to this post. It's just counter-intuitive to me why one would choose as test-accuracy $P(B|A)$, everything else became clear from your answer.2017-01-26
  • 0
    ... So, how many people test positive? 19+49=68. And how many of those actually have the disease? Only 19. So, the chance of you having the disease when testing positive is 19/68 ... Which isn't even close to 95%. In fact, if you test positive, you are still more likely to *not* have the disease than that you do have the disease ... Exactly because the base rate of the disease is so low ... And thus even though the test is fairly accurate, the 49 false positives outnumber the 19 true positives! Of course, when you change the base rate, these percentages will change again.2017-01-26
  • 0
    So calling $P(B|A)$ test-accuracy instead of $P(A|B)$ is just convention right? I mean, it could mean either to the man on the street.2017-01-26
  • 0
    @see exactly! The 'person on the street' would indeed be (rightfully) quite confused about this, since both meanings seem perfectly appropriate meanings! But I think there *is* a good reason for why we picked $P(B|A)$ as 'the' meaning, and that is because that number stays the same no matter how many people have the disease (that is, how 'good' the test is in terms of correctly classifying someone remains the same) whereas, as I explained, $P(A|B)$ depends on how many people have the disease. So, if for example a lot of people came down with the disease, the 'accuracy' in that sense would ...2017-01-26
  • 0
    ... change ... And that seems a bit weird: the accuracy of the test changes depending on how many people have the disease?2017-01-26
  • 0
    That explanation makes indeed sense, but I think I'm doing something wrong now. Given a fixed false positive rate $P(A^{C}|B)$ (this is given in the formulation of the exercise, though only as 'false positive rate', not explicitly as $P(A^{C}|B)$), don't I have $P(A|B)+P(A^{C}|B)=\frac{P(A\cap B)}{P(B)}+\frac{P(A^{C}\cap B)}{P(B)}=\frac{P((A\cup A^{C})\cap B)}{P(B)}=\frac{P(B)}{P(B)}=1$, i.e. $P(A|B)=1-P(A^{C}|B)$ doesn't even depend on $P(A)$ anymore?2017-01-26
  • 0
    It is exercise 3.6 in here: http://www.karlin.mff.cuni.cz/~lachout/Vyuka/O-Sem/JacodProtter2004.pdf 2017-01-26
  • 0
    @see yes, that's correct: $P(A|B) = 1-P(A^C|B)$. But remember, you probably want to figure out $P(B|A)$2017-01-26
  • 0
    No, it says so explicitly in the exercise. Could you please check it so I can make sure I'm not turning insane. It's on page 19.2017-01-26
  • 0
    @see Ok, looked at the exercise. So they give you $P(B|A)$, but they want to know what is $P(A|B)$. hey, sorry, but I have to go!! Good luck!!2017-01-26
  • 0
    But they also give a false positive rating, which according to your answer is $P(A^{C}|B)$, which fixes $P(A|B)$, and I'm back at where I started....2017-01-26
  • 0
    could you check Quasar's answer above. He uses another definition of false positive than you but it seems to work.2017-01-26
  • 0
    @see Ha ha, yes, it's like the 'accuracy' that is ambiguous. OK, so Quasar talks about $P(T|D^C)$, which is the probability that a test will turn out positive given that someone does not have the disease, whereas $P(D^C|T)$ is the probability that a test that has already come out positive turns out to be a false positive. So which one is it? Well, first let's establish that when we are dealing with a false positive, we are dealing with the event $T \land D^C$, so the *event* of a false positive is something different than any of these conditional probabilities .. (continued)2017-01-26
  • 0
    So in that sense, asking 'which one is correct?' is already a bit off. Now, we could ask 'what is the false positive *rate*'? ... which is actually more in line with the idea of 'accuracy' (but rather a kind of 'inaccuracy' of course). And now of course we get the same ambiguity as we got for 'accuracy'. Still, if we use $P(T|D)$ as our meaning of 'accuracy', then it would certainly be consistent with that to use $P(T|D^C)$ (which equals $1-P(T^C|D^C)$, and equals $1-P(T|D)$ only when $P(T^C|D^C)=P(T|D)$) as the rate of 'inaccuracy' or 'false positive rate'.2017-01-26
  • 0
    @see OK, so based on our discussion I now realize how my Answer did not take into account all these subtleties. I'll update!2017-01-26
  • 0
    How do you get from 'will more likely come out positive for a person without the disease, than that it will come out negative for a person with the disease ' to $P(B^{C}|A^{C})>95\%$?2017-01-26
  • 0
    @see yeah ... That's not quite right, is it ... Ok, so for such a 'conservative' test, we have $P(B|A^C) > P(B^C|A)$. And so since $P(B^C|A^C) = 1-P(B|A^C)$, and $P(B|A) = 1-P(B^C|A)$, we have that $P(B^C|A^C) < P(B|A)$. So yeah, it should be $P(B^C|A^C) < 0.95$. Good catch!2017-01-26
  • 0
    Nevermind my last comment. Thank you very much for the detailed explanation, this clarified a lot. I just need to get some more practice with formalising those statements.2017-01-26
2

@see,

We treat the test as evidence and the occurrence of the disease in the population as the hypothesis. And in problems like these, we are given $P(\text{evidence}|\text{hypothesis})$. Therefore, the false-positive rating is $P(B|A^C)=0.05$. For ease, I define:

$T:= \text{Test returns positive}$

$D:= \text{Individual has the disease}$

We are given:

True positive : $P(T|D)=0.99$, and $P(T^C|D)=0.01$

False positive (the test falsely returns a positive when in fact the person does not have the disease; 500 test positive in 10000 people not having the disease): $P(T|D^C)=0.05$ and $P(T^C|D^C) = 0.95$

And the occurrence of the disease is rare. $P(D)=1/10000=0.0001$

We are being asked: if the test returns positive (evidence), what is the likelihood that you have AIDS (hypothesis)? $P(D|T)=?$

Using Bayes' rule and the law of total probability (LOTP),

$$\begin{align} P(D|T)&=\frac{P(T|D)P(D)}{P(T)}\\ &=\frac{P(T|D)P(D)}{P(T|D)P(D)+P(T|D^C)P(D^C)}\\ &=\frac{(0.99)(0.0001)}{(0.99)(0.0001)+(0.05)(0.9999)}\\ &=0.001976285 \end{align}$$

Intuitively, in a population of $10,000$, $1$ is likely to have the disease, $9999$ are likely not to have the disease. The test has an accuracy rate of $99\%$. Given that they have the disease, $0.99\cdot{1}=0.99$ persons are likely to test positive, $0.01\cdot{1}=0.01$ persons would falsely test negative. Given that they do not have the disease, $9499.05$ persons would test negative, $499.95$ persons would falsely test positive.

We are asked, given that you test positive, what is the likelihood you have AIDS? So, that would be $0.99/(0.99+499.95)=0.001976285$.
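
For anyone who wants to re-check the arithmetic, here is a short Python sketch that redoes both computations (Bayes' rule and the head count in a population of 10,000) with exact fractions. The variable names are mine; the three input numbers are the ones given above.

```python
from fractions import Fraction

# Inputs as stated above.
p_D = Fraction(1, 10000)           # P(D): disease is rare
p_T_given_D = Fraction(99, 100)    # P(T|D): true positive
p_T_given_Dc = Fraction(5, 100)    # P(T|D^C): false positive

# Route 1: Bayes' rule with the law of total probability for P(T).
p_T = p_T_given_D * p_D + p_T_given_Dc * (1 - p_D)
p_D_given_T = p_T_given_D * p_D / p_T

# Route 2: expected head counts in a population of 10,000.
population = 10000
true_positives = p_T_given_D * p_D * population          # expected diseased who test positive
false_positives = p_T_given_Dc * (1 - p_D) * population  # expected healthy who test positive
by_counts = true_positives / (true_positives + false_positives)

print("P(D|T) via Bayes  :", p_D_given_T, "=", float(p_D_given_T))
print("P(D|T) via counts :", by_counts, "=", float(by_counts))
```

Both routes give the same exact fraction, which is just the observation that multiplying numerator and denominator by the population size does not change the ratio.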

Hope that answers your question.

  • 2
    Isn't your definition of false-positive the exact opposite of Bram28's?2017-01-26
  • 0
    Modified the answer for the accuracy rate=$0.99$. From what I understand, $T|D=\text{True positive}$, $T^C|D=\text{False Negative}$, $T|D^C=\text{False Positive}$, $T^C|D^C=\text{True Negative}$2017-01-26
  • 0
    Also, intuitively, think how would you deduce the true-positivity and false-positivity of a diagnostic test? Conduct trials on a large population. How many test positive, restricting our attention to diseased people? That's, $T|D$ - True positive. How many test positive, restricting our attention to the healthy people? $T|D^C$ - False positive. Does that help?2017-01-26
  • 1
    Your definition matches my intuition, as I wrote in my opening post. Your solution also matches what I calculated on my own when I first did the exercise, so I guess I'll go with this interpretation.2017-01-26