
If I have 5 people classify an image as offensive or not, how can I calculate my confidence that the answer that 3 or more people agree on is the correct answer?

I have past accuracy rates, but am not sure how to incorporate my confidence in those accuracy rates. For example, Person A may be 90% accurate, but with only 10 sample points. Person B may be 85% accurate with 1000 sample points.
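One standard way to fold sample size into an accuracy estimate is to treat each person's past judgements as independent right/wrong trials and put a Beta prior on their accuracy. This technique does not come from the answer below. The sketch assumes a uniform Beta(1, 1) prior and reads "90% accurate with 10 sample points" as 9 correct out of 10. The function name `beta_posterior` is hypothetical.

```python
import math

def beta_posterior(correct: int, total: int, prior_a: float = 1.0, prior_b: float = 1.0):
    """Mean and standard deviation of the Beta posterior over one person's accuracy.

    Assumes a Beta(prior_a, prior_b) prior (uniform by default) and that each
    past judgement is an independent correct/incorrect trial.
    """
    a = prior_a + correct            # prior + observed successes
    b = prior_b + (total - correct)  # prior + observed failures
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Person A: 9 of 10 correct; Person B: 850 of 1000 correct
mean_a, sd_a = beta_posterior(9, 10)
mean_b, sd_b = beta_posterior(850, 1000)
print(f"A: mean={mean_a:.3f}, sd={sd_a:.3f}")
print(f"B: mean={mean_b:.3f}, sd={sd_b:.3f}")
```

The posterior mean is the probability that the person gets one new item right, averaged over the uncertainty in their accuracy, so it is a reasonable plug-in value for a per-person accuracy. A's small sample shows up twice: the mean is pulled toward 0.5 (10/12 ≈ 0.833 rather than 0.9), and the standard deviation is roughly nine times B's.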

I've gotten as far as the probability that all 5 people choose the correct answer. Assuming P(X) is the probability that Person X chooses the correct answer, and that people answer independently, it is:

P(A)*P(B)*P(C)*P(D)*P(E)
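The product above extends to the "3 or more" case by summing over every pattern of who is right and who is wrong. This is a sketch that assumes independent labellers, each with a single accuracy that does not depend on whether the image is offensive; the accuracies and function names are hypothetical.

```python
from itertools import product
from math import prod

def prob_all_correct(acc):
    """P(A)*P(B)*...: every labeller is correct, assuming they answer independently."""
    return prod(acc)

def prob_at_least_k_correct(acc, k):
    """Probability that at least k labellers are correct, assuming independence.

    Enumerates all 2**n correct/incorrect patterns, which is cheap for n = 5.
    """
    total = 0.0
    for pattern in product((True, False), repeat=len(acc)):
        if sum(pattern) >= k:
            total += prod(p if ok else 1.0 - p for p, ok in zip(acc, pattern))
    return total

# Hypothetical accuracies for five labellers A..E (illustrative only)
acc = [0.9, 0.85, 0.8, 0.75, 0.7]
print(f"all 5 correct:      {prob_all_correct(acc):.4f}")
print(f"at least 3 correct: {prob_at_least_k_correct(acc, 3):.4f}")
```

Both numbers are computed before any votes are seen. Your confidence in a particular 3-2 split after you observe it is a different quantity, a posterior given the actual labels.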

I've been looking at confidence intervals and multinomial models, but haven't been able to piece everything together.

1 Answer


There are a lot of different tools you can use, but each depends on a slightly different formulation of your problem (i.e., different assumptions). Let me offer one based on a Bayesian approach.

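Assume each item is either offensive $X=1$ or not $X=0$. Then, define the random variable $Y_i$ to be the label assigned by the $i^\mathrm{th}$ person. You mentioned you have past accuracy data for each person, so suppose that gives you a model for each person, i.e., $p_{Y_i|X}(Y_i = 1 | X = 1)$ and $p_{Y_i|X}(Y_i = 1 | X = 0)$. (A single accuracy rate $a_i$ that applies to both classes corresponds to $p_{Y_i|X}(1|1) = a_i$ and $p_{Y_i|X}(1|0) = 1-a_i$.) So, if person $i$ labeled the item offensive, the probability the item is actually offensive is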

$ p_{X|Y_i}(1|1) = \frac{p_{Y_i|X}(1|1)p_X(1)}{p_{Y_i}(1)} = \frac{p_{Y_i|X}(1|1)p_X(1)}{p_{Y_i|X}(1|1)p_X(1)+p_{Y_i|X}(1|0)p_X(0)}$
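The single-person formula can be evaluated directly. This is a minimal sketch; the function name and the example numbers (hit rate, false-alarm rate, base rate) are hypothetical.

```python
def posterior_offensive_one(p_y1_given_x1, p_y1_given_x0, p_x1):
    """p_{X|Y_i}(1|1): probability the item is offensive given person i labeled it offensive.

    p_y1_given_x1: p_{Y_i|X}(1|1), how often person i flags offensive items
    p_y1_given_x0: p_{Y_i|X}(1|0), how often person i flags non-offensive items
    p_x1:          prior p_X(1), the base rate of offensive items
    """
    numerator = p_y1_given_x1 * p_x1
    evidence = numerator + p_y1_given_x0 * (1.0 - p_x1)  # p_{Y_i}(1), summed over X
    return numerator / evidence

# Hypothetical numbers: 90% hit rate, 20% false-alarm rate, 30% base rate
print(posterior_offensive_one(0.9, 0.2, 0.3))
```

The prior $p_X(1)$ matters: when offensive items are rare, the same "offensive" label is weaker evidence.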

from Bayes' law. If we assume the responses are conditionally independent given $X$, then for two people who both labeled the item offensive we get

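$ p_{X|Y_i, Y_j}(1|1,1) = \frac{p_{Y_i|X}(1|1)\,p_{Y_j|X}(1|1)\,p_X(1)}{p_{Y_i|X}(1|1)\,p_{Y_j|X}(1|1)\,p_X(1)+p_{Y_i|X}(1|0)\,p_{Y_j|X}(1|0)\,p_X(0)}$

The denominator is $p_{Y_i,Y_j}(1,1)$, expanded over $X$ as before. Note that it is not $p_{Y_i}(1)\,p_{Y_j}(1)$, because the labels are only independent once $X$ is fixed. With more people this just becomes a product over all the labels you have: for labels $y_1,\dots,y_n$,

$ p_{X|Y_1,\dots,Y_n}(1|y_1,\dots,y_n) = \frac{p_X(1)\prod_k p_{Y_k|X}(y_k|1)}{p_X(1)\prod_k p_{Y_k|X}(y_k|1)+p_X(0)\prod_k p_{Y_k|X}(y_k|0)}$

where a "not offensive" label contributes $p_{Y_k|X}(0|x) = 1 - p_{Y_k|X}(1|x)$. By the way, this is exactly a Naive Bayes classifier.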
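The product over all $n$ labels takes only a few lines. This is a sketch under the same conditional-independence assumption; the function name and the example vote are hypothetical.

```python
def posterior_offensive(labels, p_y1_given_x1, p_y1_given_x0, p_x1):
    """Naive Bayes posterior p_{X|Y_1..Y_n}(1|y_1..y_n).

    Assumes the labels are conditionally independent given X.
    labels[i] is 1 if person i said offensive, else 0; the two lists hold
    p_{Y_i|X}(1|1) and p_{Y_i|X}(1|0) for each person.
    """
    joint1 = p_x1          # running p_X(1) * prod_i p_{Y_i|X}(y_i|1)
    joint0 = 1.0 - p_x1    # running p_X(0) * prod_i p_{Y_i|X}(y_i|0)
    for y, s, f in zip(labels, p_y1_given_x1, p_y1_given_x0):
        joint1 *= s if y == 1 else 1.0 - s
        joint0 *= f if y == 1 else 1.0 - f
    return joint1 / (joint1 + joint0)

# Hypothetical: a 3-2 vote from five people who are each 80% accurate on both classes
labels = [1, 1, 1, 0, 0]
print(posterior_offensive(labels, [0.8] * 5, [0.2] * 5, 0.5))
```

With equally accurate, symmetric labellers and a flat prior, one "yes" and one "no" cancel exactly, so a 3-2 vote leaves you only as confident as a single person's accuracy (0.8 here). Five people with different error rates do not cancel this neatly, which is where the per-person models pay off.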

Now the question is: what is the probability that the next person will label it offensive? Let $Z_k$ denote the unknown response. Then,

$p_{Z_k|Y_i,Y_j}(1|y_i,y_j) = p_{Z_k|Y_i,Y_j,X}(1|y_i,y_j,1)\,p_{X|Y_i,Y_j}(1|y_i,y_j) + p_{Z_k|Y_i,Y_j,X}(1|y_i,y_j,0)\,p_{X|Y_i,Y_j}(0|y_i,y_j)$

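from the law of total probability. Under the conditional-independence assumption, $p_{Z_k|Y_i,Y_j,X}(1|y_i,y_j,x) = p_{Z_k|X}(1|x)$, so you only need the new person's model and the posterior computed above. Basically, it's the probability (given the previous responses) that the item is offensive and the new response is correct, plus the probability that the item is not offensive and the new response is wrong.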
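The total-probability step is a weighted average of the new person's two rates. This is a minimal sketch; the function name and the numbers (current posterior 0.8, hit rate 0.9, false-alarm rate 0.1) are hypothetical.

```python
def prob_next_label_offensive(post_x1, p_z1_given_x1, p_z1_given_x0):
    """p_{Z_k|Y}(1|y) by the law of total probability.

    post_x1: posterior p_{X|Y}(1|y) from the labels seen so far
    p_z1_given_x1, p_z1_given_x0: the new person's p_{Z_k|X}(1|1) and p_{Z_k|X}(1|0).
    Conditional independence lets p_{Z_k|Y,X}(1|y,x) reduce to p_{Z_k|X}(1|x).
    """
    # (offensive and new label correct) + (not offensive and new label wrong)
    return p_z1_given_x1 * post_x1 + p_z1_given_x0 * (1.0 - post_x1)

# Hypothetical: current posterior 0.8, new person with 90% hit rate, 10% false-alarm rate
print(prob_next_label_offensive(0.8, 0.9, 0.1))
```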

Good luck.

    Ahh, that makes sense. Thanks for the Bayes classifier observation; much appreciated! (2011-02-21)