
I have a problem involving multiple people (say 100), each with a different, unknown accuracy at predicting whether an event will happen. In other words, everyone is somewhat accurate, but each person's individual accuracy is unknown. Each round starts with everyone predicting the event (true/false). Once everyone has output their predictions, I reveal the correct answer (either true or false). How can I determine who is the most accurate one? I want to train my algorithm on every round so that after N rounds it is smart enough to predict the next correct output accurately (not necessarily with 100% accuracy).

Does this have something to do with expected value?

1 Answer


If you know for sure that 'everyone is accurate' (which I take to mean they all guess right significantly more than $50\%$ of the time), their guesses are more or less independent of one another, and you have 100 people, then simply taking a majority vote will probably be very accurate, so as a practical matter optimizing further might be overkill.
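A majority vote is a one-liner; here is a minimal sketch (the function name and the 60% accuracy figure are illustrative assumptions):

```python
import numpy as np

def majority_vote(predictions):
    """Return True when strictly more than half of the predictions are True."""
    predictions = np.asarray(predictions, dtype=bool)
    return predictions.sum() > len(predictions) / 2

# 100 hypothetical guessers, each independently right 60% of the time
# on an event whose true answer is True
rng = np.random.default_rng(0)
guesses = rng.random(100) < 0.6
print(majority_vote(guesses))
```

Note that with this convention an exact tie does not count as a majority, so the prediction defaults to False.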

Generally, the approach of finding the single best guesser and just picking their guess is bad. It is usually more accurate to incorporate the votes of many decent guessers rather than one good one, since the odds that the decent guessers' majority vote is wrong drop quickly as their number grows.

A simple approach might be to find all of the guessers with a historical success rate over some threshold (say $60\%$) and use their majority vote as your prediction. If there are some guessers that are systematically wrong (say $<20\%$), you could include them as well, only have them vote for the opposite of their prediction. You would want to pick the threshold low enough that a decent number of guessers always make the cut, but not so low that near-random guessers dilute the vote. Also, instead of a simple majority vote, it might be good to compute a weighted average, weighted by some function of each guesser's empirical success percentage. This would effectively give the better guessers more votes.
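The thresholded, weighted scheme above could be sketched as follows (the thresholds, weighting function, and all names are illustrative assumptions):

```python
import numpy as np

def weighted_vote(guesses, success_rates, low=0.2, high=0.6):
    """Combine guesses: trust good guessers, invert systematically bad ones.

    guesses       : boolean guess per person
    success_rates : empirical accuracy per person, estimated from history
    low, high     : the example thresholds from the text
    """
    guesses = np.asarray(guesses, dtype=bool)
    rates = np.asarray(success_rates, dtype=float)

    votes = 0.0
    total = 0.0
    for g, p in zip(guesses, rates):
        if p >= high:
            w, vote = p, g            # weight a good guesser by their accuracy
        elif p <= low:
            w, vote = 1 - p, not g    # flip a systematically wrong guesser
        else:
            continue                  # middling guessers don't get a vote
        votes += w * vote
        total += w
    return votes > total / 2
```

For example, a guesser with a $10\%$ success rate who says False is counted as a True vote with weight $0.9$.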

Note that the independence assumption is crucial to getting any wisdom out of a crowd. If the guessers are all right $95\%$ of the time, but in the $5\%$ of cases where one is wrong they are all wrong together, then you will still have only a $95\%$ success rate, instead of the practically $100\%$ you would get if they were wrong independently of one another.
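A quick simulation (with hypothetical numbers) makes the gap concrete:

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials, n_guessers, acc = 2000, 100, 0.95

# Independent errors: each guesser is wrong on their own 5% of rounds.
ind_correct = rng.random((n_trials, n_guessers)) < acc
ind_acc = (ind_correct.sum(axis=1) > n_guessers / 2).mean()

# Perfectly correlated errors: on 5% of rounds, everyone is wrong at once,
# so the majority vote is exactly as good as a single guesser.
corr_acc = (rng.random(n_trials) < acc).mean()

print(ind_acc, corr_acc)
```

With independent errors the majority is essentially never wrong, while with perfectly correlated errors the majority's accuracy stays stuck near $95\%$.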

There are also a variety of fancier ways to approach this problem.

One standard method for binary classification is to fit a logistic regression to your past data. The regression function you fit can then be used to predict.
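As a sketch of that idea: treat each past round as a training row (one 0/1 feature per guesser, the revealed outcome as the label) and fit the logistic model by plain gradient descent. Everything here is a minimal illustration with numpy; in practice a library implementation such as scikit-learn's `LogisticRegression` would replace this.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit logistic regression by gradient descent on the log-loss.

    X : (n_rounds, n_guessers) matrix of past guesses (0/1)
    y : (n_rounds,) vector of revealed outcomes (0/1)
    Returns per-guesser weights and an intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)  # gradient step on weights
        b -= lr * np.mean(p - y)            # gradient step on intercept
    return w, b

def predict(X, w, b):
    """Predict True where the fitted probability exceeds 1/2."""
    return 1 / (1 + np.exp(-(np.asarray(X, dtype=float) @ w + b))) > 0.5
```

A nice side effect is that the learned weights tell you how informative each guesser is, including negative weights for systematically wrong guessers.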

Another method is to use a Bayesian approach. If you believe all the guessers are independent of one another and each has some success probability $p_i$, then you can use probability theory to get a distribution for the $p_i$'s and then average over that distribution to get the probability that the answer is true given everyone's guesses.
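A minimal sketch of that, assuming a uniform Beta(1, 1) prior on each $p_i$ and a 50/50 prior on the event (both assumptions are mine): because each guess enters the likelihood linearly in $p_i$ and the $p_i$'s are independent, averaging over the posterior is the same as plugging in each posterior mean $(s_i + 1)/(n_i + 2)$.

```python
import numpy as np

def bayes_predict(guesses, successes, trials, prior_true=0.5):
    """Posterior probability the event is True, assuming independent guessers.

    guesses   : boolean guess per person for the current round
    successes : number of past correct guesses per person
    trials    : number of past rounds per person
    """
    # Posterior mean of each p_i under a uniform prior (Laplace smoothing)
    p = (np.asarray(successes) + 1) / (np.asarray(trials) + 2)
    guesses = np.asarray(guesses, dtype=bool)

    # Likelihood of the observed guesses under each hypothesis
    like_true = np.prod(np.where(guesses, p, 1 - p))
    like_false = np.prod(np.where(guesses, 1 - p, p))

    post = prior_true * like_true
    return post / (post + (1 - prior_true) * like_false)
```

For instance, two guessers who were right 9 times out of 10 and both say True push the posterior well above $90\%$, while a guesser with a 50% record leaves it at exactly $1/2$.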

There are other machine learning approaches out there too, such as random forests, support vector machines, and neural networks, that are popular for this kind of binary classification problem.

I'd recommend starting with something simple and ad hoc and seeing if it gets adequate performance. If it doesn't work well enough, you can move on to one of the standard methods. There are lots of libraries in R and Python that you can use to fit these models.

  • Thanks a lot. I was about to delve into logistic regression and the Bayesian approach. 2017-01-21