1
$\begingroup$

This is a practical question that was asked on a correspondence chess site. The site struggles with chess engine users, who are considered cheaters. The method people use to determine whether someone is a cheater is this. Take a large enough number $x$ of games the person has played; analyze them with an engine (starting from the point where the game leaves online chess databases, whose use is allowed) and find the average match-up rates, for example:

  • 1st choice - 50% match-up rate,
  • 2nd choice - 65% match-up rate,
  • 3rd choice - 80% match-up rate,

where the first number is the ratio $\frac{\text{the number of analyzed moves that are considered best by the engine}}{\text{the number of analyzed moves}},$

the second number is the ratio $\frac{\text{the number of analyzed moves that are considered best or second best by the engine}}{\text{the number of analyzed moves}},$

and the third number is found in a similar way.

The question is how large $x$ must be in order for these numbers to approximate the overall match-up rates well.

I think the right way to ask this question more precisely is to fix a small error $\varepsilon$ and a small probability $p$, and ask how large $x$ must be for the probability of the actual error exceeding $\varepsilon$ to be less than $p$.

I did do a course in statistics, but I remember almost nothing of it. Could you help me with this?

  • 0
    @Fixee Well, the idea is to analyze _many_ games, not just one. The match-up rates are hoped to become more and more meaningful as the number of analyzed games grows.2012-11-17

1 Answers 1

6

Binomial or multinomial models will not work well in practise here, and you need to do some serious empirical work to get good tests for detecting cheaters. The problem is that chess players do not have fixed probabilities for choosing the computer's recommendations. In particular, if the computer considers one move to be much stronger than all other alternatives, then it is much more likely that a human player will arrive at the same conclusion.

Ken Regan at the University of Buffalo has done some serious work on this question, which you can read about on his home page.