
I have a dataset made of pairs $(n_i, v_i)$, where $n_i$ denotes the number of times a game was played on the $i$-th day and $v_i$ the number of victories on that day.

What is the best way to evaluate the probability $P$ of winning the game? (We can assume that winning the game does not depend on time).

My first thought is to evaluate $P$ as the total number of victories over the total number of games, i.e. $P = \frac{\sum_{i=1}^t v_i}{\sum_{i=1}^t n_i}$, where $t$ is the number of days.

I then wondered whether I could instead evaluate $p_i = v_i / n_i$ and define $P$ as the mean of these daily proportions: $ P = \frac{1}{t} \sum_{i=1}^t p_i $.
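To make the difference concrete, here is a small numerical check with made-up $(n_i, v_i)$ pairs (the numbers are only for illustration):

```python
# Made-up (games played, victories) pairs, one per day -- for illustration only.
data = [(10, 4), (50, 30), (2, 2)]

# First approach: total victories over total games.
P_pooled = sum(v for _, v in data) / sum(n for n, _ in data)   # 36/62 ≈ 0.581

# Second approach: mean of the daily proportions p_i = v_i / n_i.
P_mean_of_ratios = sum(v / n for n, v in data) / len(data)     # (0.4 + 0.6 + 1.0)/3 ≈ 0.667

print(P_pooled, P_mean_of_ratios)
```

With unequal daily counts the two formulas generally give different numbers.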

Somehow I feel this second approach is wrong, but I can't entirely understand why.

How would you evaluate $P$ and why? Can you give me some links explaining how to evaluate reliable statistics?

1 Answer


The answer may depend on what you know beforehand about that probability $p$. What we can say is: if the probability is $p$, then the probability of observing $v=\sum v_i$ victories within $n=\sum n_i$ games is ${n\choose v}p^v(1-p)^{n-v}$, and this expression is maximal when $p=\frac{v}{n}$. Thus if we have no a priori knowledge about $p$ (i.e. we consider each value in $[0,1]$ equally likely), then $p=\frac{v}{n}$ is the best guess.
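In more detail (standard maximum-likelihood reasoning, spelled out): the logarithm of the probability above is maximized where its derivative in $p$ vanishes,
$$
\frac{d}{dp}\Bigl[v\log p + (n-v)\log(1-p)\Bigr] \;=\; \frac{v}{p}-\frac{n-v}{1-p}\;=\;0
\quad\Longrightarrow\quad
p=\frac{v}{n}.
$$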

You can also do this on a day-by-day basis, but then your adjustment after day $i$ must take into account that you already have some a priori knowledge. You would need to apply Bayes' theorem, and in the end this leads to exactly the same result.
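A minimal sketch of that day-by-day updating, assuming a uniform (Beta(1,1)) prior and the same made-up daily counts as above; it checks that the posterior mode after the last day equals the pooled ratio $v/n$:

```python
# Sequential Beta-Binomial updating with a uniform Beta(1, 1) prior.
# The (n_i, v_i) pairs below are made up purely for illustration.
data = [(10, 4), (50, 30), (2, 2)]  # (games played, victories) per day

alpha, beta = 1.0, 1.0  # Beta(1, 1) = uniform prior on p
for n, v in data:
    alpha += v        # add the day's victories
    beta += n - v     # add the day's losses

# Mode of the final Beta(alpha, beta) posterior (the most likely value of p).
posterior_mode = (alpha - 1) / (alpha + beta - 2)

# Pooled estimate from the batch formula: total victories / total games.
pooled = sum(v for _, v in data) / sum(n for n, _ in data)

print(posterior_mode, pooled)  # both equal 36/62 ≈ 0.581
```

The update only adds counts, so processing the days one at a time or all at once yields the same posterior, which is the point of the last sentence above.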

  • Thanks Hagen, but I can't really follow your reasoning (it has been a long time since I last applied probability). Can you please point me to some reference? Why would the maximizing $p$ be the best guess? Anyway, yes, I assume that $p$ is uniform in $[0,1]$. I think this is a fairly standard job for statisticians, and I am looking for the standard way to do it. Also, I don't understand how a priori knowledge would influence the result, since $p_i$ doesn't depend on previous days... (2012-10-06)