This is a practical problem that arose in real life and which, I believe, raises interesting mathematical questions.
There is a festival of small plays lasting 8 weeks. Each week 10 short plays are staged (as one show) and the audience votes for its favorite play among those 10. The next week there is a set of 10 new plays, the audience (different from last week's audience) votes again, and so on. Thus each week yields a most popular play, giving 8 weekly winners in total. We want to select the 2 "most popular" of these 8 plays to advance to the Final. The quotes around "most popular" are there because it is inherently unfair to compare winners from different weeks, since each week has a different set of plays. But let's say we cannot do anything about this and we still need a "fair" way to name the 2 most popular among the weekly winners.

One obvious step is to use percentages of votes instead of absolute vote counts, since audience sizes (and hence vote totals) vary from week to week. The two plays with the highest percentages among the 8 winners are named "most popular" and advance to the Final.
There is a complication, though, that makes the whole problem mathematically interesting. Not all weeks have exactly 10 plays: some might have 8 or 9, some might have 11. It is unfair to compare percentages of popular votes across sets of different cardinality. If the unfairness is not immediately obvious, consider an extreme case: week A has just two plays and week B has 10. The winner of week A gets 60% of the votes, the winner of week B gets 20%. Is A's winner really that much more popular? Of course not. We need a way to adjust the percentages to a normative week of 10 plays.
The organisers of the festival have recognised this and apply the following formula, where $p$ is the percentage of votes received by the most popular play of a given week, $\hat{p}$ is the adjusted percentage, and $n$ is the actual number of plays in that week:
$\hat{p} = p \cdot \frac{n}{10}$
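To make the formula concrete, here is a small sketch (the function name `adjust` and the vote figures are my own, taken from the extreme two-play example above) showing what the rule does to the two extreme-case winners:

```python
def adjust(p, n, target=10):
    """Scale a winner's percentage p from a week with n plays
    to a normative week of `target` plays: p * n / target."""
    return p * n / target

# Extreme case: week A has 2 plays, week B has 10 plays.
print(adjust(60.0, 2))   # A's winner: 60% * 2/10 = 12.0
print(adjust(20.0, 10))  # B's winner: 20% * 10/10 = 20.0
```

Under this rule B's winner now ranks above A's, which matches the intuition that a 60% share among only two plays is not especially impressive.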
This formula might seem intuitive at some level, but on closer inspection it turns out to be rather arbitrary. Let's take $n = 9$ and assume the winning play got 20%. The adjusted percentage is 18%. This can be interpreted in 2 equivalent ways: either
- The total number of votes remains the same and 1/10 of each play's votes is taken away and given to an imaginary extra play, or
- The votes of each play remain constant and the imaginary extra play receives an extra 1/9 of the total votes cast, so the adjusted total is 10/9 of the actual total. In other words, the imaginary extra play gets the average number of votes among the 9 plays.
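The equivalence of the two interpretations is easy to check numerically. In this sketch the vote counts for the 9-play week are made up; only the winner's 20% share matches the example above:

```python
# Hypothetical vote counts for a 9-play week, winner first (numbers made up).
votes = [200, 150, 120, 110, 100, 90, 80, 80, 70]
total = sum(votes)            # 1000 votes cast
p = 100 * votes[0] / total    # winner's raw percentage: 20.0

# Interpretation 1: total stays fixed; 1/10 of every play's votes
# is handed to an imaginary 10th play.
p1 = 100 * (0.9 * votes[0]) / total

# Interpretation 2: every play keeps its votes; the imaginary play
# receives the average (total/9), inflating the total to 10/9 of itself.
p2 = 100 * votes[0] / (total + total / 9)

print(p, p1, p2)  # both adjusted values come out to ~18.0, i.e. p * 9/10
```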
Makes some sense, but let's look closer.
- For interpretation 1: why should we take the same *percentage* of votes from each play? This way we take more votes from the popular plays and fewer from the unpopular ones. Why not take the same *number* of votes from each play? Or even take more votes from unpopular plays than from popular ones? After all, isn't it more probable for a popular play to retain its votes?
- For interpretation 2: why should the added imaginary play contribute the average votes per play to the total? Why not the median? Why not some other estimate that gives the most probable number of votes, given the distribution of votes we have seen so far?
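To see that these choices are not innocuous, here is a sketch (with the same kind of made-up vote counts as before) comparing the official rule against the two variants just raised; the three adjusted percentages for the same winner all differ:

```python
import statistics

# Hypothetical vote counts for a 9-play week, winner first (numbers made up).
votes = [200, 150, 120, 110, 100, 90, 80, 80, 70]
total = sum(votes)  # 1000

# Official rule (equal-percentage skim): adjusted winner share = p * 9/10.
official = 100 * votes[0] / total * 9 / 10

# Variant A: skim the same *number* of votes from every play, chosen so the
# imaginary 10th play still ends up with the average (total/10) votes.
skim = (total / 10) / 9
variant_a = 100 * (votes[0] - skim) / total

# Variant B: the imaginary play receives the *median* vote count
# instead of the mean, inflating the total accordingly.
variant_b = 100 * votes[0] / (total + statistics.median(votes))

print(official, variant_a, variant_b)  # roughly 18.0, 18.89, 18.18
```

Since the winner's adjusted share moves by nearly a full percentage point between variants, the choice of redistribution rule can plausibly decide which two plays reach the Final.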
Do the different ways matter, and is there a fairer way (maybe in the sense of maximum likelihood)? I give a partial answer below in the answers section.
I would also appreciate your help in finding a better title for the problem.