I am working on a program to solve Hangman. When you place a correct letter in the word, for example,
_ _ _ E _ S
I calculate which words it could be, and then, of those words, which letter is the most likely next guess. So in this example there are 423 possible words, and of those, 272 contain the letter "R". The next most likely letter is A, appearing in 141 of the 423 words. So I can say that there is roughly a 64% chance that the word contains the letter R.
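For context, this is roughly how I count the letters (a minimal Python sketch; the tiny word list and the function name are stand-ins, not my real code or dictionary):

```python
from collections import Counter

def letter_frequencies(candidates, guessed):
    """For each unguessed letter, count how many candidate
    words contain it at least once (not total occurrences)."""
    counts = Counter()
    for word in candidates:
        for letter in set(word) - guessed:
            counts[letter] += 1
    return counts

# Tiny stand-in list for the _ _ _ E _ S pattern.
candidates = ["papers", "towers", "wipers", "camels"]
counts = letter_frequencies(candidates, guessed={"e", "s"})
for letter, n in counts.most_common(3):
    print(f"{letter}: {n}/{len(candidates)} = {n / len(candidates):.0%}")
```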
However, as the number of possible words decreases, the "likely" letter becomes less reliable. For example, if the known letters now become:
_ _ T E R S
There are 21 possible words. The next most likely letter is "A", appearing in 9 of the 21 words, so there is roughly a 43% chance that the word contains A. However, because the set of possible words is so small, I don't think that 43% represents a true "probability", and my confidence in it would be much lower than in the R example above. I vaguely remember a concept that attaches a "confidence" level to a probability: the fewer samples you have, the lower the confidence, and the confidence grows as the number of samples increases.
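To make my worry concrete, here is a small sketch using the standard error of a sample proportion, sqrt(p(1-p)/n), applied to the two cases above. It at least shows the effect I mean, though I don't know whether this is the concept I'm half-remembering:

```python
import math

def proportion_stderr(k, n):
    """Standard error of the sample proportion k/n
    under the normal approximation."""
    p = k / n
    return math.sqrt(p * (1 - p) / n)

# 272 of 423 words contain R; 9 of 21 words contain A.
for k, n in [(272, 423), (9, 21)]:
    p = k / n
    se = proportion_stderr(k, n)
    # Rough 95% interval: p +/- 1.96 * standard error.
    print(f"{k}/{n}: p = {p:.2f}, 95% interval ~ "
          f"({p - 1.96*se:.2f}, {p + 1.96*se:.2f})")
```

The interval for 272/423 comes out much narrower than the one for 9/21, which matches my intuition that the 64% figure is more trustworthy than the 43% one.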
Could anyone point me to what this concept would be?