0

Let's say I have a string such as "AAAABBBBAAAABAAAA" and I want a quantitative measure of the most likely run length, where a run is a maximal block of identical consecutive characters. In the above example (four runs of length 4 and one run of length 1), it is "humanly" evident that the next run will have length 4, but simply averaging the run lengths gives 17/5 = 3.4.

What would be the best method to get an answer closer to 4?

2 Answers

3

It's not evident to me that the next group will have length 4. What if the next elements are "BAAAA" again? It happened once, so why not again?

If you want the "most likely" length, then choose the mode, which is exactly what you described: the most frequently observed length so far, in this case 4.
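As a minimal sketch of this, assuming the data is a plain Python string and that a "group" means a maximal run of identical characters, the run lengths, their mean, and their mode could be computed as follows. The helper name `run_lengths` is made up for illustration.

```python
from itertools import groupby
from statistics import mean, mode


def run_lengths(s):
    """Return the length of each maximal run of identical characters in s."""
    return [len(list(group)) for _, group in groupby(s)]


lengths = run_lengths("AAAABBBBAAAABAAAA")
print(lengths)        # [4, 4, 4, 1, 4]
print(mean(lengths))  # 3.4, pulled down by the single short run
print(mode(lengths))  # 4, the most frequently observed run length
```

Note that if several lengths are tied for most frequent, `statistics.mode` in Python 3.8 and later returns the first one encountered; `statistics.multimode` returns all of them.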

2

Why do you have to summarize this in a single number? Why not just plot a histogram of the run lengths? Then you can see that the mode is 4, and you will be able to see any other commonly occurring run lengths as well. Certainly, as Mark suggests, the mode will give you 4, and it will always be an integer.
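A rough text-only version of such a histogram, assuming the input is a Python string and counting maximal runs of identical characters, might look like the sketch below. The function name `length_histogram` is hypothetical; a plotting library could be used instead of the printed bars.

```python
from collections import Counter
from itertools import groupby


def length_histogram(s):
    """Count how often each run length occurs in s."""
    return Counter(len(list(group)) for _, group in groupby(s))


hist = length_histogram("AAAABBBBAAAABAAAA")

# Print one bar per observed run length, shortest length first.
for length in sorted(hist):
    print(f"{length}: {'#' * hist[length]}")
# 1: #
# 4: ####
```

The tallest bar is the mode, and any secondary peaks in the run lengths remain visible instead of being folded into one summary number.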