If I have a sequence composed of one of $10$ prefixes, one of $5$ suffixes, and a variable-length middle, how do I compute the entropy of the sequence?
Using the Shannon entropy $$H= -\sum_{i=1}^{m} p_i \ln(p_i)$$ (with $\log_2$ in place of $\ln$ to get bits),
I can compute the entropy contributions from the $10$ starting and $5$ ending sequences using this sum, but I am unsure how to compute the variable-length section. Suppose the length distribution is:
- length $0$: probability $0.5$
- length $1$: probability $0.25$
- length $2$: probability $0.2$
- length $3$: probability $0.05$
and each character is one of, say, $16$ equally likely possibilities (e.g. the hex digits $0$–$F$). How do I calculate the number of bits of entropy? If I simply use the sum above, I underestimate the contribution from the longer sequences, which have more possibilities. Do I also need to drop the $p_0$ term, since $0.5\ln(0.5)$ doesn't correspond to any additional characters in the sequence?
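For concreteness, here is a sketch of one way I could see computing it, under the assumption that the middle's entropy decomposes as the entropy of the length itself plus the expected number of characters times the per-character entropy ($\log_2 16 = 4$ bits each). I'm not sure this decomposition is correct, which is part of what I'm asking:

```python
import math

# Length distribution of the variable-length middle (from the question).
length_probs = {0: 0.5, 1: 0.25, 2: 0.2, 3: 0.05}
num_symbols = 16  # equally likely characters, log2(16) = 4 bits each

# Entropy of the length itself, in bits (this keeps the p_0 term,
# since "length 0" is still an outcome the receiver must learn).
h_length = -sum(p * math.log2(p) for p in length_probs.values())

# Expected number of middle characters.
expected_len = sum(l * p for l, p in length_probs.items())

# Candidate total: length entropy plus expected per-character entropy.
h_middle = h_length + expected_len * math.log2(num_symbols)

print(round(h_length, 4), round(expected_len, 4), round(h_middle, 4))
```

With these numbers this gives roughly $1.68$ bits for the length and $0.8 \times 4 = 3.2$ bits for the expected characters, about $4.88$ bits in total. Is this the right way to account for the longer sequences?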