
If I have a sequence that consists of one of 10 prefixes, one of 5 suffixes, and a variable-length middle, how do I compute the entropy of the sequence?

Using the Shannon entropy $H_S = -\sum_{i=1}^{m} p_i \ln(p_i)$,

I can compute the entropy contributions from the 10 starting and 5 ending sequences using this sum, but I am unsure how to compute the contribution of the variable-length section. Suppose the distribution of its length is:

  • P(length = 0) = 0.5
  • P(length = 1) = 0.25
  • P(length = 2) = 0.2
  • P(length = 3) = 0.05

and each character is drawn from a 16-symbol alphabet (say 0–9 and A–F) with equal probability. How do I calculate the number of bits of entropy? If I simply use the sum above, I am underestimating the contribution from the longer sequences, which have more possibilities. Do I also need to ignore the $p_0$ term, as $0.5\ln(0.5)$ isn't actually adding any additional characters to the sequence?
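For reference, here is a sketch of the calculation I have been trying, assuming the middle's characters are i.i.d. and independent of its length, so that by the chain rule $H(\text{middle}) = H(\text{length}) + E[\text{length}] \cdot H(\text{char})$ (the specific numbers are from the distribution above; the uniform prefix/suffix assumption is mine):

```python
import math

# Length distribution of the variable-length middle (from the question).
length_probs = {0: 0.5, 1: 0.25, 2: 0.2, 3: 0.05}

ALPHABET_SIZE = 16  # 16 equally likely symbols per middle character

# Entropy of the length distribution itself, in bits.
# Note the p(length=0) term still contributes here: learning that the
# middle is empty is itself information.
h_length = -sum(p * math.log2(p) for p in length_probs.values())

# Expected number of middle characters.
expected_len = sum(n * p for n, p in length_probs.items())

# Each character contributes log2(16) = 4 bits; weight by expected length.
h_chars = expected_len * math.log2(ALPHABET_SIZE)

# Chain rule: H(middle) = H(length) + E[length] * H(char).
h_middle = h_length + h_chars

# Prefix and suffix, assuming each is uniform over its 10 / 5 choices.
h_total = math.log2(10) + math.log2(5) + h_middle
print(f"H(middle) = {h_middle:.3f} bits, H(total) = {h_total:.3f} bits")
```

Is this decomposition the right way to account for the longer sequences having more possibilities?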

1 Answer