0
$\begingroup$

I'm reading Shannon's article A Mathematical Theory of Communication, and I'm stuck at the telegraphy case example, on page 4.

Shannon writes a formula involving $N(t)$, the number of sequences of duration $t$. He says that

$ N(t)=N(t-2)+N(t-4)+N(t-5)+N(t-7)+N(t-8)+N(t-10). $

My question is: where does this formula arise from? I can't figure out the recursive relation.

(There are 4 possible signals for the channel: the dot, which uses 2 units of time; the dash, which uses 4 units of time; the letter space, 3 units of time; the word space, 6 units of time. No spaces follow each other)

3 Answers 3

2

Shannon's text reads:

(...) in telegraphy suppose the symbols are: (1) A dot, consisting of line closure for a unit of time and then line open for a unit of time; (2) A dash, consisting of three time units of closure and one unit open; (3) A letter space consisting of, say, three units of line open; (4) A word space of six units of line open. We might place the restriction on allowable sequences that no spaces follow each other (for if two letter spaces are adjacent, it is identical with a word space).

Writing d, t, q and s for a dot (two units), a letter space (three units), a dash (four units) and a word space (six units) respectively, this means that a sequence uses the alphabet d, t, q, s, but that the subwords tt, ts and st are forbidden.

Hence, any sequence of length $t$ ends with exactly one of the suffixes d, q, dt, qt, ds or qs, whose lengths are $2$, $4$, $5$, $7$, $8$ and $10$, respectively. The subsequence before this suffix may be any admissible sequence of length $t-2$, $t-4$, $t-5$, $t-7$, $t-8$ or $t-10$ respectively.

This yields the recurrence you quote.

1

Any string must end with one of the four characters, but if the final character is a space the previous one was not. (This means the recursion may not apply if $t = 3$ or $6$ as you may have a single space with nothing before it.)

The recursion is built up of :

$N(t−2)$: previous texts of length $t-2$ plus a dot

$N(t−4)$: previous texts of length $t-4$ plus a dash

$N(t−5)$: previous texts of length $t-4$ plus a dot then a letter space

$N(t−7)$: previous texts of length $t-7$ plus a dash then a letter space

$N(t−8)$: previous texts of length $t-8$ plus a dot then a word space

$N(t−10)$: previous texts of length $t-10$ plus a dash then a word space

1

As explained slightly earlier in that paper, we should consider not just the symbols but all "allowable sequences". For example, when counting we need to be aware of the fact that a letter space is never followed by a word space.

Instead of thinking of it as four symbols with rules, we think of it as six new symbols without rules. The six new symbols or "allowable sequences" are

  • a dot, corresponding to N(t-2)
  • a dash, corresponding to N(t-4)
  • a dot preceded by a letter space, N(t-5)
  • a dash preceded by a letter space, N(t-7)
  • a dot preceded by a word space, N(t-8)
  • a dash preceded by a word space, N(t-10)

Each of these six new symbols can be followed by any of the above six new symbols. So the recurrence relation is given by asking, "What was the last new symbol received?"