Consider the following scenario from another post of mine, where we are building a matrix that expresses the probabilities of first-order transitions from one character to another in an English text.
We take a book and count the number of times the letter 'e' occurs in it -- say, 15,000. Then we count the number of times the next letter is 'f' -- say, 200. With this in hand, we set
$M(\text{'e'}, \text{'f'}) = 200/15000 \approx 1.33\%$.
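To make the counting concrete, here is a minimal Python sketch of how such a matrix might be estimated. The function name `transition_matrix` and the sample string are hypothetical, chosen only for illustration. One assumption differs slightly from the description above: the denominator counts only occurrences of a character that are followed by another character, so that each row sums to 1.

```python
from collections import Counter


def transition_matrix(text):
    """Estimate first-order character transition probabilities.

    Returns a dict mapping (a, b) to the estimated probability that
    character b immediately follows character a. Pairs that never
    occur in the text are simply absent from the dict.
    """
    text = text.lower()
    # Occurrences of each character that have a successor
    # (assumption: the last character of the text is excluded).
    first_counts = Counter(text[:-1])
    # Occurrences of each adjacent pair (a, b).
    pair_counts = Counter(zip(text, text[1:]))
    return {
        (a, b): n / first_counts[a]
        for (a, b), n in pair_counts.items()
    }


# Hypothetical toy input standing in for the book.
M = transition_matrix("the fee for the reef")
print(M[("e", "f")])
```

With a real book, `M[("e", "f")]` would play the role of the 200/15000 ratio above.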
Say instead we want to normalize this conditional probability so that it lies strictly between 0 and 1, excluding the exact values 0 and 1 (only getting infinitesimally close to each extreme). Is there an accepted way to use a sigmoid function for this sort of normalization of a probability?
I don't know whether this is common practice, but I think it would be useful in an AI application I am working on.