
Say I have three files $(A, B, C)$ that are filled with words.

For a specific word $W_i$, I want to determine the probability of the word given a file. That is,

Calculate $$P(W_i|\text{File}).$$ For example, say we wanted to determine (for file $A$) $$P(W_i|A).$$

I've seen this calculated in two ways. Which is the correct calculation?

  1. $$P(W_i|A)=\frac{\text{Number Of Times }W_i\text{ Occurs In File }A}{\text{Number Of Words In File }A} $$ If you didn't get that: count up the number of times $W_i$ occurs in file $A$, then divide it by the total number of words in $A$.

  2. $$P(W_i|A)=\frac{\text{Number Of Times }W_i\text{ Occurs In File }A}{\text{Sum of the Number Of Times }W_i\text{ Occurs In File }A, B, C} $$

Count up the number of times $W$ exists in file $A$, then divide it by the number of times $A$ occurs (number of times $W$ exists in $A$ plus number of times $W$ exists in $B$, etc.).

  • The RHS of 2. (which is not what you explain on the line below) is $P(A|W)$, not $P(W|A)$. (2012-05-25)
  • The first, since for $P(W_i|A)$ we are restricting the domain to $A$. (2012-05-25)
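
The difference between the two formulas can be checked on a toy example. The following is only a sketch: the file contents and the function names `p_word_given_file` and `p_file_given_word` are made up for illustration, and reading formula 2 as $P(A|W)$ assumes a word occurrence is drawn uniformly from the three files pooled together.

```python
from collections import Counter

# Hypothetical toy contents standing in for the three files A, B, C.
files = {
    "A": "the cat sat on the mat".split(),
    "B": "the dog barked".split(),
    "C": "a cat and a dog".split(),
}

# Per-file word counts.
counts = {name: Counter(words) for name, words in files.items()}


def p_word_given_file(word, name):
    """Formula 1: occurrences of `word` in the file / total words in the file."""
    return counts[name][word] / len(files[name])


def p_file_given_word(word, name):
    """Formula 2: occurrences of `word` in the file / occurrences in all files."""
    total = sum(c[word] for c in counts.values())
    return counts[name][word] / total if total else 0.0


print(p_word_given_file("the", "A"))  # 2 of the 6 words in A
print(p_file_given_word("the", "A"))  # 2 of the 3 occurrences of "the" overall
```

Formula 1 sums to 1 over the distinct words of a fixed file, which is what a distribution over words given the file should do. Formula 2 sums to 1 over the files for a fixed word, so it behaves like a distribution over files given the word.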
