
Say I have three files $(A, B, C)$ that are filled with words.

For a specific word $W_i$, I want to determine the probability of the word given a file. That is, I want to calculate $P(W_i|\text{File})$.

For example, for file $A$ we would want to determine $P(W_i|A)$.

I've seen two ways of doing this. Which is the correct calculation?

  1. $P(W_i|A)=\frac{\text{Number Of Times }W_i\text{ Occurs In File }A}{\text{Number Of Words In File }A}$ In words: count the number of times $W_i$ occurs in file $A$, then divide by the total number of words in $A$.

  2. $P(W_i|A)=\frac{\text{Number Of Times }W_i\text{ Occurs In File }A}{\text{Sum of the Number Of Times }W_i\text{ Occurs In File }A, B, C} $

In words: count the number of times $W_i$ occurs in file $A$, then divide by the total number of times $W_i$ occurs across all files (the number of times $W_i$ occurs in $A$, plus the number of times it occurs in $B$, plus the number of times it occurs in $C$).
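To make the two candidate ratios concrete, here is a minimal Python sketch. The file contents and the function names `ratio_1` and `ratio_2` are made up for illustration; only the two formulas come from the question. Note that the second ratio is the share of all occurrences of $W_i$ that fall in $A$, which (if the three files are pooled and a word token is picked at random) corresponds to $P(A|W_i)$ rather than $P(W_i|A)$.

```python
from collections import Counter

# Hypothetical toy contents standing in for files A, B and C.
files = {
    "A": "the cat sat on the mat".split(),
    "B": "the dog barked".split(),
    "C": "a cat and a dog".split(),
}

def ratio_1(word, name):
    """Occurrences of `word` in the file divided by the number of words in that file."""
    words = files[name]
    return Counter(words)[word] / len(words)

def ratio_2(word, name):
    """Occurrences of `word` in the file divided by its occurrences across all files."""
    total = sum(Counter(words)[word] for words in files.values())
    if total == 0:
        return 0.0  # the word appears in no file, so the ratio is undefined; 0.0 is a placeholder
    return Counter(files[name])[word] / total

print(ratio_1("the", "A"))  # 2 of the 6 words in A are "the"
print(ratio_2("the", "A"))  # 2 of the 3 occurrences of "the" are in A
```

One quick sanity check: summing `ratio_1` over every distinct word in $A$ gives 1, as a distribution over words given $A$ should, whereas summing `ratio_2` over the files $A$, $B$, $C$ for a fixed word gives 1.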

  • The first, since for $P(W_i|A)$ we are restricting the domain to $A$. (2012-05-25)
