I implemented the Naive Bayes algorithm to predict an emotion (happy or sad) for blog posts, using the formula from Manning's Introduction to Information Retrieval:
http://nlp.stanford.edu/IR-book/pdf/13bayes.pdf
Essentially, for a given document it comes down to comparing
P(words = a, b, c | label = bad) · P(label = bad) vs. P(words = a, b, c | label = good) · P(label = good)
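In case it helps to make the comparison concrete, here is a minimal sketch of the textbook rule as I implemented it. The toy word_counts, label_priors, and the add-one smoothing are my own choices for illustration, not taken from the book's example:

```python
import math
from collections import Counter

# Toy counts -- in practice these come from the training blog posts.
word_counts = {
    "good": Counter({"a": 30, "b": 10, "c": 5}),
    "bad":  Counter({"a": 5,  "b": 20, "c": 25}),
}
label_priors = {"good": 0.5, "bad": 0.5}
vocab = {w for counts in word_counts.values() for w in counts}

def log_score(words, label):
    """log P(label) + sum_i log P(word_i | label), with add-one smoothing."""
    total = sum(word_counts[label].values())
    score = math.log(label_priors[label])
    for w in words:
        score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return score

doc = ["a", "b", "c"]
prediction = max(label_priors, key=lambda lab: log_score(doc, lab))
print(prediction)
```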
Now I wonder: since we have all the word counts for each emotion, can we reframe the problem like this:
P(label = bad | word = a) · P(label = bad | word = b) · P(label = bad | word = c) vs. P(label = good | word = a) · P(label = good | word = b) · P(label = good | word = c)
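Here is what I mean by the reframed computation, reusing the toy word_counts from the sketch above. Estimating P(label | word) directly from the per-word counts is my own assumption, and whether this is a sound classifier is exactly what I am asking:

```python
def posterior_product(words, label):
    """Product over words of P(label | word), estimated directly from counts."""
    score = 1.0
    for w in words:
        # Add-one to avoid zero counts; P(label | word) ~ count(word, label) / count(word).
        counts_per_label = {lab: word_counts[lab][w] + 1 for lab in word_counts}
        score *= counts_per_label[label] / sum(counts_per_label.values())
    return score

doc = ["a", "b", "c"]
alt_prediction = max(word_counts, key=lambda lab: posterior_product(doc, lab))
print(alt_prediction)
```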
- Does this computation make sense? More generally, how do you evaluate the correctness of such a computation? If it does make sense, what are the implications of this model vs. the formula in the textbook? For example, do the two make different independence claims? With Naive Bayes, the assumption is that the words are "conditionally" independent given the label.
- It seems to me that the textbook model compares two "factories" (happy, sad) by asking how likely each factory is to produce that string of words, whereas my reframing is more like rolling several dice and using each die's individual signal of good or bad.
FYI, I've only taken one basic probability course.