
I implemented the naive Bayes algorithm to predict an emotion (happy or sad) for blog posts, using the formula from Manning's Introduction to Information Retrieval book:

http://nlp.stanford.edu/IR-book/pdf/13bayes.pdf

Essentially, for a given document, it comes down to comparing

$P(\text{words} = a,b,c \mid \text{label} = \text{bad})\,P(\text{label} = \text{bad})$ vs. $P(\text{words} = a,b,c \mid \text{label} = \text{good})\,P(\text{label} = \text{good})$.

Now I wonder: since we have all the counts of the words for each emotion, can we reframe the problem like this:

$P(\text{label} = \text{bad} \mid \text{word} = a)\,P(\text{label} = \text{bad} \mid \text{word} = b)\,P(\text{label} = \text{bad} \mid \text{word} = c)$ vs. $P(\text{label} = \text{good} \mid \text{word} = a)\,P(\text{label} = \text{good} \mid \text{word} = b)\,P(\text{label} = \text{good} \mid \text{word} = c)$.
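
To make the comparison concrete, here is a minimal Python sketch of both scoring rules, computed in log space from per-label word counts. The counts, prior, and document below are made up purely for illustration; the textbook rule also uses the add-one smoothing described in the book.

```python
# Minimal sketch, not production code: the two decision rules from hypothetical counts.
from math import log

# Hypothetical counts: how often each word occurs in happy vs. sad training blogs.
counts = {
    "happy": {"a": 30, "b": 10, "c": 5},
    "sad":   {"a": 5,  "b": 20, "c": 25},
}
prior = {"happy": 0.5, "sad": 0.5}  # p(label), e.g. estimated from document counts
vocab = {"a", "b", "c"}
doc = ["a", "b", "c"]               # the document to classify

def textbook_score(label):
    """log p(words | label) + log p(label), with add-one smoothing."""
    total = sum(counts[label].values())
    score = log(prior[label])
    for w in doc:
        score += log((counts[label].get(w, 0) + 1) / (total + len(vocab)))
    return score

def reframed_score(label):
    """Sum of log p(label | word), one factor per word in the document."""
    score = 0.0
    for w in doc:
        word_total = sum(counts[lab].get(w, 0) for lab in counts)
        score += log(counts[label].get(w, 0) / word_total)
    return score

for name, rule in [("textbook", textbook_score), ("reframed", reframed_score)]:
    scores = {lab: rule(lab) for lab in counts}
    print(name, scores, "->", max(scores, key=scores.get))
```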

  • Does this computation make sense? More generally, how do you evaluate the correctness of a computation like this? If it does make sense, what are the implications of this model versus the formula in the textbook? For example, do the two make different independence claims? With naive Bayes, the assumption is that the words are conditionally independent given the label.
  • It seems to me that in the textbook model you compare two "factories" (happy, sad) and ask which one is more likely to have produced the observed string of words, as opposed to rolling several dice and using each die's individual signal of good or bad.

FYI, I have only taken one basic probability course.


1 Answer


One wants to compare the ratios $Q(\text{bad})/Q(\text{good})$ and $Q'(\text{bad})/Q'(\text{good})$, where $Q(\text{bad})$ and $Q(\text{good})$ are the quantities of interest in the first case and $Q'(\text{bad})$ and $Q'(\text{good})$ are their analogues in the second case.

The first thing to realize is that a hypothesis is missing, which is probably the conditional independence of the observed words given the label. That is, for example,
$$P(\text{words are}\ a,b,c\mid\text{label is}\ \ell)=P(\text{word is}\ a\mid\text{label is}\ \ell)\,P(\text{word is}\ b\mid\text{label is}\ \ell)\,P(\text{word is}\ c\mid\text{label is}\ \ell),$$
for $\ell=\text{good}$ and for $\ell=\text{bad}$. Under this hypothesis,
$$Q(\ell)=P(a\mid\ell)P(b\mid\ell)P(c\mid\ell)P(\ell)=P(a,\ell)P(b,\ell)P(c,\ell)P(\ell)^{-2},$$
while
$$Q'(\ell)=P(\ell\mid a)P(\ell\mid b)P(\ell\mid c)=P(a,\ell)P(b,\ell)P(c,\ell)P(a)^{-1}P(b)^{-1}P(c)^{-1},$$
for $\ell=\text{good}$ and for $\ell=\text{bad}$. Hence
$$\frac{Q(\text{bad})}{Q(\text{good})}=\frac{Q'(\text{bad})}{Q'(\text{good})}\cdot\frac{P(\text{good})^2}{P(\text{bad})^2}.$$
To sum up, the ratios $Q(\text{bad})/Q(\text{good})$ and $Q'(\text{bad})/Q'(\text{good})$ always lead to the same prediction if and only if the a priori distribution on $\{\text{bad},\text{good}\}$ is uniform, that is, when
$$P(\text{bad})=P(\text{good})=\tfrac12.$$
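
A quick numeric sanity check of this identity, sketched in Python with a made-up table of joint probabilities $P(\text{word},\text{label})$ for a single word position:

```python
# Sanity check of Q(bad)/Q(good) = Q'(bad)/Q'(good) * P(good)^2 / P(bad)^2
# on a made-up joint distribution P(word, label).
joint = {
    ("a", "good"): 0.20, ("a", "bad"): 0.05,
    ("b", "good"): 0.10, ("b", "bad"): 0.15,
    ("c", "good"): 0.10, ("c", "bad"): 0.40,
}
labels, words = ("good", "bad"), ("a", "b", "c")
P_label = {l: sum(joint[(w, l)] for w in words) for l in labels}  # P(label)
P_word = {w: sum(joint[(w, l)] for l in labels) for w in words}   # P(word)

def Q(l):
    # P(a|l) P(b|l) P(c|l) P(l)
    out = P_label[l]
    for w in words:
        out *= joint[(w, l)] / P_label[l]
    return out

def Q_prime(l):
    # P(l|a) P(l|b) P(l|c)
    out = 1.0
    for w in words:
        out *= joint[(w, l)] / P_word[w]
    return out

lhs = Q("bad") / Q("good")
rhs = (Q_prime("bad") / Q_prime("good")) * (P_label["good"] / P_label["bad"]) ** 2
print(lhs, rhs)  # both print the same value, approximately 2/3 here
```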