
Let's say I want to determine whether a person is bald using two features: age and hair color. I will also assume that age and hair color are independent given the class (bald or not); in other words, I will use a Naive Bayes classifier.

Transforming my data into a probability table:

[image: probability table]

If I wanted to calculate whether a person aged 20 with brown hair is bald, it would be easy:

p(bald=yes|20,brown)=1/4*1/4*4/9≈0.03

p(bald=no|20,brown)=2/5*4/5*5/9≈0.18

Since the second value is higher, the person is more likely not bald. But what should I do if I want to calculate the probability of being bald at age 20 with blonde hair?

p(bald=yes|20,blonde)=1/4*2/4*4/9≈0.06

p(bald=no|20,blonde)=2/5*0/5*5/9=0
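
The calculations above can be reproduced in a few lines of Python. This is only a sketch: the dictionary names and the `posterior` helper are made up for illustration, and only the numbers that appear in the calculations are included, not the full table. It also divides each score by their sum, so the two results add up to 1 and can be read as probabilities.

```python
from fractions import Fraction as F

# Values copied from the calculations above (not the full probability table).
prior = {"yes": F(4, 9), "no": F(5, 9)}      # P(bald)
p_age20 = {"yes": F(1, 4), "no": F(2, 5)}    # P(age = 20 | bald)
p_hair = {                                   # P(hair | bald)
    "brown": {"yes": F(1, 4), "no": F(4, 5)},
    "blonde": {"yes": F(2, 4), "no": F(0, 5)},
}

def posterior(hair):
    # Unnormalized naive Bayes score: P(age | c) * P(hair | c) * P(c)
    score = {c: p_age20[c] * p_hair[hair][c] * prior[c] for c in prior}
    total = sum(score.values())
    # Dividing by the total turns the scores into posterior probabilities.
    return {c: s / total for c, s in score.items()}

print({c: float(p) for c, p in posterior("brown").items()})
print({c: float(p) for c, p in posterior("blonde").items()})
```

For blonde hair the single zero factor forces the bald=no result to exactly 0, regardless of what the age feature says.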

I don't have any data for a man who is not bald and has blonde hair, and I don't think it would be right to disregard everything else just because of this. So how should I deal with this situation in general, where we would have many more features and much more data?

2 Answers

1

Try adding 1 to the count in your "blonde" column. This ensures that you get a non-zero number when you compute the probability. For example, 4/5 becomes 4/(5+1), 1/5 becomes 1/(5+1), and 0/5 becomes 1/(5+1). Basically, we have introduced a phantom blonde data point just so that the probability is small but non-zero.

Here is a write-up describing the methodology.
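
A minimal sketch of the phantom data point in Python, assuming the hair-color counts among the 5 non-bald people are 4, 1 and 0, as the fractions above suggest. The name `other` is just a placeholder for the third hair color, which is not named here.

```python
from fractions import Fraction

# Hair-color counts among the 5 non-bald people, read off the fractions
# 4/5, 1/5 and 0/5. "other" is a placeholder name for the third color.
counts = {"brown": 4, "other": 1, "blonde": 0}

# Add one phantom observation to the category that was never seen.
counts["blonde"] += 1
total = sum(counts.values())  # 5 real observations + 1 phantom = 6

smoothed = {color: Fraction(n, total) for color, n in counts.items()}
print(smoothed)
```

With this estimate, the bald=no score for a 20-year-old with blonde hair becomes 2/5*1/6*5/9≈0.04 instead of 0.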

1

The common idea used here is Laplace smoothing, a.k.a. additive smoothing. You can find a brief explanation here.
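
As a rough sketch, additive smoothing replaces each estimate n/N with (n + alpha)/(N + alpha*k), where n is the count for one value, N is the total count for the class, k is the number of possible values of the feature, and alpha = 1 gives Laplace smoothing. The function name `additive_smoothing` and the placeholder color `other` are assumptions for illustration; the counts 4, 1 and 0 are the hair-color counts among the 5 non-bald people in the question.

```python
from fractions import Fraction

def additive_smoothing(counts, alpha=1):
    """Return smoothed estimates (n + alpha) / (N + alpha * k).

    alpha must be an int or Fraction in this sketch.
    """
    k = len(counts)               # number of possible feature values
    total = sum(counts.values())  # N: observations in this class
    return {v: Fraction(n + alpha, total + alpha * k) for v, n in counts.items()}

# "other" is a placeholder name for the third hair color.
counts_not_bald = {"brown": 4, "other": 1, "blonde": 0}
print(additive_smoothing(counts_not_bald))
```

Unlike adding a single phantom point to the empty category, this adds alpha to every category, so the estimates still sum to 1 and no value ever gets probability 0.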