$\begingroup$

I have been reading an implementation of the KNN algorithm that estimates the probability that the price of an item A with certain attributes lies between X and Y dollars.

To estimate this distribution, we use a training set in which each item has some attributes (age, ranking, etc.) and a price. We take the attributes (that is, everything except the price) and compute the distance (the good old Euclidean metric) between each item in the training set and the item A we are interested in. Each distance is then passed through a Gaussian function to get a weight for the corresponding item, so that items nearer to item A count for more than items far away.
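The weighting step can be sketched roughly as follows. This is only an illustration, not the implementation I was reading: the function names, the bandwidth `sigma`, and the sample data are all assumptions made up for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gaussian_weight(dist, sigma=1.0):
    """Weight that is 1 at distance 0 and decays as the distance grows.

    sigma is an assumed bandwidth parameter, not taken from the original code.
    """
    return math.exp(-dist ** 2 / (2 * sigma ** 2))

# Hypothetical training set: (attributes, price), attributes = (age, ranking).
training = [((2.0, 90.0), 30.0), ((3.0, 85.0), 25.0), ((10.0, 60.0), 12.0)]
item_a = (2.5, 88.0)  # the item whose price distribution we want

# One weight per training item; the price plays no role at this stage.
weights = [gaussian_weight(euclidean(attrs, item_a), sigma=5.0)
           for attrs, _price in training]
```

Note that the weights depend only on the attributes, never on the price.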

Finally, we calculate the probability in the following way:

$P(X \leq \text{Price} \leq Y) = \displaystyle \frac{\sum \text{weights of items with price between X and Y}}{\sum \text{all weights}}$

where each sum runs over the $k$ items nearest to item A.
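For concreteness, here is a minimal sketch of how I understand the whole estimate. The function name `prob_price_between`, the parameters `k` and `sigma`, and the data are hypothetical, chosen only to mirror the formula above.

```python
import math

def prob_price_between(training, query, low, high, k=3, sigma=1.0):
    """Estimate P(low <= price <= high) for `query` from its k nearest items.

    training: list of (attributes, price) pairs; attributes are numeric tuples.
    """
    # Distance from the query to every training item, keeping the k nearest.
    nearest = sorted(
        (math.dist(attrs, query), price) for attrs, price in training
    )[:k]
    # Gaussian weight per neighbour, computed from the distance only.
    weighted = [(math.exp(-d ** 2 / (2 * sigma ** 2)), price)
                for d, price in nearest]
    total = sum(w for w, _ in weighted)
    if total == 0:
        return 0.0  # all weights underflowed; no usable evidence
    # The price only decides which weights enter the numerator.
    inside = sum(w for w, price in weighted if low <= price <= high)
    return inside / total

# Made-up data: four items with two attributes each and a price.
training = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0),
            ((0.0, 1.0), 30.0), ((5.0, 5.0), 100.0)]
p = prob_price_between(training, (0.0, 0.0), 10.0, 20.0, k=3, sigma=1.0)
```

In this sketch the price acts only as a filter on which neighbours are counted in the numerator, while the weights supply the amounts being summed.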

Hopefully, that makes sense.

Question: The probability is stated in terms of the price, yet it is computed from the weights. Why?

  • Might go better in stats.stackexchange.com? (2011-08-19)

1 Answer