There are many distance functions between probability distributions, such as the Kullback–Leibler divergence, the Bhattacharyya distance, the Euclidean distance, the Wasserstein distance, and so on. Take a simple one: $D=\sum\limits_n\left|P_n\left(\text{model}\right)-P_n\left(\text{sample}\right)\right|$. Specifically, suppose we have a model distribution (probability mass function) $P\left(\text{model}\right)=[0.2,0.8]$ with $n=2$ components, and we want to compute the distance between each sample distribution and this model distribution. If the sample distribution is $P\left(\text{sample}_1\right)=[0.3,0.7]$, then $D=\left|0.3-0.2\right|+\left|0.7-0.8\right|=0.2$. But I do not think this is a good distance measure, since to me $0.8$ and $0.7$ are more similar than $0.2$ and $0.3$. What I mean is that in the model distribution one component is $0.8$ and the other is $0.2$, and the same absolute difference $a$ matters more for the $0.2$ component, since $0.8$ is much larger than $0.2$. So the components should carry different weights. How can I incorporate weights into the distance equation? I tried an equation like $D=\sum\limits_n\frac{\left|P_n\left(\text{model}\right)-P_n\left(\text{sample}\right)\right|}{P_n\left(\text{model}\right)}$, but it seems to be wrong.
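For concreteness, here is a short sketch of the two quantities from the question: the plain sum of absolute differences, and the model-weighted variant I attempted (the variable names are just illustrative).

```python
# Model and sample distributions from the example above.
p_model = [0.2, 0.8]
p_sample = [0.3, 0.7]

# Plain distance: sum of absolute differences.
d_l1 = sum(abs(m - s) for m, s in zip(p_model, p_sample))
print(d_l1)  # ≈ 0.2, up to floating-point rounding

# The weighted variant attempted in the question: each term is
# divided by the corresponding model probability.
d_weighted = sum(abs(m - s) / m for m, s in zip(p_model, p_sample))
print(d_weighted)  # 0.1/0.2 + 0.1/0.8 ≈ 0.625
```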
Which is the best distance function?
probability-distributions
discrete-mathematics
-
I, personally, do not know there are distance functions like Kullback Leibler distance, Bhattacharyya distance, etc., etc. – 2011-08-02
-
Somewhat related... but more of an interesting (imo) aside, http://en.wikipedia.org/wiki/Logarithm#Psychology "Psychological studies found that mathematically unsophisticated individuals tend to estimate quantities logarithmically" – 2011-08-02
-
@GerryMyerson Hi, thank you for your comment. I have added two links about these two terms. – 2011-08-02
-
What the best distance is will largely depend on the application you need it for. – 2011-08-02
-
@Gerry Are you saying you do not know about these names, or are you pointing out that they are not really distances? :-) (BTW neither KL-distance nor the Bhattacharyya distance are true distances.) – 2011-08-02
-
I am pointing out that we don't all know these things, since I am a part of "we" and I never heard of any of them (except Euclidean distance). – 2011-08-02
-
@TylerBailey Thank you, I will check it out and give some feedback. – 2011-08-02
-
@Raskolnikov Thank you for your suggestion. I agree with you, but on some occasions we cannot know exactly what the distribution is (I am only familiar with the normal distribution). What I mean is that in the real world what we have to handle is discrete, especially on computers. For example, people wear different clothes, so we cannot say that they follow a normal distribution or any other distribution. Maybe I am wrong, so please correct me. – 2011-08-02
-
It's true that you cannot know the distribution beforehand, but many statistical tests have been developed with that in mind, especially the category of non-parametric tests. Measures like the one used in the Kolmogorov–Smirnov test might be what you are looking for. – 2011-08-02
-
Maybe you should take your question to [Cross Validated](http://stats.stackexchange.com). – 2011-08-02
-
@Raskolnikov Thank you. I just want to find one that suits every distribution, I mean one that will give a good result no matter what the distribution is. – 2011-08-02
-
@TylerBailey: Are [they](http://en.wikipedia.org/wiki/Prime_number_theorem#History_of_the_asymptotic_law_of_distribution_of_prime_numbers_and_its_proof) all 'mathematically unsophisticated individuals'? – 2011-11-30
1 Answer
Try with
$$D=\sum\limits_n \frac{\left(P_n \left(\text{model}\right)-P_n \left(\text{sample}\right)\right)^2}{P_n \left(\text{model}\right)} \; .$$
This is used in many statistical tests and is known to be approximately $\chi^2$-distributed.
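A minimal sketch of this statistic as a plain function (the function name is mine, not a standard one), evaluated on the distributions from the question:

```python
def chi_square_distance(p_model, p_sample):
    """D = sum_n (P_n(model) - P_n(sample))^2 / P_n(model),
    i.e. squared differences weighted by the model probabilities."""
    return sum((m - s) ** 2 / m for m, s in zip(p_model, p_sample))

# Example from the question: weighting by the model probability makes
# the deviation at the small component 0.2 count more than the same
# deviation at the large component 0.8.
d = chi_square_distance([0.2, 0.8], [0.3, 0.7])
print(d)  # 0.01/0.2 + 0.01/0.8 ≈ 0.0625
```

Note how the division by $P_n(\text{model})$ gives the desired behavior: the term for the $0.2$ component contributes $0.05$, four times the $0.0125$ contributed by the $0.8$ component, even though the absolute differences are equal.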
-
I tried this one and found a [website](http://helpful.knobs-dials.com/index.php/Similarity/distance_measures#Distribution_comparisons) about it, which I will check. I hope the weight problem can be solved. – 2011-08-02