We all know there are many distance functions between probability distributions: Kullback–Leibler divergence, the Bhattacharyya measure, Euclidean distance, Wasserstein distance, and so on. Take a simple one: $D=\sum\limits_n\left|P_n\left(\text{model}\right)-P_n\left(\text{sample}\right)\right|$.

Specifically, suppose we have a model distribution (probability mass function) $P\left(\text{model}\right)=[0.2,0.8]$ with $n=2$ components, and we want to compute the distance between each sample distribution and this model distribution. If the sample distribution is $P\left(\text{sample}_1\right)=[0.3,0.7]$, then $D=\left|0.3-0.2\right|+\left|0.7-0.8\right|=0.2$.

But I do not think this is a good distance measure, because I consider $0.8$ and $0.7$ to be more similar than $0.2$ and $0.3$. What I mean is: in the model distribution one component is $0.8$ and the other is $0.2$, and the same absolute difference $a$ should matter more for the $0.2$ component, since $0.8$ is much larger than $0.2$. In other words, the components should carry different weights. How can I incorporate such weights into the distance formula? I tried an equation of my own, $D=\sum\limits_n\frac{\left|P_n\left(\text{model}\right)-P_n\left(\text{sample}\right)\right|}{P_n\left(\text{model}\right)}$, but it does not seem right.
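To make the question concrete, here is a small sketch (in plain Python, variable names are my own) that reproduces the worked example: the plain $L_1$ distance from the question, and the asker's attempted model-weighted variant, on the same pair of distributions.

```python
model = [0.2, 0.8]
sample1 = [0.3, 0.7]

# Plain L1 distance from the question: sum of absolute differences.
d_l1 = sum(abs(m - s) for m, s in zip(model, sample1))
print(d_l1)  # ≈ 0.2, as computed in the question

# The asker's attempted weighted distance: each absolute difference
# is divided by the corresponding model probability, so deviations
# on small components count more.
d_weighted = sum(abs(m - s) / m for m, s in zip(model, sample1))
print(d_weighted)  # 0.1/0.2 + 0.1/0.8 ≈ 0.625
```

Note that the weighted version is no longer symmetric in model and sample, and it blows up when a model component approaches zero, which is part of why it feels "wrong".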
Which is the best distance function?
probability-distributions
discrete-mathematics
1 Answer
Try
$D=\sum\limits_n \frac{\left(P_n \left(\text{model}\right)-P_n \left(\text{sample}\right)\right)^2}{P_n \left(\text{model}\right)} \; .$
This is the Pearson chi-squared statistic, used in many statistical tests; under the model it is known to be approximately $\chi^2$-distributed.
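A minimal sketch of this statistic on the question's own numbers (variable names are my own):

```python
model = [0.2, 0.8]
sample1 = [0.3, 0.7]

# Chi-squared-style distance: squared differences weighted by the
# inverse of the model probabilities, so the same absolute deviation
# costs more on the small 0.2 component than on the large 0.8 one.
d_chi2 = sum((m - s) ** 2 / m for m, s in zip(model, sample1))
print(d_chi2)  # 0.01/0.2 + 0.01/0.8 ≈ 0.0625
```

The deviation on the $0.2$ component contributes $0.05$ while the equal-sized deviation on the $0.8$ component contributes only $0.0125$, which is exactly the weighting the question asks for.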
I tried this one. I found a [website](http://helpful.knobs-dials.com/index.php/Similarity/distance_measures#Distribution_comparisons) that I will check. I hope the weight problem can be solved. – 2011-08-02