3
$\begingroup$

So I am trying to see how the Kendall $\tau$ distance is considered a metric; i.e. that it satisfies the triangle inequality.

The Kendall $\tau$ distance is defined as follows:

$K(\tau_1,\tau_2) = |(i,j): i < j, ( \tau_1(i) < \tau_1(j) \land \tau_2(i) > \tau_2(j) ) \lor ( \tau_1(i) > \tau_1(j) \land \tau_2(i) < \tau_2(j) )|$

Thank you in advance.

3 Answers 3

3

Consider three lists $\tau_1,\tau_2$ and $\tau_3$. Then we want to show that $K(\tau_1, \tau_3) \leq K(\tau_1, \tau_2) + K(\tau_2, \tau_3)$. The number of disagreements between $\tau_1$ and $\tau_3$ could be independent of the number of disagreements between $\tau_1$ and $\tau_2$ and $\tau_2$ and $\tau_3$. For example, consider $\tau_1$: height, $\tau_2$: weight and $\tau_3$: hair count. The number of disagreements between $\tau_1$ and $\tau_3$ would be no greater than the number of disagreements between $\tau_1$ and $\tau_2$ and $\tau_2$ and $\tau_3$.

  • 0
    @Sasha: Yes, corrected.2011-12-12
1

Kendall tau rank distance is a metric only if you compare ranking of the elements.

If you perform Kendall function comparing elements you will find cases where the triangular inequality does not work.

Example:

0    0    0   10   10   10 

and

5 5 5 0 0 0

scores 9 (using Kendall comparing elements)

While

0    0    0   10   10   10 

and

5 3 5 7 5 2

scores 3

and

5 3 5 7 5 2

and

5 5 5 0 0 0

scores 4

so 9 > 3 + 4. So triangular inequality is not working here.

But if you operate with the sorting position of each element in it vector (aka ranking) triangular inequality is gauaranteed. This happens beacuase of repetitions of elements within vectors.

We should call "Kendall tau ranking distance" to one of the algorithms and "Kendall tau distance" to the other

Best

Mijael

1

Given two rankings $r_1,r_2$, define $S(r_1,r_2)$ as the set of pairs $\{i,j\}$ that are ranked the same by $r_1$ and $r_2$. If a pair is ranked the same by $r_1$ and $r_2$ and also by $r_2$ and $r_3$, then it is also ranked the same by $r_1$ and $r_3$. Therefore:

$S(r_1,r_3)\supseteq S(r_1,r_2)\cap S(r_2,r_3).$

Let's denote by $N$ the total number of pairs (it is ${n\choose 2}$, where $n$ is the number of items). Then:

$|S(r_1,r_2)\cap S(r_2,r_3)| \geq |S(r_1,r_2)| + |S(r_2,r_3)| - N.$

By definition, $K(r_1,r_2) = N-|S(r_1,r_2)|$. Hence:

$K(r_1,r_3) = N-|S(r_1,r_3)| \leq N - |S(r_1,r_2)| - |S(r_2,r_3)| + N \leq K(r_1,r_2)+K(r_2,r_3).~~~ \square$