6
$\begingroup$

I have some sets of numbers, and I'd like to have a way to talk about how close these sets are to each other. I'm not sure what properties it should have (e.g. does it need to be a metric?). But I hope there are examples of better defined problems like this out there with some good solutions.

Some considerations. {1} should be closer to {1.1} than it is to {2}. And {.9,1.1} should be closer to {1} than {0,2} is. But is {1,1.1} closer to {1} than {2}? I suspect any distance function will have at least one parameter to control the tradeoff between nearness of points and cardinality of the sets.

  • 0
    @pythonic: I suppose I should have said "uniform *measures*" instead of "uniform *probability distributions*", as they shouldn't sum to unity. If you consider $\mu_1(1) = 1$ and $\mu_2(1) = 1$, $\mu_2(1+\epsilon) = 1$, then the ([modified](http://en.wikipedia.org/wiki/Earth_mover%27s_distance#Extensions)) EMD is $0$ (move $1$ unit of mass from position $1$ to position $1$) plus $\sigma$ (delete $1$ unit of mass from position $1+\epsilon$). This $\sigma$ acts like a tradeoff parameter.2011-09-29

2 Answers 2

6

Let $A$ be a finite subset of $\mathbb{R}$. For $x\in\mathbb{R}$ define $d(x,A)$, the distance from $x$ to $A$, to be $d(x,A) = \min\{\vert x-a\vert:a\in A\},$ the distance from $x$ to the closest point of $A$. If $B$ is another finite subset of $\mathbb{R}$, define $d^*(A,B)$, the asymmetric distance from $A$ to $B$, to be $d^*(A,B) = \max\{d(a,B):a\in A\}.$ Finally, define $d(A,B)$, the distance between $A$ and $B$, to be $d(A,B) = \max\{d^*(A,B),d^*(B,A)\}.$ This is the Hausdorff distance mentioned by deinst.

A few examples: $\begin{align*}d(\{0.9,1.1\},\{1\}) &= \max\{0.1,0.1\} = 0.1\\ d(\{0.9,1.1\},\{2\}) &= \max\{1.1,0.9\} = 1.1\\ d(\{1,1.1\},\{1\}) &= \max\{0.1,0\} = 0.1\\ d(\{1,1.1\},\{2\}) &= \max\{1,0.9\} = 1\\ d(\{0.9,1.1\},\{1.9,2.1\}) &= \max\{1,1\} = 1\\ d(\{0.9,1.1\},\{1.9,2.0,2.1\}) &= \max\{1,1\} = 1\\ d(\{1.9,2.0,2.1\},\{1.9,2.1\}) &= \max\{0.1,0\} = 0.1 \end{align*}$

3

There are multiple distances which you might consider in principle. Given that you're considering finite sets, it makes sense to reduce to metrics popular on real vector spaces.

Let $S$ and $T$ be finite sets of real numbers. In the case that $S$ and $T$ have the same cardinality $n$, let $\mathbf s, \mathbf t \in \mathbb R^n$ be vectors whose coefficients are elements of $S$ and $T$ respectively (in any order). Then, for any positive integer $p$ (and for $p = \infty$), you may consider a $p$-distance defined by $ \min_\Pi\; \| \Pi \mathbf s - \mathbf t \|_p$ where $\Pi$ ranges over all permutation matrices on $\mathbb R^n$, and where $\| \mathbf v \|_p := \sqrt[\Big.^{\scriptstyle p}]{\sum_{j=1}^n |v_j|^p\;}$ (taking the limit as $p \to \infty$ for the norm $\| \mathbf v \|_\infty$). If $S$ and $T$ have different cardinalities, you may vary this (in the case that $S$ is larger) by replacing $\Pi$ with an arbitrary matrix whose columns are distinct standard basis vectors (selecting some subset of $S$ to perform the distance computation with).

Regardless of the dimensions, you could also generalize this to consider a distance of the form $ \min_{\Pi_S, \Pi_T}\; \bigl\| \Pi_{\!S} \;\mathbf s \;-\; \Pi_{\!T} \;\mathbf t \bigr\|_p$ where $\Pi_S$ and $\Pi_T$ range over matrices whose columns are standard basis vectors, possibly with repetition; this allows you to compute vector metrics using arbitrary pairs of elements of $S$ and $T$.

Note that as the cardinalities of the sets grow, these quantities are likely to be prohibitive to compute; but the sets of vectors which I describe using these permutation/selection matrices will describe nice convex sets, for which the usual vector metrics make a certain amount of sense, and which may prove amenable to analysis. And if you consider instead the limit $p \to -\infty$, you should encounter the minimum-pair-distance norm described by Brian in his earlier post.

  • 0
    I've edited the TeX to be simpler, because slight suboptimality on my browser is trumped by bad display in multiple browsers.2011-09-28