15

From Wikipedia

Let $(M, d)$ be a metric space with its Borel sigma algebra $\mathcal{B} (M)$. Let $\mathcal{P} (M)$ denote the collection of all probability measures on the measurable space $(M, \mathcal{B} (M))$.

For a subset $A \subseteq M$, define the $\varepsilon$-neighborhood of $A$ by $$ A^{\varepsilon} := \{ p \in M ~|~ \exists q \in A, \ d(p, q) < \varepsilon \} = \bigcup_{p \in A} B_{\varepsilon} (p), $$ where $B_{\varepsilon} (p)$ is the open ball of radius $\varepsilon$ centered at $p$.

The Lévy–Prokhorov metric $\pi : \mathcal{P} (M)^{2} \to [0, + \infty)$ is defined by setting the distance between two probability measures $\mu$ and $\nu$ to be $$ \pi (\mu, \nu) := \inf \left\{ \varepsilon > 0 ~|~ \mu(A) \leq \nu (A^{\varepsilon}) + \varepsilon \ \text{and} \ \nu (A) \leq \mu (A^{\varepsilon}) + \varepsilon \ \text{for all} \ A \in \mathcal{B}(M) \right\}. $$

  1. What are the purpose, motivation and intuition of the L-P metric?
  2. Is the following alternative a reasonable metric, or some generalized metric, between measures: $$ \sup_{A \in \mathcal{B}(M)} |\mu(A) - \nu(A)|? $$ If yes, is it simpler and easier to understand, and therefore perhaps more useful, than the L-P metric?
  3. A related metric between distribution functions is the Levy metric:

    Let $F, G : \mathbb{R} \to [0, 1]$ be two cumulative distribution functions. Define the Lévy distance between them to be $$ L(F, G) := \inf \{ \varepsilon > 0 ~|~ F(x - \varepsilon) - \varepsilon \leq G(x) \leq F(x + \varepsilon) + \varepsilon \ \text{for all} \ x \in \mathbb{R} \}. $$

    How should I picture the following intuition?

    Intuitively, if between the graphs of $F$ and $G$ one inscribes squares with sides parallel to the coordinate axes (at points of discontinuity of a graph vertical segments are added), then the side-length of the largest such square is equal to $L(F, G)$.

Thanks and regards!

  • 1
    At least one motivation for the Prokhorov metric is that it metrizes weak convergence of measures. This topology is widely used and fruitful in applications. It is not, however, the only useful metric on $\mathcal{P}(M)$ that metrizes weak convergence. Wasserstein metrics, which arise from optimal transportation of measures, also metrize weak convergence if $M$ is compact, or Polish with $d$ bounded. 2012-04-19
  • 0
    @ThomasE.: Thanks! By weak convergence, do you mean [this link](http://en.wikipedia.org/wiki/Weak_convergence_of_measures#Weak_convergence_of_measures)? 2012-04-19
  • 0
    Yeah, exactly. The first equivalent expression from the list is usually used as the definition in the case of probability measures. For arbitrary measures the test functions need to have compact support as well, i.e. $(\mu_{k})$ converges weakly to $\mu$ if $\int_{S} f \, d\mu_{k}\to \int_{S} f \, d\mu$ for all continuous, compactly supported $f:S\to \mathbb{R}$. 2012-04-19
  • 0
    I have developed my answer somewhat here: http://math.stackexchange.com/a/358095/48890 2013-04-18
  • 0
    One can write $L(F,G)$ slightly differently as $$L(F,G) = \inf\{\epsilon > 0 ~|~ F(x) \leq G(x+\epsilon) + \epsilon \text{ and } G(x) \leq F(x+\epsilon) + \epsilon, \ \forall x\in \mathbb{R}\},$$ which makes it clearer that the L-P metric is a generalization of Lévy's metric. 2015-04-06
  • 1
    Just a side note: the metric you defined in #2 is the total variation distance. 2016-01-18

3 Answers

7

Most of what occurs to me has already been said, but you may find the following picture useful.

If $d_C$ is the Chebyshev metric on $\mathbb{R}^2$, i.e. with points $\mathbf{p} = (x_1,y_1)$ and $\mathbf{q} = (x_2,y_2)$ in $\mathbb{R}^2$,

$d_C(\mathbf{p},\mathbf{q}) := |x_1-x_2| \vee |y_1-y_2|$,

and $h_C$ is the Hausdorff metric on closed subsets of $\mathbb{R}^2$ induced by $d_C$, i.e. with $A$ and $B$ being closed subsets of $\mathbb{R}^2$,

$h_C(A,B):= \sup_{\mathbf{p} \in A} d_C(\mathbf{p},B) \vee \sup_{\mathbf{q} \in B} d_C(\mathbf{q},A)$,

where as usual $d_C(\mathbf{p},B) = \inf_{\mathbf{r} \in B} d_C(\mathbf{p,r})$ etc,

then the Levy metric between two distribution functions $F$ and $G$ is simply the Hausdorff distance $h_C$ between the closures of the completed graphs of $F$ and $G$.
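
As a sanity check of this picture, here is a small worked example of my own (it is not taken from the quoted Wikipedia text). Take the distribution functions of two point masses at $a < b$, i.e. $F = \mathbf{1}_{[a,\infty)}$ and $G = \mathbf{1}_{[b,\infty)}$. The inequality $G(x) \le F(x+\varepsilon)+\varepsilon$ holds for every $x$ and every $\varepsilon > 0$, while the other one can only fail where $F(x-\varepsilon) = 1$ and $G(x) = 0$:

$$ F(x-\varepsilon)-\varepsilon > G(x) \iff a+\varepsilon \le x < b \ \text{ and } \ \varepsilon < 1. $$

Such an $x$ exists exactly when $\varepsilon < \min(b-a, 1)$, hence

$$ L(F,G) = \min(b-a,\, 1). $$

This agrees with both pictures: the completed graphs enclose the rectangle $[a,b]\times[0,1]$, whose largest inscribed axis-parallel square has side $\min(b-a,1)$, and the point $(a,1)$ on the completed graph of $F$ has $d_C$-distance exactly $\min(b-a,1)$ from the completed graph of $G$ (attained at $(b,1)$ or at $(a,0)$).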

  • 0
    +1 Thanks! What does the completed graph of a function $F$ mean? 2012-12-10
  • 0
    Can the Levy-Prokhorov metric be interpreted in terms of, or in a way similar to, the Hausdorff metric? 2012-12-10
  • 0
    Hi zab, I have opened a new post about my comments above: http://math.stackexchange.com/questions/313465/what-does-the-completed-graph-of-a-function-mean 2013-02-25
5
  1. The Levy-Prokhorov metric metrizes the weak convergence of measures. That is quite a nice property, since it lets you conclude from the weak convergence of $\mu_n$ to $\mu$ that $\mu_n$ "approaches $\mu$ with respect to some distance".

  2. I think this one is more closely related to the variation norm of the difference of the measures and hence describes something like norm convergence. Together with the first point, you see that the two approaches serve completely different purposes.

One intuition I have for the Levy-Prokhorov metric is that the distance between two point masses $\delta_x$ and $\delta_y$ equals the distance between their points, provided the points are not too far apart: for $d(x,y)\leq 1$ it holds that $$\pi(\delta_x,\delta_y) = d(x,y),$$ and in general $\pi(\delta_x,\delta_y) = \min(d(x,y), 1)$. On the other hand, your term in 2. always equals 1 for $x \neq y$ (and the total variation norm $\|\delta_x - \delta_y\|$ always equals 2), regardless of how close $x$ and $y$ are. If you have a sequence $(x_n)$ converging to $x$ in $M$ with $x_n \neq x$, then $(\delta_{x_n})$ converges to $\delta_x$ with respect to Levy-Prokhorov but not in norm.
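
To see this contrast numerically, here is a minimal brute-force sketch in Python. It is my own illustration, not a standard routine: the names `levy_prokhorov` and `total_variation` are made up, and it assumes finitely supported probability measures on the real line with $d(x,y) = |x-y|$, passed as dicts mapping point to mass. It uses two facts: the defining condition is monotone in $\varepsilon$ (so bisection finds the infimum), and for finitely supported measures it is enough to test subsets $A$ of the supports.

```python
from itertools import combinations


def total_variation(mu, nu):
    """sup_A |mu(A) - nu(A)| for finitely supported probability measures."""
    points = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(p, 0.0) - nu.get(p, 0.0)) for p in points)


def _one_sided_ok(mu, nu, eps, tol=1e-12):
    """Check mu(A) <= nu(A^eps) + eps for every non-empty A inside supp(mu)."""
    support = list(mu)
    for r in range(1, len(support) + 1):
        for A in combinations(support, r):
            mass_A = sum(mu[p] for p in A)
            # nu(A^eps): nu-atoms strictly closer than eps to some point of A
            mass_nbhd = sum(m for q, m in nu.items()
                            if any(abs(q - p) < eps for p in A))
            if mass_A > mass_nbhd + eps + tol:
                return False
    return True


def levy_prokhorov(mu, nu, iters=60):
    """Approximate pi(mu, nu) by bisection on eps (exponential in support size)."""
    lo, hi = 0.0, 1.0  # eps = 1 always satisfies both inequalities
    for _ in range(iters):
        mid = (lo + hi) / 2
        if _one_sided_ok(mu, nu, mid) and _one_sided_ok(nu, mu, mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Point masses at 1/n versus a point mass at 0.
    for n in (1, 2, 4, 8):
        mu, nu = {1.0 / n: 1.0}, {0.0: 1.0}
        print(n, round(levy_prokhorov(mu, nu), 6), total_variation(mu, nu))
```

In this toy run the first printed column of distances shrinks like $1/n$ while the second stays at 1, which is the behaviour described above. The subset enumeration is exponential, so this is only meant for supports of a handful of points.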

  • 0
    +1. Thanks, Dirk! What is the intuition behind the L-P metric in general? 2012-04-19
  • 0
    @Dirk When you say $\pi(\delta_{x}, \delta_{y}) = d(x,y)$, are you assuming that $d(x,y)\leq 1$? Otherwise, shouldn't $\pi(\delta_{x}, \delta_{y})$ be the minimum of $d(x,y)$ and $1$? 2012-04-23
  • 0
    Oh yes, you are right! 2012-04-23
3

I think you get a complete picture of the Prokhorov metric $\pi$ by combining what Dirk has pointed out and what we already know about the total variation metric. Essentially, $\pi$ is a measure-theoretic analogue of the Hausdorff metric, but loosened up modulo the total variation metric. I will explain what I mean by this.

Suppose we have a probability measure $\mu$. We can imagine two different types (I will call them type I and type II) of ways of slightly changing the measure $\mu$. To keep the exposition simple, let's suppose $\mu$ is just a pile of $N$ point masses, i.e., $$\mu = \frac1N \sum_{i=1}^N \delta_{x_i}$$ where $x_1, x_2, \cdots, x_N$ are $N$ points in space and $\delta_x$ denotes the point mass at $x$, i.e., the Dirac measure. A type I change is when you cut out a tiny chunk of $\mu$ and then move that chunk arbitrarily. To be precise, we will say that the new probability measure $\nu$ is obtained by a type I change from $\mu$ within $\epsilon >0$ if we have $y_1, \cdots, y_N$ (another list of $N$ points) such that $$\nu := \frac1N \sum_{i=1}^N \delta_{y_i}$$ and $$ \#\{1 \le i \le N: x_i \ne y_i \} \le \epsilon N. $$

An essential property of the total variation metric $\delta(\mu, \nu)$ (between probability measures) is that it allows changes of type I. In other words, there is a constant $C$ such that $\delta(\mu, \nu) \le C \epsilon$ whenever $\nu$ is obtained from $\mu$ by a type I change within $\epsilon$.

A type II change is when you move all or some of the particles individually, each within a small distance. To be precise, the definition of a type II change replaces the condition $$ \#\{1 \le i \le N: x_i \ne y_i \} \le \epsilon N $$ with the condition $$ d(x_i, y_i) < \epsilon \ \ \forall\, 1 \le i \le N. $$

The Hausdorff metric allows changes of type II in the following sense: there is a constant $C$ such that whenever $x_1, \cdots, x_N$ and $y_1, \cdots, y_N$ are two lists of $N$ points in space such that the above condition holds, the Hausdorff distance between the two sets $\{x_i : 1 \le i \le N\}$ and $\{y_i : 1 \le i \le N\}$ is $\le C\epsilon$.

The Prokhorov metric $\pi$ allows changes of both type I and type II. In fact, you should be able to prove the following: $$ \#\{1 \le i \le N: d(x_i, y_i) \ge \epsilon_2 \} \le \epsilon_1 N \implies \nu(A) \le \mu(A^{\epsilon_2}) + \epsilon_1 \ \ \forall A. $$ This is just a type I change within $\epsilon_1$ followed by a type II change within $\epsilon_2$. So the Prokhorov metric is simply what you would have come up with if you tried to define a metric $\pi$ with the nice property that $\pi(\mu, \nu) \le \epsilon$ whenever $\nu$ is obtained from a "pile of dirt of unit mass" $\mu$ by moving a $1-\epsilon$ portion of the particles within distance $\epsilon$ in any way, and the remaining $\epsilon$ portion arbitrarily.

Of course we can think of another metric $\pi'$ that satisfies this nice property simply by definition.

$$ \pi'(\mu, \nu) := \inf_{\gamma \in \Gamma(\mu, \nu)} \kappa(\gamma) $$

where $\Gamma(\mu,\nu)$ means the set of all couplings between $\mu, \nu$ and

$$ \kappa(\gamma) := \inf\{ \epsilon > 0 : \gamma\{ (x,y): d(x, y) > \epsilon \} < \epsilon \} $$

The nice property of the Prokhorov metric can then be re-expressed as $\pi \le \pi'$, since $\Gamma(\mu,\nu)$ can be thought of as the collection of all possible ways of moving the pile of dirt $\mu$ into the new pile of dirt $\nu$. Less obvious is the fact that the other inequality $\pi \ge \pi'$ also holds (this is Strassen's theorem, valid at least when $M$ is separable). So in the end, these aren't really two metrics $\pi$ and $\pi'$; they are one and the same metric, $\pi = \pi'$.
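
To make $\kappa$ concrete, here is a minimal Python sketch of my own that evaluates $\kappa(\gamma)$ for one particular coupling, namely the pairing $\gamma = \frac1N \sum_i \delta_{(x_i, y_i)}$ of two equally long lists of points. The function name `ky_fan_cost` is hypothetical, and the code assumes points on the real line with $d(x,y) = |x-y|$. Since it looks at a single coupling, the value it returns is only an upper bound for $\pi'(\mu,\nu)$, and therefore for $\pi(\mu,\nu)$.

```python
def ky_fan_cost(xs, ys, iters=60):
    """kappa(gamma) for the pairing coupling gamma = (1/N) sum_i delta_{(x_i, y_i)}.

    Assumes real-valued points with d(x, y) = |x - y|.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    dists = [abs(x - y) for x, y in zip(xs, ys)]
    n = len(dists)

    def feasible(eps):
        # gamma{(x, y): d(x, y) > eps} < eps
        return sum(d > eps for d in dists) / n < eps

    # Feasibility is monotone in eps and every eps > 1 is feasible,
    # so the infimum lies in [0, 1] and bisection locates it.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Type I change: one particle out of ten is moved far away.
    print(ky_fan_cost([0.0] * 10, [0.0] * 9 + [100.0]))
    # Type II change: every particle is nudged by about 0.01.
    print(ky_fan_cost([0.0, 1.0, 2.0, 3.0], [0.01, 1.01, 2.01, 3.01]))
```

In the first call the cost is governed by the moved fraction (one tenth), in the second by the size of the nudge, matching the two types of change described above.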

The Prokhorov metric reduces to the total variation metric when the discrete metric is assigned to the space $M$. So another way of thinking of $\pi$ is that it is a generalization of the total variation metric that takes the topology of the space into account.
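
A short check of this claim, as I understand it, taking the discrete metric to be $d(x,y) = 1$ for $x \ne y$ and $d(x,x) = 0$: for $0 < \epsilon \le 1$, $d(p,q) < \epsilon$ forces $p = q$, so $A^{\epsilon} = A$ and the two defining inequalities combine into

$$ |\mu(A) - \nu(A)| \le \epsilon \quad \text{for all } A \in \mathcal{B}(M). $$

For $\epsilon > 1$ the inequalities hold trivially, because $A^{\epsilon} = M$ for every non-empty $A$. Since $\sup_A |\mu(A) - \nu(A)| \le 1$, taking the infimum over $\epsilon$ gives

$$ \pi(\mu, \nu) = \sup_{A \in \mathcal{B}(M)} |\mu(A) - \nu(A)|. $$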