15

From Wikipedia

Let $(M, d)$ be a metric space with its Borel sigma algebra $\mathcal{B} (M)$. Let $\mathcal{P} (M)$ denote the collection of all probability measures on the measurable space $(M, \mathcal{B} (M))$.

For a subset $A \subseteq M$, define the $\varepsilon$-neighborhood of $A$ by $ A^{\varepsilon} := \{ p \in M ~|~ \exists q \in A, \ d(p, q) < \varepsilon \} = \bigcup_{p \in A} B_{\varepsilon} (p), $ where $B_{\varepsilon} (p)$ is the open ball of radius $\varepsilon$ centered at $p$.

The Lévy–Prokhorov metric $\pi : \mathcal{P} (M)^{2} \to [0, + \infty)$ is defined by setting the distance between two probability measures $\mu$ and $\nu$ to be $ \pi (\mu, \nu) := \inf \left\{ \varepsilon > 0 ~|~ \mu(A) \leq \nu (A^{\varepsilon}) + \varepsilon \ \text{and} \ \nu (A) \leq \mu (A^{\varepsilon}) + \varepsilon \ \text{for all} \ A \in \mathcal{B}(M) \right\}. $
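To make the definition concrete, here is a small brute-force sketch of my own (not part of the quoted definition; the name `prokhorov_distance` and the restriction to tiny discrete measures on the real line are just illustrative choices). It estimates $\pi(\mu, \nu)$ by bisecting on $\varepsilon$ and checking the defining inequalities over subsets of the two supports:

```python
import itertools
import numpy as np

def prokhorov_distance(xs, ps, ys, qs, tol=1e-6):
    """Estimate the Levy-Prokhorov distance between two discrete probability
    measures mu = sum_i ps[i]*delta_{xs[i]} and nu = sum_j qs[j]*delta_{ys[j]}
    on the real line, by bisection on epsilon.

    It suffices to test sets A inside the supports: replacing A by its
    intersection with supp(mu) can only make mu(A) <= nu(A^eps) + eps harder
    to satisfy.  The subset loop is exponential, so this is only meant for
    very small supports."""
    xs, ps = np.asarray(xs, float), np.asarray(ps, float)
    ys, qs = np.asarray(ys, float), np.asarray(qs, float)

    def holds(eps):
        # check mu(A) <= nu(A^eps) + eps for all A in supp(mu), then swap roles
        for pts_a, wts_a, pts_b, wts_b in [(xs, ps, ys, qs), (ys, qs, xs, ps)]:
            for k in range(1, len(pts_a) + 1):
                for A in itertools.combinations(range(len(pts_a)), k):
                    mass_a = wts_a[list(A)].sum()
                    near = [j for j in range(len(pts_b))
                            if any(abs(pts_b[j] - pts_a[i]) < eps for i in A)]
                    if mass_a > wts_b[near].sum() + eps + 1e-12:
                        return False
        return True

    lo, hi = 0.0, 1.0   # eps = 1 always works for two probability measures
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if holds(mid):
            hi = mid
        else:
            lo = mid
    return hi

# two unit point masses a distance 0.3 apart: the output should be close to 0.3
print(prokhorov_distance([0.0], [1.0], [0.3], [1.0]))
```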

  1. I wonder what the purpose, motivation and intuition of the L-P metric are?
  2. Is the following alternative a reasonable metric (or at least some generalized metric) between measures: $\sup_{A \in \mathcal{B}(M)} |\mu(A) - \nu(A)|$? If so, is it simpler and easier to understand, and therefore perhaps more useful, than the L-P metric?
  3. A related metric between distribution functions is the Levy metric:

    Let $F, G : \mathbb{R} \to [0, 1]$ be two cumulative distribution functions. Define the Lévy distance between them to be $ L(F, G) := \inf \{ \varepsilon > 0 ~|~ F(x - \varepsilon) - \varepsilon \leq G(x) \leq F(x + \varepsilon) + \varepsilon \ \text{for all} \ x \in \mathbb{R} \}. $

    I wonder how to picture the following intuition (a small numerical sketch follows after this list):

    Intuitively, if between the graphs of $F$ and $G$ one inscribes squares with sides parallel to the coordinate axes (at points of discontinuity of a graph vertical segments are added), then the side-length of the largest such square is equal to $L(F, G)$.
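Here is the sketch referred to in 3. (again my own; `levy_distance`, the grid, and the normal CDFs are just illustrative choices, not anything canonical). It estimates $L(F, G)$ by bisecting on $\varepsilon$ and checking the two defining inequalities on a grid, and it is also a convenient starting point for plotting $F$, $G$ and the inscribed squares:

```python
import numpy as np
from scipy.stats import norm

def levy_distance(F, G, grid, tol=1e-6):
    """Estimate the Levy distance L(F, G) by bisection on eps, checking the
    two defining inequalities on a finite grid of x values (a grid check is
    only an approximation and can slightly underestimate the true value)."""
    def holds(eps):
        return (np.all(F(grid - eps) - eps <= G(grid) + 1e-12) and
                np.all(G(grid) <= F(grid + eps) + eps + 1e-12))
    lo, hi = 0.0, 1.0   # eps = 1 always works, since CDFs take values in [0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if holds(mid):
            hi = mid
        else:
            lo = mid
    return hi

grid = np.linspace(-10, 10, 4001)
# two normal CDFs whose means differ by 0.2; the Levy distance is some small number
print(levy_distance(norm(0, 1).cdf, norm(0.2, 1).cdf, grid))
```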

Thanks and regards!

  • 1
    Just a sidenote: the metric you defined in #2 is the total variation distance. (2016-01-18)

3 Answers

7

Most of what occurs to me has already been said, but you may find the following picture useful.

If $d_C$ is the Chebyshev metric on $\mathbb{R}^2$, i.e. with points $\mathbf{p} = (x_1,y_1)$ and $\mathbf{q} = (x_2,y_2)$ in $\mathbb{R}^2$,

$d_C(\mathbf{p,q}) := |x_1-x_2| \vee |y_1-y_2|$,

and $h_C$ is the Hausdorff metric on closed subsets of $\mathbb{R}^2$ induced by $d_C$, i.e. with $A$ and $B$ being closed subsets of $\mathbb{R}^2$,

$h_C(A,B):= \sup_{\mathbf{p} \in A} d_C(\mathbf{p},B) \vee \sup_{\mathbf{q} \in B} d_C(\mathbf{q},A)$,

where as usual $d_C(\mathbf{p},B) = \inf_{\mathbf{r} \in B} d_C(\mathbf{p,r})$ etc,

then the Lévy metric between two distribution functions $F$ and $G$ is simply the Hausdorff distance $h_C$ between the closures of the completed graphs of $F$ and $G$.
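A quick numerical illustration of this (my own sketch, not part of the original answer; the normal CDFs and the grid are arbitrary choices): for continuous $F$ and $G$ the completed graph is just the graph, so one can sample both graphs, compute the Hausdorff distance induced by the Chebyshev metric, and compare it with the Lévy distance computed directly from its definition.

```python
import numpy as np
from scipy.stats import norm

# Two continuous CDFs (no jumps, so the completed graph is just the graph);
# the tails beyond +-5 are essentially flat and contribute nothing extra.
F, G = norm(0, 1).cdf, norm(0.5, 1).cdf
x = np.linspace(-5, 5, 1001)
graph_F = np.column_stack([x, F(x)])
graph_G = np.column_stack([x, G(x)])

def hausdorff_chebyshev(A, B):
    # d_C(p, q) = max(|x1 - x2|, |y1 - y2|); brute force over all sampled pairs
    D = np.max(np.abs(A[:, None, :] - B[None, :, :]), axis=2)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def levy(F, G, x, tol=1e-6):
    # bisection on eps in the defining inequalities, checked on the grid x
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        eps = 0.5 * (lo + hi)
        ok = (np.all(F(x - eps) - eps <= G(x) + 1e-12) and
              np.all(G(x) <= F(x + eps) + eps + 1e-12))
        lo, hi = (lo, eps) if ok else (eps, hi)
    return hi

# the two numbers should agree up to the discretization of the grid
print(hausdorff_chebyshev(graph_F, graph_G))
print(levy(F, G, x))
```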

  • 0
    Hi zab, I have opened a new post about my comments above: http://math.stackexchange.com/questions/313465/what-does-the-completed-graph-of-a-function-mean (2013-02-25)
5
  1. The Lévy-Prokhorov metric metrizes the weak convergence of measures. That is quite a useful thing, since it allows you to conclude from the fact that $\mu_n$ converges weakly to $\mu$ that it "approaches $\mu$ with respect to some distance".

  2. I think this one is related to the total variation norm of the difference of the measures and hence describes something like norm convergence. Together with the first point, you see that the two metrics serve completely different purposes.

One intuition I have for the Lévy-Prokhorov metric is that two point masses $\delta_x$ and $\delta_y$ have the distance of their points if the points are not too far apart, i.e. for $d(x,y)\leq 1$ it holds that $\pi(\delta_x,\delta_y) = d(x,y)$. On the other hand, your term in 2. equals $1$ whenever $x \neq y$ (take $A = \{x\}$), no matter how close the two points are. If you have a sequence $(x_n)$ converging to $x$ in $M$ with $x_n \neq x$, then $(\delta_{x_n})$ converges to $\delta_x$ with respect to the Lévy-Prokhorov metric but not in norm.
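A small sanity check of this (my own snippet, not part of the original answer; the closed forms below follow from the observation that for two point masses only the sets $A=\{x\}$ and $A=\{y\}$ matter):

```python
# For mu = delta_x and nu = delta_y only the sets A = {x} and A = {y} matter,
# which gives the closed forms used below (pi for Levy-Prokhorov, and the
# questioner's sup-over-sets expression for comparison).
def lp_point_masses(x, y):
    return min(abs(x - y), 1.0)          # = d(x, y) as long as d(x, y) <= 1

def sup_metric_point_masses(x, y):
    return 0.0 if x == y else 1.0        # take A = {x}

for n in (1, 2, 5, 10, 100):
    x_n = 1.0 / n
    print(n, lp_point_masses(x_n, 0.0), sup_metric_point_masses(x_n, 0.0))
# the Levy-Prokhorov distance tends to 0 along the sequence; the sup metric does not
```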

  • 0
    Oh yes, you are right! (2012-04-23)
5

I think you get a complete picture of the Prokhorov metric $\pi$ by combining what Dirk has pointed out and what we already know about the total variation metric. Essentially, $\pi$ is a measure-theoretic analogue of the Hausdorff metric, but loosened up modulo the total variation metric. I will explain what I mean by this.

Suppose we have a probability measure $\mu$. We can imagine two different types (I will call them Type I, Type II) of ways of slightly changing the measure $\mu$. To make the exposition simple, let's suppose $\mu$ is just a pile of $N$ point masses, i.e., $\mu = \frac1N \sum_{i=1}^N \delta_{x_i}$ where $x_1, x_2, \cdots, x_N$ are $N$ points in space and $\delta_x$ denotes the point mass at $x$, i.e., the Dirac measure. A type I change is when you cut out a tiny chunk of $\mu$ and then move that chunk arbitrarily. To be precise, we will say that the new probability measure $\nu$ is obtained from $\mu$ by a type I change within $\epsilon >0$ if we have $y_1, \cdots, y_N$ (another list of $N$ points) such that $\nu := \frac1N \sum_{i=1}^N \delta_{y_i}$ and $ \#\{1 \le i \le N: x_i \ne y_i \} \le \epsilon N. $

An essential property of the total variation metric $\delta(\mu, \nu)$ (between probability measures) is that it allows changes of type I. In other words, we have a constant C such that $\delta(\mu, \nu) \le C \epsilon$ whenever $\nu$ is obtained from $\mu$ by type I change within $\epsilon$.

A type II change is when you move all or some of the particles individually within a small distance. To be precise, the definition of a type II change replaces the condition $ \#\{1 \le i \le N: x_i \ne y_i \} \le \epsilon N $ with the condition $ d(x_i, y_i) < \epsilon \ \forall\, 1 \le i \le N. $

The Hausdorff metric allows changes of type II in the following sense: there is a constant $C$ such that whenever $x_1, \cdots, x_N$ and $y_1, \cdots, y_N$ are two lists of $N$ points in space such that the above condition holds, the Hausdorff distance between the two sets $\{x_i : 1 \le i \le N\}$ and $\{y_i : 1 \le i \le N\}$ is $\le C\epsilon$.

The Prokhorov metric $\pi$ allows both type I and type II changes. In fact, you should be able to prove the following: $ \#\{1 \le i \le N: d(x_i, y_i) \ge \epsilon_2 \} \le \epsilon_1 N \implies \nu(A) \le \mu(A^{\epsilon_2}) + \epsilon_1 \ \forall A. $ This is just a type I change within $\epsilon_1$ followed by a type II change within $\epsilon_2$. So the Prokhorov metric is simply what you would have come up with if you tried to define a metric $\pi$ with the nice property that $\pi(\mu, \nu) \le \epsilon$ whenever $\nu$ is obtained from a "pile of dirt of unit mass" $\mu$ by moving a $1-\epsilon$ portion of it within distance $\epsilon$ and the remaining $\epsilon$ portion arbitrarily.
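For completeness, the computation behind this implication is short: for any Borel set $A$,

$$ \nu(A) = \frac1N \#\{i : y_i \in A\} \le \frac1N \#\{i : y_i \in A,\ d(x_i, y_i) < \epsilon_2\} + \frac1N \#\{i : d(x_i, y_i) \ge \epsilon_2\} \le \mu(A^{\epsilon_2}) + \epsilon_1, $$

where the last step uses that $y_i \in A$ together with $d(x_i, y_i) < \epsilon_2$ forces $x_i \in A^{\epsilon_2}$.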

Of course we can think of another metric $\pi'$ that satisfies this nice property simply by definition.

$ \pi'(\mu, \nu) := \inf_{\gamma \in \Gamma(\mu, \nu)} \kappa(\gamma) $

where $\Gamma(\mu,\nu)$ means the set of all couplings between $\mu, \nu$ and

$ \kappa(\gamma) := \inf\{ \epsilon: \gamma\{ (x,y): d(x, y) > \epsilon \} < \epsilon \} $

The nice property of the Prokhorov metric can then be re-expressed as $\pi \le \pi'$, since $\Gamma(\mu,\nu)$ can be thought of as the collection of all possible ways of moving the pile of dirt $\mu$ into the new pile of dirt $\nu$. Less obvious is the fact that the reverse inequality $\pi \ge \pi'$ also holds. So in the end these aren't really two metrics $\pi$ and $\pi'$; they are one and the same metric, $\pi = \pi'$.
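As a quick sanity check of $\pi = \pi'$ (an identity usually attributed to Strassen), take the point masses from Dirk's answer: for $\mu = \delta_x$ and $\nu = \delta_y$ the only coupling is $\gamma = \delta_{(x,y)}$, so

$$ \pi'(\delta_x, \delta_y) = \kappa(\delta_{(x,y)}) = \inf\{\epsilon > 0 : \epsilon \ge d(x,y) \ \text{or} \ \epsilon > 1\} = \min(d(x,y), 1), $$

which agrees with the direct computation $\pi(\delta_x, \delta_y) = d(x,y)$ when $d(x,y) \le 1$.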

The Prokhorov metric reduces to the total variation metric when the discrete metric is assigned to the space $M$: in that case $A^{\epsilon} = A$ for every $\epsilon \le 1$, so the defining condition becomes $|\mu(A) - \nu(A)| \le \epsilon$ for all $A$. So another way of thinking of $\pi$ is that it is a generalization of the total variation metric that takes the topology of the space into account.