
Let $(S,d)$ be a metric space and let $\mathcal P(S)$ denote the space of Borel probability measures on $S$ endowed with the Prokhorov metric $\pi:\mathcal P(S)\times \mathcal P(S)\to \mathbb R_+$ given by $ \pi(P,Q):=\inf\{\varepsilon\geq 0:P(F)\leq Q(F^\varepsilon)+\varepsilon \text{ for all closed } F\subset S\} $ where the $\varepsilon$-inflation of a set is given by $ F^{\varepsilon} = \{x\in S:d(x,F)<\varepsilon\}.$
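To make the notation concrete (a small illustration of my own, not part of the original question): on $S=\mathbb R$ with $d(x,y)=|x-y|$,
$$ F=[a,b] \;\Longrightarrow\; F^\varepsilon = (a-\varepsilon,\; b+\varepsilon), \qquad F=\{x\} \;\Longrightarrow\; F^\varepsilon = (x-\varepsilon,\; x+\varepsilon). $$
Note also that $\pi(P,Q)\leq 1$ always: for any $\varepsilon > 1$ the defining inequality $P(F)\leq Q(F^\varepsilon)+\varepsilon$ holds trivially, since $P(F)\leq 1$.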

Another useful metric on $\mathcal P(S)$ is induced by the total variation norm, i.e. $ \rho(P,Q):=\sup\limits_{A\in \mathfrak B(S)}|P(A) - Q(A)| $ where $\mathfrak B(S)$ is the Borel $\sigma$-algebra on $(S,d)$. I wonder if there are any interesting relations between these two metrics, $\pi$ and $\rho$. In particular, I know that convergence in $\rho$ implies weak convergence, and hence, if $S$ is separable, it implies convergence in $\pi$.
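As a side remark (a standard fact, not from the original post): when $P$ and $Q$ are supported on a countable set, the supremum in the definition of $\rho$ is attained at $A=\{x: P(\{x\})>Q(\{x\})\}$, which gives
$$ \rho(P,Q) = \frac12 \sum_{x} \bigl|P(\{x\}) - Q(\{x\})\bigr|, $$
the form usually used to compute $\rho$ on a finite space.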

I wonder, however, whether under some additional assumptions it is possible to derive non-trivial bounds on $\rho$ from upper bounds on $\pi$, or at least to upper-bound $|P(F) - Q(F)|$ for closed $F$ in terms of $\pi$.

3 Answers


On metric spaces, the Prokhorov metric is always bounded above by the total variation metric, i.e. $\pi(P,Q)\leq\rho(P,Q)$. If $S$ is finite, you can also bound the total variation metric by the Prokhorov metric, going through the Wasserstein metric.

More metrics and their relations (including the ones above) are nicely summarized in Gibbs A.L. and Su F.E., On Choosing and Bounding Probability Metrics, International Statistical Review 70 (2002), pp. 419–435.
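To illustrate the first inequality concretely, here is a small brute-force check (my own sketch, not part of the answer; the function names `total_variation` and `prokhorov` and the example space are made up): on a finite metric space one can compute $\rho$ via the half-$\ell_1$ formula and approximate $\pi$ by bisection over $\varepsilon$, checking the defining inequality over all subsets, and then verify numerically that $\pi(P,Q)\leq\rho(P,Q)$.

    import itertools
    import numpy as np

    def total_variation(p, q):
        # rho(P, Q) = sup_A |P(A) - Q(A)| = 0.5 * sum_i |p_i - q_i| on a finite space
        return 0.5 * np.abs(p - q).sum()

    def prokhorov(p, q, dist, tol=1e-6):
        # Brute-force Prokhorov distance; `dist` is the matrix of pairwise distances.
        # The symmetric condition is checked; for probability measures it is
        # equivalent to the one-sided condition in the question.
        n = len(p)
        subsets = [list(s) for r in range(1, n + 1)
                   for s in itertools.combinations(range(n), r)]

        def ok(eps):
            for F in subsets:
                F_eps = [i for i in range(n) if dist[i, F].min() < eps]
                if p[F].sum() > q[F_eps].sum() + eps + 1e-12:
                    return False
                if q[F].sum() > p[F_eps].sum() + eps + 1e-12:
                    return False
            return True

        lo, hi = 0.0, 1.0   # pi is always at most 1 for probability measures
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (lo, mid) if ok(mid) else (mid, hi)
        return hi

    # Three points on the real line with the usual metric.
    points = np.array([0.0, 0.3, 1.0])
    dist = np.abs(np.subtract.outer(points, points))
    rng = np.random.default_rng(0)
    for _ in range(5):
        p, q = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
        assert prokhorov(p, q, dist) <= total_variation(p, q) + 1e-4

The reverse direction on a finite space goes through the Wasserstein metric, as the answer says; the exact constants (involving the minimal and maximal distances between points) are spelled out in the Gibbs–Su paper.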

  • I also misread the direction of the arrows, thanks a lot! (2012-11-28)

Generally, knowing things about $\pi$ cannot tell you much about $\rho$; weak convergence is much easier to achieve than total variation convergence.

For example, if $S$ is any non-discrete space, $x$ is a limit point, and $x_n \to x$ is a sequence with $x_n \ne x$ for all $n$, then the point masses $\delta_{x_n}$ satisfy $\pi(\delta_{x_n}, \delta_x) \to 0$. (Indeed, $\pi(\delta_{x_n}, \delta_x) = \min(d(x_n, x), 1)$.) But $\rho(\delta_{x_n}, \delta_x) = 1$ for every $n$, since the set $A = \{x\}$ gives $|\delta_{x_n}(A) - \delta_x(A)| = 1$.
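For completeness, a direct verification of the parenthetical claim (this computation is mine, not part of the original answer): for $y \ne x$, take the closed set $F = \{y\}$; the defining inequality $\delta_y(F) \le \delta_x(F^\varepsilon) + \varepsilon$ reads
$$ 1 \;\le\; \delta_x\bigl(\{z : d(z,y) < \varepsilon\}\bigr) + \varepsilon, $$
which fails whenever $\varepsilon \le d(x,y)$ and $\varepsilon < 1$, since then $x \notin F^\varepsilon$. Conversely, for $\varepsilon > d(x,y)$, or for $\varepsilon \ge 1$, the inequality $\delta_y(F) \le \delta_x(F^\varepsilon) + \varepsilon$ holds for every closed $F$ (the only nontrivial case is $y \in F$, and then $x \in F^\varepsilon$). Taking the infimum gives $\pi(\delta_y, \delta_x) = \min(d(x,y), 1)$.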

  • @S.D.: You're right, it should be 1. I fixed it. (2012-11-28)