31

Suppose $X$ is a real-valued random variable and let $P_X$ denote the distribution of $X$. Then $$ E(|X-c|) = \int_\mathbb{R} |x-c| \,dP_X(x). $$ A median of $X$ is any number $m \in \mathbb{R}$ such that $P(X \leq m) \geq \frac{1}{2}$ and $P(X \geq m) \geq \frac{1}{2}$.

Why do the medians solve $$ \min_{c \in \mathbb{R}} E(|X-c|) \, ? $$
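
For a quick empirical illustration, here is a minimal sketch in Python (assuming NumPy; the Exp(1) sample, whose median is $\ln 2$, and the search grid are arbitrary choices) showing that the sample's median essentially minimizes $c\mapsto E(|X-c|)$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)  # skewed sample; true median = ln 2

# Approximate c -> E|X - c| on a grid and locate the minimizer.
grid = np.linspace(0.0, 3.0, 601)
mad = np.array([np.mean(np.abs(x - c)) for c in grid])

print("empirical median:   ", np.median(x))
print("argmin of E|X - c|: ", grid[np.argmin(mad)])
# Both are close to ln 2 ≈ 0.693.
```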

4 Answers

26

For every real valued random variable $X$, $$ \mathrm E(|X-c|)=\int_{-\infty}^c\mathrm P(X\leqslant t)\,\mathrm dt+\int_c^{+\infty}\mathrm P(X\geqslant t)\,\mathrm dt $$ hence the function $u:c\mapsto \mathrm E(|X-c|)$ is differentiable almost everywhere and, where $u'(c)$ exists, $u'(c)=\mathrm P(X\leqslant c)-\mathrm P(X\geqslant c)$. Hence $u'(c)\leqslant0$ if $c$ is smaller than every median, $u'(c)=0$ if $c$ is a median, and $u'(c)\geqslant0$ if $c$ is greater than every median.

The formula for $\mathrm E(|X-c|)$ is the integrated version of the relations $$(x-y)^+=\int_y^{+\infty}[t\leqslant x]\,\mathrm dt$$ and $|x-c|=((-x)-(-c))^++(x-c)^+$, which yield, for every $x$ and $c$, $$ |x-c|=\int_{-\infty}^c[x\leqslant t]\,\mathrm dt+\int_c^{+\infty}[x\geqslant t]\,\mathrm dt $$
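
The first display can be checked numerically against the empirical distribution of a sample; a minimal sketch in Python, assuming NumPy (the normal sample, the test point $c$, and the step $dt$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.normal(loc=2.0, scale=1.5, size=100_000))
n = x.size
c = 0.7  # arbitrary test point

lhs = np.mean(np.abs(x - c))

# Right-hand side: empirical P(X <= t) integrated up to c, plus empirical
# P(X >= t) integrated from c on; both integrands vanish outside [min x, max x].
dt = 1e-3
t_lo = np.arange(x[0] - 1.0, c, dt)
t_hi = np.arange(c, x[-1] + 1.0, dt)
p_le = np.searchsorted(x, t_lo, side="right") / n        # P(X <= t)
p_ge = 1.0 - np.searchsorted(x, t_hi, side="left") / n   # P(X >= t)
rhs = p_le.sum() * dt + p_ge.sum() * dt

print(lhs, rhs)  # the two numbers should agree to roughly 3 decimals
```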

  • 0
    Thanks! (1) By "integrated version", do you mean first integrating your last formula with respect to $P$ over $\mathbb{R}$ and then applying the Fubini theorem to exchange the order of the two integrals, which gives the first formula? (2) If we temporarily change the notation $P$ to denote the cdf of $X$, is Sivaram's Edit correct? Specifically, do those Riemann–Stieltjes integrals exist? 2011-11-25
  • 0
    @Rasmus, you are right, thanks; misprint corrected. 2011-11-25
  • 0
    Tim: Much too complicated... The last equation of my post states that $u(x)=v(x)$ for every $x$, for some functions $u$ and $v$. Then $E(u(X))=E(v(X))$, end of story. (And if you have questions on @Sivaram's post, ask Sivaram.) 2011-11-25
  • 0
    Dear Did, I know the definition $$E(X)=\int X \,dP = \int \mathrm{id}\, d(X_* P) = \int x\cdot X_* P(dx)$$ for an integrable real random variable $X$. I don't understand how to derive your first equation from this. Could you help me, please? Thank you. 2016-05-10
  • 0
    @user8463524 This follows from the (deterministic) identity, valid for every real number $x$, $$|x-c|=\int_{-c}^{+\infty}\mathbf 1_{x\leqslant -t}\,\mathrm dt+\int_c^{+\infty}\mathbf 1_{x\geqslant t}\,\mathrm dt.$$ 2016-05-10
  • 2
    This is a very nice proof. A clarification for future readers who, like me, would be perplexed by the notation $[x\leq-t]$ and $[x\geq t]$: if $A$ is an event, $[A]$ denotes the indicator function $\mathbb{1}_A$. In particular, $[x\leq-t] = \mathbb{1}_{\{x \leq -t\}}$, and likewise $[x\geq t] = \mathbb{1}_{\{x\geq t\}}$. 2016-10-18
  • 0
    There's something I don't understand about the proof. You claim that the function $c\mapsto E(|X-c|)$ is everywhere differentiable and that its derivative at the point $c$ equals $P(X\leq c) - P(X\geq c)$. But isn't this the case only for those $c$ where both $c\mapsto P(X\leq c)$ and $c\mapsto P(X\geq c)$ are continuous? But then the proof would not hold for *every* random variable $X$, in contradiction to your opening statement. 2016-10-18
  • 0
    @EvanAad *Almost* everywhere, sorry about that. Yes, the proof works for every random variable. 2016-10-18
  • 0
    Thanks for the correction. The good news is that now I see why the deduction is sound. The bad news is that now I fail to see how it solves OP's question. I may be wrong, but it seems to me that in order to tie your answer to OP's question you implicitly invoke [the first derivative test for optimality](https://en.wikipedia.org/wiki/Derivative_test#Precise_statement_of_first_derivative_test). 2016-10-18
  • 0
    The problem with this test, though, is that it assumes (at least in the Wikipedia version linked to above) that the derivative exists everywhere in the interval where optimality is to be determined (this interval being $\mathbb{R}$ in the case of OP's question), except possibly at *a single point*, namely the point of optimality. 2016-10-18
  • 1
    @EvanAad Adding convexity to the pot will allow you to conclude. 2016-10-18
  • 0
    @EvanAad Please stop modifying my answer. 2016-10-20
  • 0
    Since differentiability is only present *almost* everywhere, how did you conclude from the signs of $u'(c)$, where they exist, that the minimiser must lie where $u'(c)=0$? Is there a generalisation of Fermat's lemma? (I doubt it, though, since the Cantor function has zero derivative almost everywhere yet attains neither a minimum nor a maximum anywhere in between.) 2016-11-24
  • 4
    @Vim The convexity of the function $u$ makes these objections moot, but here is a more direct route: for every $x$ and $c$, $$ |x-c|=\int_{-\infty}^c[x\leqslant t]\,\mathrm dt+\int_c^{+\infty}[x>t]\,\mathrm dt $$ hence, for every median $m$, $$E(|X-c|)=E(|X-m|)+\int_m^cv(t)dt$$ with $$v(t)=P(X\leqslant t)-P(X>t)=2P(X\leqslant t)-1$$ Then $v$ is nondecreasing and $v(m)\geqslant0$ hence, for every $c>m$, $v\geqslant0$ on $(m,c)$, which implies $E(|X-c|)\geqslant E(|X-m|)$. Likewise for $c<m$. (A numerical check of this decomposition appears right after these comments.) 2016-11-24
  • 0
    @Did I later saw the convexity of $u$ discussed in Evan's answer, although it wasn't mentioned explicitly in yours. Thanks anyway for your more direct approach. 2016-11-24
  • 0
    @Vim See the comment of Oct 18 at 16:11. 2016-11-24
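
The decomposition $E(|X-c|)=E(|X-m|)+\int_m^c v(t)\,\mathrm dt$ used in the comment above can likewise be checked numerically; a minimal sketch in Python, assuming NumPy (the Gamma sample and the point $c>m$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.gamma(shape=2.0, scale=1.0, size=50_000))
n = x.size

m = np.median(x)  # a median of the empirical distribution
c = 3.5           # arbitrary point with c > m

dt = 1e-3
t = np.arange(m, c, dt)
v = 2.0 * np.searchsorted(x, t, side="right") / n - 1.0  # v(t) = 2 P(X <= t) - 1

lhs = np.mean(np.abs(x - c))
rhs = np.mean(np.abs(x - m)) + v.sum() * dt
print(lhs, rhs)  # should agree to roughly 3 decimals
```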
6

Let $f$ be the pdf of $X$ and let $J(c) = E(|X-c|)$. We want to minimize $J(c)$. Note that $E(|X-c|) = \int_{\mathbb{R}} |x-c| f(x) dx = \int_{-\infty}^{c} (c-x) f(x) dx + \int_c^{\infty} (x-c) f(x) dx.$

To find the minimum, set $\frac{dJ}{dc} = 0$. Differentiating under the integral sign (Leibniz rule), we get $$\begin{align} \frac{dJ}{dc} & = (c-x)f(x) \big| _{x=c} + \int_{-\infty}^{c} f(x) dx - (x-c)f(x) \big| _{x=c} - \int_c^{\infty} f(x) dx\\ & = \int_{-\infty}^{c} f(x) dx - \int_c^{\infty} f(x) dx = 0, \end{align} $$ where both boundary terms vanish at $x=c$.

Hence, we get that $c$ is such that $$\int_{-\infty}^{c} f(x) dx = \int_c^{\infty} f(x) dx$$ i.e. $$P(X \leq c) = P(X > c).$$

However, we also know that $P(X \leq c) + P(X > c) = 1$. Hence, we get that $$P(X \leq c) = P(X > c) = \frac12.$$

EDIT

When $X$ doesn't have a density, all you need to do is to make use of integration by parts, with $P$ now denoting the cdf of $X$. Since $X$ is integrable, the boundary terms $\lim_{y \rightarrow -\infty} (c-y) P(y)$ and $\lim_{y \rightarrow \infty} (y-c)\left(1-P(y)\right)$ vanish, and we get that $$\displaystyle \int_{-\infty}^{c} (c-x)\, dP(x) = \displaystyle \int_{-\infty}^{c} P(x)\, dx.$$ Similarly, we also get that $$\displaystyle \int_{c}^{\infty} (x-c)\, dP(x) = \displaystyle \int_{c}^{\infty} \left(1 - P(x)\right) dx.$$
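
The second identity says that $\int_c^\infty (x-c)\,\mathrm dP(x)=E\left((X-c)^+\right)$ equals the integrated upper tail; a minimal sketch in Python checking this against an empirical cdf (assuming NumPy; the normal sample and the threshold $c$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.normal(size=100_000))
n = x.size
c = 0.4

dt = 1e-3
t = np.arange(c, x[-1] + 1.0, dt)
tail = 1.0 - np.searchsorted(x, t, side="right") / n  # 1 - P(t), empirical

lhs = np.mean(np.maximum(x - c, 0.0))  # ∫_c^∞ (x - c) dP(x) = E[(X - c)^+]
rhs = tail.sum() * dt                  # ∫_c^∞ (1 - P(x)) dx
print(lhs, rhs)                        # should agree to roughly 3 decimals
```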

  • 0
    Thanks! But does $X$ always have a density? 2011-11-25
  • 0
    @Tim: I don't think it is hard to adapt the same idea to the case when $X$ doesn't have a density. 2011-11-25
  • 0
    So you are thinking of $P$ as the cdf of $X$? 2011-11-25
3

The following is intended to complement Did's answer.

Claim

Denote by $M$ the set of $X$'s medians. Then

  1. $M = [m_1, m_2]$ for some $m_1, m_2 \in \mathbb{R}$, such that $m_1 \leq m_2$.

  2. For every $m \in M$ and for every $x \in \mathbb{R}$ we have $$ E\left(|X-m|\right) \leq E\left(|X-x|\right). $$ (In particular, $m\mapsto E\left(|X-m|\right)$ is constant on $M$.)

Part 2's proof builds on Did's answer.

Proof

  1. It is known that $M \neq \emptyset$. Define $$ \begin{align} M_1 &:= \left\{t\in\mathbb{R} : F_X(t) \geq \tfrac{1}{2}\right\}, \\ M_2 &:= \left\{t\in\mathbb{R} : P(X \geq t) \geq \tfrac{1}{2}\right\}. \end{align} $$ Then $M = M_1 \cap M_2$. It therefore suffices to show that $M_1 = [m_1, \infty)$ and that $M_2 = (-\infty, m_2]$, for some $m_1, m_2 \in \mathbb{R}$.

    Since $\lim_{t\rightarrow-\infty}F_X(t) = 0$, $M_1$ is bounded from below. Since $F_X$ is nondecreasing and $\lim_{t\rightarrow\infty}F_X(t) = 1$, $M_1$ is a nonempty interval that extends to infinity. Hence $M_1 = (m_1,\infty)$ or $M_1 = [m_1,\infty)$, for some $m_1 \in \mathbb{R}$. It follows from $F_X$'s right-continuity that $m_1 \in M_1$. An analogous argument shows that $M_2 = (-\infty,m_2]$ (just verify that $t\mapsto P(X \geq t)$ is left-continuous).

  2. Define a function $f:\mathbb{R}\rightarrow\mathbb{R}$ as follows. For every $c \in \mathbb{R}$, set $$ f(c) := E\left(|X-c|\right). $$

    We will begin by showing that $f$ is convex. Let $a, b \in \mathbb{R}$, and let $t \in (0,1)$. Then $$ \begin{align} f\left(ta+(1-t)b\right) &= E\left(\left|X-\left(ta+(1-t)b\right)\right|\right) \\ &= E\left(\left|\left(tX-ta\right)+\left((1-t)X-(1-t)b\right)\right|\right) \\ &\leq E\left(\left|\left(tX-ta\right)\right|+\left|\left((1-t)X-(1-t)b\right)\right|\right) \\ &=E\left(\left|\left(tX-ta\right)\right|\right)+E\left(\left|\left((1-t)X-(1-t)b\right)\right|\right) \\ &= t\ E\left(|X-a|\right) + (1-t)\ E\left(|X-b|\right) \\ &= t\ f(a) + (1-t)\ f(b). \end{align} $$

    Since $f$ is convex, then, by Theorem 7.40 of [1] (p. 157), there exists a set $A \subseteq \mathbb{R}$ such that $\mathbb{R}\setminus A$ is countable, and such that $f$ is finitely differentiable on $A$. Moreover, letting $m \in M$, and letting $x \in (-\infty, m_1)$, Theorem 7.43 of [1] (p. 158) yields that $f'$ is Lebesgue-integrable on $[x,m] \cap A$, and that $$ f(m) - f(x) = \int_{[x,m]\cap A} f'\ d\lambda. $$

    Applying Did's answer, we find that $f'\leq 0$ on $[x,m]\cap A$. Hence $f(m) \leq f(x)$. Similar considerations show that, for every $x \in (m_2,\infty)$, $f(m) \leq f(x)$, and also that $f(m) = f(m_1)$ (implying that $f$ is constant on $M$, since $m$ was chosen arbitrarily in $M$).

    (The argument of the last paragraph was suggested to me by copper.hat in their answer to a related question of mine.)

Q.E.D.
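
To illustrate part 2, and in particular the constancy of $f$ on $M$: if $X$ is uniform on $\{0,1\}$, then $M=[0,1]$ and $f\equiv\frac12$ on $M$. A minimal sketch in Python (assuming NumPy; the sample size is an arbitrary choice):

```python
import numpy as np

# X takes the values 0 and 1 with probability 1/2 each, so every
# m in [0, 1] is a median. f(c) = E|X - c| should be constant (= 1/2)
# on [0, 1] and strictly larger outside.
rng = np.random.default_rng(4)
x = rng.integers(0, 2, size=200_000).astype(float)

for c in [-0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 1.5]:
    print(c, np.mean(np.abs(x - c)))
# ≈ 1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0
```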


References

[1] Richard L. Wheeden and Antoni Zygmund. Measure and Integral: An Introduction to Real Analysis. 2nd Ed. 2015. CRC Press. ISBN: 978-1-4987-0290-4.

  • 0
    Thanks. Now I understand why, if $M$ is an interval, then any point where $f$ assumes zero derivative is a global minimiser of $f$. However, what if $M$ is a singleton and $f$ is not differentiable there? (Also, could you give the name of the Lebesgue integrability theorem you invoked?) 2016-11-24
  • 1
    @Vim: 1. A singleton $\{s\}$ is an interval of the form $[m_1, m_2]$, $m_1\leq m_2$, with $m_1:=m_2:=s$. 2. [Here's a link to the theorem.](http://imgur.com/a/YhtLK) 2016-11-24
  • 0
    I was not asking whether a singleton is an interval or not; rather, I was wondering how to apply the convexity in this case. Anyway, it seems solved to me now: even though $f$ can fail to be differentiable at this point, its left and right derivatives surely exist by convexity, and the left one is $\le 0$ and the right one $\ge 0$ by the definition of the median. 2016-11-24
  • 0
    @Vim: My proof covers the case where $M$ is a singleton. 2016-11-24
  • 0
    Indeed. I had actually mainly been reading the link in your answer, which seemed a bit simpler, so that I could grasp it with less background. The singleton concern arose from there, not from this answer. 2016-11-24
3

Let $m$ be any median of $X$. Wlog, we can take $m=0$ (consider $X':=X-m$). The aim is to show $E|X-c|\ge E|X|$.

Consider the case $c\ge 0$. It is straightforward to check that $|X-c|-|X|=c$ when $X\le0$, and $|X-c|-|X|\ge -c$ when $X>0$.

It follows that $$ E\left[(|X-c|-|X|)I(X\le0)\right]=cP(X\le0)\tag1 $$ and $$E\left[(|X-c|-|X|)I(X>0)\right]\ge-cP(X>0).\tag2 $$ Adding (1) and (2) yields $$ E(|X-c|-|X|)\ge c\left[P(X\le0)-P(X>0)\right].\tag3 $$ The RHS of (3) equals $c[2P(X\le0)-1]$, which is non-negative since $c\ge0$ and zero is a median of $X$. The case $c\le0$ is reduced to the previous one by considering $X':=-X$ and $c':=-c$.
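
Inequality (3) can be checked numerically; a minimal sketch in Python, assuming NumPy (the Student-$t$ sample and the values of $c$ are arbitrary choices; the sample is shifted so that $0$ is a median):

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.standard_t(df=3, size=100_000)
x = y - np.median(y)  # shift so that 0 is a median of the empirical distribution

for c in [0.1, 0.5, 1.0, 2.0]:
    lhs = np.mean(np.abs(x - c) - np.abs(x))          # E(|X - c| - |X|)
    rhs = c * (2.0 * np.mean(x <= 0) - 1.0)           # c [2 P(X <= 0) - 1] ≈ 0
    print(c, lhs, rhs, lhs >= rhs)                    # inequality (3) holds
```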