7
$\begingroup$

Let $\mathbf{x}$ be a vector of $n$ numbers in the range $[0, p]$, where $p$ is a positive real number.

What's is the maximum of the variance function of this $n$ numbers?

The variance of the vector $\mathbf{x}$ is given by:

$$ \operatorname{var} (\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} {\left( {x}_{i} - \overline{\mathbf{x}} \right )}^2 $$

Where the mean $\overline{\mathbf{x}}$ is given by:

$$ \overline{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^n x_i .$$

Thank You.

  • 0
    What is the variance function of a set? The maximum with respect to what? Please try to rephrase your question using more standard terminology.2011-11-17
  • 1
    If $X$ takes on values in an interval of length $c$, then its variance is bounded above by $c^2/4$ with equality when half the probability mass is at one end of the interval and half at the other (assuming the interval is closed at both ends; else we have strict inequality)2011-11-17
  • 0
    @DilipSarwate, Does it work just like Entropy? What if I have only 3 samples in the range [0 1], which variance could they achieve at most? It seems you neglected the factor of how many samples there are.2011-11-17
  • 0
    @joriki, I've updated the question. Thank You.2011-11-17
  • 0
    @Drazick:That was a misunderstanding. The variance and mean of a vector $\mathbf x$ are standard enough; what's not standard is the maximum of a variance function of a set. What is the maximum being taken over? And what's a variance function of a set? Dilip seems to have interpreted your question to be asking about the maximal variance that $n$ numbers in the range $[0,p]$ can have -- is that what you meant? This is not the variance of a set; in fact, as Dilip pointed out, the maximal variance in that case is achieved when most of the numbers are the same, so you can't consider them as a set.2011-11-17
  • 1
    "What if I have only 3 samples in the range [0 1], which variance could they achieve at most?" The maximum is still _bounded above_ by $c^2/4$ which is $1/4$ in this instance. The bound cannot be attained with equality in this instance but that does not invalidate the bound. For odd $n$ and $c = 1$, putting one point at the center (mean) and the rest at the end points gives $\frac{1}{4}\times \frac{n-1}{n}$ which, when $n$ is large, is close enough to the upper bound of $\frac{1}{4}$for gummint purposes.2011-11-17
  • 0
    @DilipSarwate, Could you prove this property as an answer so I could grant it as the right answer? Thanks.2011-11-17
  • 0
    @joriki, I will update the question to use a vector of numbers instead of a set, would that make sense? If not, I'll be happy to make it more accurate and clear. Thanks.2011-11-17
  • 0
    @Drazick: Yes, that's clearer. It's still slightly unusual to speak of the "variance function" instead of just the "variance", and you don't really need the vector since you then say "of the $n$ numbers", which is clear enough; but at least now one doesn't have to guess to know what is meant.2011-11-17
  • 0
    @DilipSarwate, Could you prove this property as an answer so I could grant it as the right answer? I would be happy if you could show the max of the variance given $ n $ numbers and show that it goes to $ {c}^2 / 4 $ as $ n $ goes to infinity. Thanks.2011-11-17

1 Answers 1

8

Since $x_i \leq c$, $\displaystyle \sum_i x_i^2 = \sum_i x_i\cdot x_i \leq \sum_i c\cdot x_i = cn\bar{x}.$ Note also that $0 \leq \bar{x} \leq c$. Then, $$ \begin{align*} n\cdot \text{var}(\mathbf{x}) &= \sum_i (x_i - \bar{x})^2= \sum_i x_i^2 - 2x_i\bar{x} + \bar{x}^2\\ &= \sum_i x_i^2 - 2\bar{x}\sum_i x_i + n\bar{x}^2= \sum_i x_i^2 - n\bar{x}^2\\ &\leq cn\bar{x} - n\bar{x}^2 = n\bar{x}(c-\bar{x}) \end{align*} $$ and thus $$\text{var}(\mathbf{x}) \leq \bar{x}(c-\bar{x}) \leq \frac{c^2}{4}.$$

Added note: (second edit)
The result $\text{var}(X) \leq \frac{c^2}{4}$ also applies to random variables taking on values in $[0,c]$, and, as my first comment on the question says, putting half the mass at $0$ and the other half at $c$ gives the maximal variance of $c^2/4$. For the vector $\mathbf x$, if $n$ is even, the maximal variance $c^2/4$ occurs when $n/2$ of the $x_i$ have value $0$ and the rest have value $c$. Someone else posted an answer -- it has since been deleted -- which said the same thing and added that if $n$ is odd, the variance is maximized when $(n+1)/2$ of the $x_i$ have value $0$ and $(n-1)/2$ have value $c$, or vice versa. This gives a variance of $(c^2/4)\cdot(n^2-1)/n^2$ which is slightly smaller than $c^2/4$. Putting the "extra" point at $c/2$ instead of at an endpoint gives a slightly smaller variance of $(c^2/4)\cdot(n-1)/n$, but both choices have variance approaching $c^2/4$ asymptotically as $n \to \infty$.

  • 1
    Could you explain the last part of your derivation? I don't understand where $ \bar{x}(c - \bar{x}) \lt \frac{c^2}{4} $ comes from.2017-02-18
  • 0
    I had a google and found a paper called "A Better Bound on the Variance" by Rajendra Bhatia and Chandler Davis which seemed helpful but it relied on the assumption that $\forall x\in \mathbb{R}$, $ (M - m)^2 \geq 4(M -x)(x - m)$ where $M \geq m$ and [m, M] are the bounds individual values used to compute the arithmetic mean. That assumption would help here too but I don't understand where it comes from.2017-02-18
  • 0
    $\bar x \in [0,c]$ and $\bar{x}(c-\bar{x})$ has maximum value $\frac{c^2}{4}$ when $\bar{x}=\frac c2$. With regard to the alleged better bound, if the range is $[m,M]$ instead of $[0,c]$, the variance works out to be $(M-\bar{x})(\bar{x}-m)$ and this is claimed to be smaller than $\frac{(M-m)^2}{4}$. Pardon me if I fail to be greatly impressed by the better bound that Google found for you.2017-02-18
  • 0
    @delcypher Hi, you can treat x bar as the variable, and use completing the square to find the maximum of the univariate quadratic polynomial2018-03-19