
Given $n = n_0 + n_1 + \cdots + n_{m-1}$, where the average value of $n_{j}$ is $E[n_{j}] = n/m$ (here $E$ denotes expectation in the probabilistic sense). Here $n$ is not a random variable, but each $n_{j}$ is a random variable.

Does that mean $E[\log(n_{j})] = \log(n)/m$? If not, can we find a function $f$ such that $E[\log(n_{j})] = f(n,m)$?

PS: This is in the context of the analysis of hashing with chaining under the simple uniform hashing assumption, i.e. any given element is equally likely to hash into any of the $m$ slots, independently of where any other element has hashed. Also, $\alpha = n/m$ is called the load factor.
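For concreteness, here is a minimal simulation sketch of this setup (the particular values of $n$ and $m$, the use of `numpy`, and the $\log(n_j + 1)$ workaround for empty slots are my own illustrative choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 10_000, 100        # n keys hashed into m slots (illustrative values)
trials = 500

log_means = []
for _ in range(trials):
    # Simple uniform hashing: each key lands in a uniformly random slot.
    slots = rng.integers(0, m, size=n)
    counts = np.bincount(slots, minlength=m)     # counts[j] = n_j, the chain length of slot j
    log_means.append(np.log(counts + 1).mean())  # log(n_j + 1) avoids log(0) for empty slots

print("load factor alpha = n/m  :", n / m)
print("proposed log(n)/m        :", np.log(n) / m)
print("estimated E[log(n_j + 1)]:", np.mean(log_means))
```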

  • No. In general, $E[f(X)]$ does not equal $f(E[X])$: the expectation of a function of a random variable is not the same as the function of the expectation. For example, if $f(x) = x^2$ is the function, then $$E[f(X)] = E[X^2] \geq \left(E[X]\right)^2 = f(E[X]).$$ However, for some choices of the function $f$, we can use _Jensen's Inequality_ to get a bound on $E[f(X)]$. Indeed, the example above can be viewed as an illustration of Jensen's Inequality. (2012-09-17)
  • @DilipSarwate OK, but in the above case it's not exactly $f(E[X])$. That would mean $\log(n/m)$. In my proposed result, the $m$ is outside the $\log()$. (2012-09-17)
  • @Ankush Okay, but why in the world would you expect this equality? (2012-09-17)
  • Ankush: You need to fix your notation. If $n$ is a random variable that is the sum of $m$ other random variables $n_0, n_1, \ldots, n_{m-1}$, then $n/m$, which is _also_ a random variable, cannot be the _expected value_ of $n_j$. Expected values should be constants, not random variables. Maybe you meant that $E[n_j] = \frac{1}{m}E[n]$, or something else entirely. (2012-09-17)
  • @DilipSarwate $n$ is NOT a random variable, but $n_{j}$ is a random variable. (2012-09-18)
  • @MichaelChernick I don't know, I'm just guessing it. I'm more than happy if anyone can reduce the above to some function of $n$. (2012-09-18)
  • @Ankush: if $n_j$ is a r.v., and $n$ is the sum of r.v.'s, then $n$ is a r.v. I (we) don't understand how you define $n$. (2012-09-18)
  • Ah, I see: the $n_j$ are not i.i.d., but they are constrained to sum to $n$ (e.g. a multinomial). (2012-09-18)

2 Answers


You seem to be thinking of a multinomial scenario: we throw $n$ balls into $m$ equiprobable bins, and $n_j$ is the number of balls in bin $j$ (a multinomial distribution). In that case, each $n_j$ is binomially distributed, with $E(n_j) = np$ and $\sigma^2_{n_j} = np(1-p)$, where $p = 1/m$.
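A quick sanity check of these two moments (the parameter values and the use of `numpy` below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 1_000, 20
p = 1 / m

# n_j for one fixed bin, across many independent experiments
n_j = rng.binomial(n, p, size=200_000)

print("E[n_j]   theory:", n * p,           "  empirical:", n_j.mean())
print("Var[n_j] theory:", n * p * (1 - p), "  empirical:", n_j.var())
```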

You wish to compute $E[\log(n_j)]$. But that is ill defined (actually, it diverges) because $P(n_j = 0) > 0$ (we could cheat and compute $E[\log(n_j + 1)]$ instead, or set $\log(0)\, p(0) = 0$). It doesn't seem to have a simple closed-form expression. Using the general Taylor expansion

$$E[g(X)] = g(\mu) + \frac{1}{2!}\, g''(\mu)\, m_2 + \frac{1}{3!}\, g^{(3)}(\mu)\, m_3 + \cdots $$

where $\mu = E[X]$ and $m_k = E[(X-\mu)^k]$, we get, in our case, with $X \sim \mathrm{Binom}(n,p)$ and $g(X) = \log(X)$:

$$E[\log(X)] \approx \log(n p) - \frac{1-p}{2 n p} + \cdots = \log(\alpha) - \frac{1-p}{2 \alpha} + \cdots $$

with $\alpha = n/m = np$, but this does not seem very useful as an asymptotic with fixed $\alpha$. Of course, you have the bound (already given by Jensen above) $E[\log(n_j)] \le \log E(n_j) = \log(\alpha)$.
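For whatever it's worth, a small numerical illustration of the expansion and the Jensen bound (the parameter values are arbitrary, and the $\log(0)\,p(0) = 0$ convention mentioned above is used for empty bins):

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 2_000, 100                      # illustrative values, alpha = 20
p, alpha = 1 / m, n / m

n_j = rng.binomial(n, p, size=500_000)

# log(0) terms are treated as 0, i.e. we take log(max(n_j, 1))
empirical = np.log(np.maximum(n_j, 1)).mean()
taylor    = np.log(alpha) - (1 - p) / (2 * alpha)   # two-term expansion above
jensen    = np.log(alpha)                           # Jensen upper bound log E[n_j]

print("empirical E[log(n_j)]:", empirical)
print("Taylor approximation :", taylor)
print("Jensen bound         :", jensen)
```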

  • Yes, you got the problem right! I want to understand more about why it's not possible, as you mentioned in the 2nd paragraph. Can you please elaborate on this? If it is not possible, a proof that shows this would be helpful. Thanks. (2012-09-18)
  • For $n_{j} = 0$ you can treat $\log(n_{j}) = 0$ or $1$, whatever is convenient. (2012-09-18)
  • What will be the upper bound on the above expression $$E[\log(n_j+1)] \approx \log(np) + \frac{p-1}{2pn} + \cdots$$? Will it be $O(\log(n/m))$ or $O(n/m)$? Here $O()$ is Big-O notation for asymptotic complexity. (2012-09-18)

If $X$ is a random variable then your question is tantamount to asking whether $$E( \log(X) ) = \log( E(X) ) $$

This is a common misconception, but the answer is absolutely not! Thinking of it from the definition of expected value, this is like asking (below, $p$ is the density of $X$ and $D$ is the domain of $X$) whether

$$ \int_{D} \log(x) p(x) dx = \log \left( \int_{D} x p(x) dx \right) $$

Even for the Uniform(0,1) distribution we can see this doesn't hold, since the left-hand side is

$$ \int_{0}^{1} \log(x)\, dx = \left[ x \log(x) - x \right]_{0}^{1} = -1, $$

since $x \log(x) \rightarrow 0$ as $x \rightarrow 0$ (because $x$ goes to zero much faster than $\log(x)$ goes to $-\infty$). The right-hand side, on the other hand, is trivially $\log(1/2)$, so we can see they are not equal.
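A quick Monte Carlo check of both sides (the use of `numpy` here is just one illustrative way to sample):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.random(1_000_000)                # Uniform(0, 1) samples

print("E[log(X)] ≈", np.log(x).mean())   # close to -1
print("log(E[X]) ≈", np.log(x.mean()))   # close to log(1/2) ≈ -0.693
```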

In general, Jensen's inequality (see http://en.wikipedia.org/wiki/Jensen's_inequality) and a comment above give more intuition about why the two are not equal. I hope this helps!

  • No, I'm not suggesting this. See my reply to Dilip's comment under the question. Also, if the answer is still no, can we compute the function $f(n,m) = E[\log(n_{j})]$?