
Given $n = n_0 + n_1 + \cdots + n_{m-1}$, where the average value of $n_{j}$ is $E[n_{j}] = n/m$ (here $E$ denotes expectation in the probabilistic sense). Note that $n$ is not a random variable, but each $n_{j}$ is.

Does that mean $E[\log(n_{j})] = \log(n)/m$? If not, can we find a function $f$ such that $E[\log(n_{j})] = f(n,m)$?

PS: This is in the context of the analysis of hashing with chaining under the simple uniform hashing assumption, i.e. any given element is equally likely to hash into any of the $m$ slots, independently of where any other element has hashed. Also, $\alpha = n/m$ is called the load factor.
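For concreteness, here is a minimal Monte Carlo sketch of this setup (the values $n = 1000$, $m = 100$ and the trial count are arbitrary illustrative choices, not part of the question). It suggests that $E[n_j]$ is indeed $\alpha$, while $E[\log(n_j+1)]$ is nowhere near $\log(n)/m$:

```python
import math
import random

# Minimal Monte Carlo sketch of simple uniform hashing with chaining.
# The parameters (n keys, m slots, number of trials) are arbitrary choices.
n, m = 1000, 100          # load factor alpha = n/m = 10
trials = 2000

sum_len = 0.0
sum_log_len = 0.0
for _ in range(trials):
    counts = [0] * m
    for _ in range(n):
        counts[random.randrange(m)] += 1   # each key lands in a uniformly random slot
    n_j = counts[0]                        # any fixed slot; all slots are symmetric
    sum_len += n_j
    sum_log_len += math.log(n_j + 1)       # n_j + 1 avoids log(0) when a slot is empty

print("E[n_j]        ~", sum_len / trials)       # close to alpha = 10
print("E[log(n_j+1)] ~", sum_log_len / trials)   # close to log(alpha) ~ 2.3
print("log(alpha)     =", math.log(n / m))
print("log(n)/m       =", math.log(n) / m)       # ~ 0.07, so E[log n_j] != log(n)/m
```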

  • ah, I see: the $n_j$ are not i.i.d., but they are constrained to sum to $n$ (e.g. a multinomial). 2012-09-18

2 Answers


You seem to be thinking of a multinomial scenario: we throw $n$ balls into $m$ equiprobable bins, and $n_j$ is the number of balls in bin $j$ (multinomial distribution). In that case, each $n_j$ follows a binomial distribution, with $E(n_j)=n p$ and $\sigma^2_{n_j}= n p (1-p)$, where $p=1/m$.
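As a quick sanity check of those moments (with arbitrary illustrative values $n = 50$, $m = 10$), one can sum directly over the binomial pmf:

```python
import math

# Sanity check of the binomial moments above, with arbitrary illustrative
# values n = 50, m = 10 (so p = 1/m = 0.1).
n, m = 50, 10
p = 1 / m

pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))

print(mean, n * p)            # both 5.0
print(var, n * p * (1 - p))   # both 4.5
```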

You wish to compute $E[\log(n_j)]$. But that is ill defined (actually, it diverges) because $P(n_j=0)>0$; we could cheat and compute $E[\log(n_j+1)]$ instead, or set $\log(0)\, p(0)=0$. It doesn't seem to have a simple closed-form expression. Using the general Taylor expansion

$E[g(X)] = g(\mu) + \frac{1}{2!} g^{''}(\mu) m_2 + \frac{1}{3!} g^{[3]}(\mu) m_3 + \cdots $

where $\mu = E[X]$ and $m_k = E[(X-\mu)^k]$, we get, in our case, with $X \sim {\rm Binom}(n,p)$ and $g(X)= \log(X)$:

$E[\log(X)] \approx \log(n p) - \frac{1-p}{2 n p} + \cdots = \log(\alpha) - \frac{1-p}{2 \alpha} + \cdots $

with $\alpha=n/m = n p$, but this does not seem very useful as an asymptotic with fixed $\alpha$. Of course, you have the bound (already given by Jensen's inequality) $E[\log(n_j)] \le \log E(n_j) = \log(\alpha)$.
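For what it's worth, here is a small numerical comparison of the exact value (under the convention $\log(0)\,p(0)=0$), the shifted $E[\log(n_j+1)]$, the two-term expansion above, and the Jensen bound; the values $n = 200$, $m = 20$ are arbitrary illustrative choices:

```python
import math

# Numerical comparison for n_j ~ Binom(n, p) with p = 1/m, using arbitrary
# illustrative values n = 200, m = 20 (so alpha = n*p = 10).
n, m = 200, 20
p = 1 / m
alpha = n * p

pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

exact = sum(math.log(k) * pmf[k] for k in range(1, n + 1))      # convention: log(0)*p(0) = 0
shifted = sum(math.log(k + 1) * pmf[k] for k in range(n + 1))   # the "cheat": E[log(n_j + 1)]
taylor = math.log(alpha) - (1 - p) / (2 * alpha)                # two-term expansion above
jensen = math.log(alpha)                                        # Jensen upper bound

print(exact, taylor)   # exact value and expansion agree to a couple of decimals
print(shifted)         # somewhat larger, since log(k+1) > log(k)
print(jensen)          # log(alpha), an upper bound on the exact value
```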

  • What will be the upper bound on the above expression $E[\log(n_j+1)] \approx \log( n p) + \frac{p-1}{2 p n} + \cdots$? Will it be $O(\log(n/m))$ or $O(n/m)$? Here $O(\cdot)$ is Big-O notation for asymptotic complexity. 2012-09-18

If $X$ is a random variable, then your question is tantamount to asking whether $E( \log(X) ) = \log( E(X) )$.

This is a common misconception, but the answer is absolutely not! Thinking of it from the definition of expected value, this is like asking (below, $p$ is the density of $X$ and $D$ is its domain) whether

$ \int_{D} \log(x) p(x) dx = \log \left( \int_{D} x p(x) dx \right) $

Even for the Uniform(0,1) distribution we can see this doesn't hold, since the left-hand side is

$ \int_{0}^{1} \log(x) dx = \left[ x \log(x) - x \right]_{0}^{1} = -1 $

since $x \log(x) \rightarrow 0$ as $x \rightarrow 0$; $x$ goes to zero much faster than $\log(x)$ goes to $-\infty$. The right-hand side, on the other hand, is simply $\log(1/2)$, so the two are not equal.
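If it helps to see this numerically, here is a minimal Monte Carlo sketch of the Uniform(0,1) example (the sample size is an arbitrary choice):

```python
import math
import random

# Monte Carlo check for X ~ Uniform(0, 1): E[log X] = -1, while log(E[X]) = log(1/2).
N = 1_000_000
total = 0.0
for _ in range(N):
    x = 1.0 - random.random()   # in (0, 1], so log(x) is always defined
    total += math.log(x)

print(total / N)        # close to -1
print(math.log(0.5))    # about -0.693; the two clearly differ
```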

In general, Jensen's inequality (see http://en.wikipedia.org/wiki/Jensen's_inequality) and the comment above give more intuition about why the two are not equal. I hope this helps!

  • No, I'm not suggesting this. See my reply to Dilip's comment on the question. Also, if the answer above is still no, can we compute the function $f(n,m) = E[\log(n_{j})]$? 2012-09-18