
We have a random generator that generates independent random bits with probability $P(x=1) = P$ and $P(x=0)=1-P$.

Given $N$ random independent bits, we estimate $P$ by $\hat{P} = N_1/(N_0+N_1)$, where $N_0$ is the number of $0$'s and $N_1$ is the number of $1$'s. The expected value of $\hat{P}$ is easily shown to be $P$.

a. What is the expected value of the estimated entropy, defined as follows: $\hat{H}=-[ \hat{P} \log(\hat{P}) + (1-\hat{P}) \log(1-\hat{P}) ]$

b. If we take $M$ independent sets of $N$ random bits as above and each time estimate the entropy using the above equation, what is the expected value of the smallest estimated entropy among those $M$ sets?

Thanks, MG

P.S. If the solution to the integral for general $P$ is too complicated, a solution for the special case $P=1/2$ would be appreciated as well.
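For concreteness, the estimator in the question can be sketched in a few lines of Python; the values $P=0.3$ and $N=10\,000$ below are arbitrary illustrative choices, not from the question, and the natural logarithm is used:

```python
import numpy as np

rng = np.random.default_rng(42)

P, N = 0.3, 10_000                      # illustrative values, not from the question
bits = rng.random(N) < P                # independent bits with P(x=1) = P

p_hat = bits.mean()                     # N1 / (N0 + N1)

# Plug-in entropy estimate, with the convention 0*log(0) = 0
# (clipping avoids log(0) when p_hat is exactly 0 or 1).
terms = np.array([p_hat, 1.0 - p_hat])
h_hat = -np.sum(terms * np.log(np.clip(terms, 1e-300, 1.0)))

print("p_hat:", p_hat, "h_hat:", h_hat)
```

For large $N$, `h_hat` concentrates near the true entropy $-(P\log P + (1-P)\log(1-P))$, but as the answer below the comments works out, its expectation is slightly biased downward.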

  • For large $N$, $\langle \hat H \rangle$ will approach $-[P \log P + (1-P) \log (1-P)]$. (2011-03-04)
  • For the second part, the estimated entropy depends on the deviation of the estimated $P$ from the real $P$. You can estimate this since $\hat{P}$ has a binomial (and so approximately normal) distribution. Presumably you can estimate the minimal $\hat{H}$ this way (just take the largest deviation from $P$). (2011-03-04)
  • @Fabian: Thanks, that's true. (2011-03-06)
  • @Yuval: Thanks, that makes sense. So you are suggesting to approximate the distribution of $\hat{P}$ by a normal distribution, and then calculate the expected value (solve the integral for that)? (2011-03-06)
  • If you can solve the integral... (2011-03-06)
  • Can we use beta integrals for the first part, as follows: $$f(\alpha,\beta)= -\int_{0}^{1} p\log(p)\, p^{\alpha+1}(1-p)^{\beta+1}\,dp,$$ $$\left\langle \hat{H} \right\rangle = f(\alpha,\beta)+f(\beta,\alpha),$$ where $\alpha,\beta$ are the beta-distribution parameters related to $N_0,N_1$? Can anyone check the solution to the above integral with Mathematica? (2011-03-07)
  • In fact, I am suggesting the above for a better estimate of $H$ given $N_0$, $N_1$; the integral is with respect to $p$. (2011-03-07)
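The comment suggesting to take the largest deviation of $\hat P$ from $P$ can be checked numerically for part b. A Monte Carlo sketch (Python; the values $P=1/2$, $N=1000$, $M=10$ and the trial count are arbitrary assumptions, natural logarithm throughout):

```python
import numpy as np

rng = np.random.default_rng(0)

def h_hat(p_hat):
    """Plug-in entropy estimate (natural log), with 0*log(0) taken as 0."""
    p = np.clip(p_hat, 1e-300, 1.0)
    q = np.clip(1.0 - p_hat, 1e-300, 1.0)
    return -(p_hat * np.log(p) + (1.0 - p_hat) * np.log(q))

P, N, M, trials = 0.5, 1000, 10, 5000   # illustrative values only

# For each trial, draw M independent sets of N bits, estimate H on each set,
# and record the smallest of the M estimates.
p_hats = rng.binomial(N, P, size=(trials, M)) / N
min_h = h_hat(p_hats).min(axis=1)

print("mean of min Ĥ over M sets:", min_h.mean())
print("single-set mean Ĥ        :", h_hat(p_hats[:, 0]).mean())
```

As expected, the minimum over $M$ sets is systematically below the single-set mean, since it corresponds to the largest deviation of $\hat P$ from $P$ among the $M$ sets.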

1 Answer


As noted in the comments, there is no simple exact formula for $\langle \hat H_N\rangle$, but for large $N$ one can approximate it via a $\chi^2$-type limit.

To wit, write $N_1=pN+v\sqrt{N}Z_N$ with $v^2=p(1-p)$, where $\langle Z_N\rangle=0$ and $\langle Z_N^2\rangle=1$ for every $N$, and $Z_N$ converges in distribution to a standard normal random variable as $N\to+\infty$.

Hence $\hat P_N=p+U_N$ and $1-\hat P_N=q-U_N$ with $q=1-p$ and $U_N=vZ_N/\sqrt{N}$, and $$ \hat H_N=-(p+U_N)\log(p+U_N)-(q-U_N)\log(q-U_N). $$ Using the expansions $$ \log(p+U_N)=\log(p)+U_N/p-U_N^2/(2p^2)+o(U_N^2), $$ and $$ \log(q-U_N)=\log(q)-U_N/q-U_N^2/(2q^2)+o(U_N^2), $$ one gets $$ \hat H_N=-p\log(p)-q\log(q)+U_N\log(q/p)-U_N^2/(2pq)+o(U_N^2). $$ Since $\langle U_N\rangle=0$, $\langle U_N^2\rangle=v^2/N$ and $v^2=pq$, one gets $$ \langle \hat H_N\rangle=-p\log(p)-q\log(q)-1/(2N)+o(1/N). $$
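The $-1/(2N)$ bias can be verified by simulation. A minimal sketch in Python (the values $p=1/2$, $N=1000$ and the trial count are arbitrary choices; natural logarithm throughout):

```python
import numpy as np

rng = np.random.default_rng(1)

def h_hat(p_hat):
    """Plug-in entropy estimate (natural log), with 0*log(0) taken as 0."""
    p = np.clip(p_hat, 1e-300, 1.0)
    q = np.clip(1.0 - p_hat, 1e-300, 1.0)
    return -(p_hat * np.log(p) + (1.0 - p_hat) * np.log(q))

p, N, trials = 0.5, 1000, 40_000        # illustrative values only
p_hats = rng.binomial(N, p, size=trials) / N

true_H = -(p * np.log(p) + (1 - p) * np.log(1 - p))   # = log 2 here
predicted = true_H - 1.0 / (2 * N)                     # first-order bias
simulated = h_hat(p_hats).mean()

print("predicted mean:", predicted)
print("simulated mean:", simulated)
```

At $p=1/2$ the first-order term $U_N\log(q/p)$ vanishes, so the simulated mean should sit very close to $\log 2 - 1/(2N)$, with fluctuations of order $1/N$ averaged over the trials.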