
In the paper Distributed Private Data Analysis: On Simultaneously Solving How and What, a bit vector $x=(x_1,\dots,x_n)$ is given, with sum $f=\sum_i x_i$. Each bit $x_i$ is independently flipped with probability $p=\frac{1}{2+\epsilon}$ for a fixed $\epsilon>0$: let $z_i=x_i$ with probability $1-p$ and $z_i=1-x_i$ otherwise. Given the flipped bit vector $\mathrm{flip}(x)=(z_1,\dots,z_n)$, the original sum $f$ has the unbiased estimator
$$ \hat{f}=\epsilon^{-1}\Bigl((2+\epsilon)\sum_i z_i\Bigr) - \epsilon^{-1}n. $$

The paper then claims that this estimator has error $\mathcal{O}(\sqrt{n}/\epsilon)$ with constant probability; that is, for every bit vector $x$,
$$ \Pr\bigl[|\hat{f}(x) - f(x)| > c\sqrt{n}/\epsilon\bigr] \leq \delta $$
for some constants $c$ and $\delta < 1$. They say this follows from a Chernoff bound. So I tried to work it out myself, using the exponential-moment bound from the book "The Probabilistic Method" (note: the absolute value is missing here):
$$ \Pr[\hat{f} - f > \ell] \leq \mathbb{E}[e^{\lambda(\hat{f} - f)}]\, e^{-\lambda \ell} = \mathbb{E}[e^{\lambda \hat{f}}]\, e^{-\lambda (\ell+ f)}. $$

Then, since the $z_i$ are independent,
$$ \mathbb{E}[e^{\lambda \hat{f}}] = \mathbb{E}\Bigl[e^{\frac{2+\epsilon}{\epsilon}\lambda \sum_i z_i - \frac{\lambda}{\epsilon}n}\Bigr] = e^{-\frac{\lambda}{\epsilon} n} \prod_i \mathbb{E}\Bigl[e^{\frac{2+\epsilon}{\epsilon}\lambda z_i}\Bigr]. $$

Writing $\mu = \frac{(2+\epsilon)\lambda}{\epsilon}$ and expanding $\mathbb{E}[e^{\mu z_i}]$ as $p\, e^{\mu (1-x_i)} + (1-p)\, e^{\mu x_i}$, I collect the factors into two groups to get rid of the $x_i$'s: the factors with $x_i=1$ give a common term raised to the power $f$ (the number of ones in the bit vector), and the factors with $x_i=0$ give another term raised to the power $n-f$.

I end up with a big, unwieldy expression, and I see no way to choose $\lambda$ so that the resulting bound is a constant (independent of $n$), as the paper claims (recall that $\ell = \mathcal{O}(\sqrt{n}/\epsilon)$).
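To sanity-check the claim numerically, here is a rough simulation sketch I put together (it is not from the paper, and the parameter names n, eps, and trials are just illustrative). It flips each bit with probability $p = 1/(2+\epsilon)$, applies the estimator above, and compares the spread of $|\hat{f} - f|$ to $\sqrt{n}/\epsilon$:

    import numpy as np

    # Rough sanity check (not the paper's code): flip each bit of x with
    # probability p = 1/(2+eps), form f_hat = ((2+eps)*sum(z) - n)/eps,
    # and look at the spread of |f_hat - f| over many trials.
    def error_quantile(n=10_000, eps=0.5, trials=2_000, q=0.9, seed=0):
        rng = np.random.default_rng(seed)
        p = 1.0 / (2.0 + eps)
        x = rng.integers(0, 2, size=n)        # an arbitrary input bit vector
        f = x.sum()
        errs = np.empty(trials)
        for t in range(trials):
            flips = rng.random(n) < p         # which bits get flipped
            z = np.where(flips, 1 - x, x)     # the flipped vector flip(x)
            f_hat = ((2.0 + eps) * z.sum() - n) / eps
            errs[t] = abs(f_hat - f)
        return np.quantile(errs, q), np.sqrt(n) / eps

    q_err, scale = error_quantile()
    print(f"90th percentile of |f_hat - f|: {q_err:.1f};  sqrt(n)/eps = {scale:.1f}")

The empirical error does seem to stay within a small constant multiple of $\sqrt{n}/\epsilon$, so the claim looks right; I just cannot prove it.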

I am not sure how to proceed or whether I am doing something wrong. I've spent a lot of time on this (several weeks!) and tried to read about the Chernoff bound from different sources. Any feedback is welcome; hints or directions are also great. Thank you.

1 Answer


The sum $\sum_i z_i$ is a sum of independent Bernoulli random variables and has variance $\Theta(n)$, so $\hat{f}-f$ has zero mean and variance $\Theta(n/\epsilon^2)$. The standard deviation is therefore $\Theta(\sqrt{n}/\epsilon)$, and the central limit theorem implies that the probability of being more than a constant number of standard deviations off is some constant.
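To spell the variance out (a direct calculation with $p = 1/(2+\epsilon)$, not taken from the paper): each $z_i$ is Bernoulli with parameter $p$ or $1-p$, so $\operatorname{Var}(z_i) = p(1-p)$ either way, and $\hat{f}$ is just an affine function of $\sum_i z_i$, so
$$ \operatorname{Var}(\hat{f}-f) = \Bigl(\tfrac{2+\epsilon}{\epsilon}\Bigr)^2 \sum_i \operatorname{Var}(z_i) = \Bigl(\tfrac{2+\epsilon}{\epsilon}\Bigr)^2 \cdot n \cdot \frac{1}{2+\epsilon}\cdot\frac{1+\epsilon}{2+\epsilon} = \frac{n(1+\epsilon)}{\epsilon^2} = \Theta\!\Bigl(\frac{n}{\epsilon^2}\Bigr). $$
Unbiasedness follows similarly from $\mathbb{E}[z_i] = p + (1-2p)x_i$.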

The Chernoff bound is a "finite" version of the central limit theorem: it gives explicit, non-asymptotic bounds (rather than a limit statement), which are useful for bounding large deviations. In this case you don't even have to use it, but if you do, it works.

The inequality you're quoting from the book is really the first step of the proof of the bound; the proof goes on to optimize over $\lambda$ and estimate the resulting expression. If you use a ready-to-use statement of the bound (which should also be in the book), you can just plug in the numbers and get the claimed result.
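For example, one ready-made bound that fits here (my own plug-in, using Hoeffding's inequality rather than the multiplicative Chernoff form): write $\hat{f}-f = \sum_i Y_i$ with $Y_i = \frac{2+\epsilon}{\epsilon} z_i - \frac{1}{\epsilon} - x_i$; each $Y_i$ has mean zero and takes values in an interval of length $\frac{2+\epsilon}{\epsilon}$, so
$$ \Pr\bigl[|\hat{f}-f| \ge t\bigr] \le 2\exp\!\Bigl(-\frac{2t^2}{n\,\bigl(\tfrac{2+\epsilon}{\epsilon}\bigr)^2}\Bigr) = 2\exp\!\Bigl(-\frac{2t^2\epsilon^2}{n(2+\epsilon)^2}\Bigr). $$
Plugging in $t = c\sqrt{n}/\epsilon$ gives $2e^{-2c^2/(2+\epsilon)^2}$, a constant (independent of $n$) that can be made as small as you like by choosing $c$ large.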

  • @Yuval Filmus: Thanks for the quick reply. I've worked through the variance of $\hat{f}-f$; it comes out to exactly $n/\epsilon^2 + n/\epsilon$. I am not sure how this is $\Theta(n/\epsilon^2)$?
  • $\epsilon$ is a small number, so $1/\epsilon$ is much smaller than $1/\epsilon^2$. Think of $\epsilon$ as $10^{-6}$, so $1/\epsilon = 10^6$ while $1/\epsilon^2 = 10^{12}$. So $n/\epsilon^2 + n/\epsilon \approx n/\epsilon^2$.
  • @Yuval Filmus: Actually, $\epsilon$ here is the $\epsilon$-differential-privacy parameter, which is typically between $0.1$ and $1.5$. We could regard $n/\epsilon^2 + n/\epsilon$ as $n(1+\epsilon)/\epsilon^2$ and discard $(1+\epsilon)$ as a constant, but that doesn't seem meaningful since the denominator is then also a constant... I am confused here.
  • If $\epsilon$ is between $0.1$ and $1.5$ then the $\Theta(1/\epsilon^2)$ and $\Theta(1/\epsilon)$ terms are not interesting, since they are both $\Theta(1)$. In other words, you can just ignore $\epsilon$.
  • @Yuval Filmus: Okay, I won't bother you with more questions, thanks a lot for your time :)
  • @Yuval Filmus: I was hoping to ask one more question: existing versions of the Chernoff bound rely on certain assumptions about the variables being summed. One such assumption is that they all have the same expectation (a fixed number). In my case the expectation of a flipped bit is a function of the original bit, so I am not sure whether an existing version fits this problem.
  • These assumptions are not really needed, but in your case all the bits are symmetric and so all the variables have the same distribution (and in particular, the same expectation). Note that you don't really care about the value of the original bit, you only care whether it equals its noisy version. The "noise component" is the same for all bits.
  • @Yuval Filmus: When a bit is flipped, its contribution to $\hat{f}-f$ (roughly speaking) is $+1$ if the original bit was $0$ and $-1$ otherwise. Unfortunately $|\hat{f}-f|$ doesn't help, because we can't take the absolute value of each bit's contribution separately. Do you have any idea about this?
  • This can't simply be the number of bit flips that occurred, because there can be as many as $n$ bit flips while the additive error is still zero (when the number of flipped ones equals the number of flipped zeros). From the application's perspective, the number of bit flips doesn't matter, as long as the sum is as close as possible to the correct value. (It is a protocol for approximating the sum.)
  • The contribution of each bit to $\hat{f} - f$ is either $Z$ or $-Z$, depending on the bit, where $Z$ is some affine transformation of a Bernoulli random variable. So you're right that the bits are not symmetric, but the bound works anyhow. If it helps, both $Z$ and $-Z$ have the same first two moments (made explicit in the sketch below).
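Concretely (just writing out that last comment with $p = 1/(2+\epsilon)$, a quick check and not from the paper), the contribution of bit $i$ to $\hat{f}-f$ is $\frac{2+\epsilon}{\epsilon}z_i - \frac{1}{\epsilon} - x_i$, which equals
$$ \begin{cases} -\tfrac{1}{\epsilon} & \text{w.p. } 1-p \\ \tfrac{1+\epsilon}{\epsilon} & \text{w.p. } p \end{cases} \quad\text{if } x_i = 0, \qquad\qquad \begin{cases} \tfrac{1}{\epsilon} & \text{w.p. } 1-p \\ -\tfrac{1+\epsilon}{\epsilon} & \text{w.p. } p \end{cases} \quad\text{if } x_i = 1. $$
Both cases have mean $0$ and second moment $\frac{1+\epsilon}{\epsilon^2}$, which is all that the variance and Hoeffding arguments above use.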