4

I was solving a question which technically reduces to the following

Given $N$ items, each independently equally likely to be good or bad, what is the probability that the number of good items in the set is greater than or equal to $K$?

This reduces to $\dfrac{x}{2^n}$ where $\displaystyle x = \sum_{p = k}^{n} \binom{n}{p}$. Is there a simpler form that is easier to calculate for large values of $N$ and $K$?

Note: It may be safe to assume that we do not require extremely high precision (only the first 5-10 digits).
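For small $N$ the quantity can be evaluated directly from the definition, e.g. (illustrative helper name; exact integer arithmetic, only feasible for moderate $N$):

```python
from math import comb

def tail_prob_exact(n, k):
    # Exact P(#good >= k) = sum_{p=k}^{n} C(n, p) / 2^n.
    # The numerator is an exact integer; the final division gives a float.
    return sum(comb(n, p) for p in range(k, n + 1)) / 2**n
```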

Thanks!

  • 0
    Can we assume anything else? For instance, is $k$ much smaller than $n$? This might allow an approximate solution.2017-02-08
  • 0
@qudit no, that wouldn't be correct. However, by symmetry between good and bad items we can always take $k = \min(k, n-k)$, i.e. $k \leq n/2$ holds.2017-02-08
  • 0
    Of course. But that doesn't help with what I had in mind if k is close to n / 2.2017-02-08
  • 0
    What are the constraints on N and K?2017-02-08
  • 0
@RazimanTV I am just using it for personal research, so anything of order $O(N)$ should be fine as long as it delivers accurate results. Note that using math.factorial in Python produces huge integers, causing memory problems.2017-02-08
  • 0
    Would an algorithm for estimating the value to high accuracy be enough?2017-02-08
  • 0
    @Qudit Perhaps. It would be great to know your approach anyway.2017-02-08
  • 1
    I think you can use floating point arithmetic without losing a lot of accuracy. Start with n choose n = 1 and then build up other factorials with n choose r = n choose (r+1) * (r+1)/(n-r). Some further tweaking like keeping mantissa and exponent separate might be required to deal with large numbers but I don't think it is very hard.2017-02-08
  • 0
    @RazimanTV That does seem like an interesting idea. I will try to work with that but it would be great to hear some non-deterministic approaches as well.2017-02-08
  • 0
    http://www.math.hawaii.edu/~xander/fa12_471/Binomial_Probabilities.pdf2017-02-08
  • 1
    The normal approximation to the binomial is good for large $N$. This indicates that you might use the cumulative normal distribution for approximation. I don't know the details on how good the approximation is for particular $N$, but that information shouldn't be hard to find.2017-02-08
  • 1
    For large values of $n$ the binomial distribution converges to a normal distribution, hence you are essentially asking what is a good way for estimating the [error function][1]. Continued fractions provide extremely good approximations. Chebyshev's inequality gives a poor approximation, Hoeffding's inequality (https://en.wikipedia.org/wiki/Hoeffding's_inequality) a much better one. [1]: https://en.wikipedia.org/wiki/Error_function2017-02-08

2 Answers

2

$\newcommand{\bbx}[1]{\,\bbox[8px,border:1px groove navy]{\displaystyle{#1}}\,} \newcommand{\braces}[1]{\left\lbrace\,{#1}\,\right\rbrace} \newcommand{\bracks}[1]{\left\lbrack\,{#1}\,\right\rbrack} \newcommand{\dd}{\mathrm{d}} \newcommand{\ds}[1]{\displaystyle{#1}} \newcommand{\expo}[1]{\,\mathrm{e}^{#1}\,} \newcommand{\ic}{\mathrm{i}} \newcommand{\mc}[1]{\mathcal{#1}} \newcommand{\mrm}[1]{\mathrm{#1}} \newcommand{\pars}[1]{\left(\,{#1}\,\right)} \newcommand{\partiald}[3][]{\frac{\partial^{#1} #2}{\partial #3^{#1}}} \newcommand{\root}[2][]{\,\sqrt[#1]{\,{#2}\,}\,} \newcommand{\totald}[3][]{\frac{\mathrm{d}^{#1} #2}{\mathrm{d} #3^{#1}}} \newcommand{\verts}[1]{\left\vert\,{#1}\,\right\vert}$ For 'large' $\ds{n, p\ \mbox{and}\ n - p}$, $\ds{{n \choose p} \sim {n \choose n/2}\exp\pars{-\,{\bracks{p - n/2}^{2} \over n/2}}}$.

You can use the $\ds{\bbox[#dfd,5px]{\ Laplace\ Method\ for\ Sums\ }}$ ( see page $761$ in $\ds{\bbox[#fdd,5px]{\ Analytic\ Combinatorics\ }}$ by Philippe Flajolet and Robert Sedgewick, Cambridge University Press $2009$ ) \begin{align} {1 \over 2^{n}}\sum_{p = k}^{n}{n \choose p} & \sim {1 \over 2^{n}} \bracks{\int_{k}^{n}{n \choose n/2}\exp\pars{-\,{\bracks{p - n/2}^{2} \over n/2}}\,\dd p} \\[5mm] & \sim {1 \over 2^{n}}\,{n \choose n/2}{\root{2} \over 2}\,n^{1/2}\int_{\pars{k -n/2}/\root{n/2}}^{\infty}\exp\pars{-p^{2}}\,\dd p \\[5mm] & = {\root{2\pi} \over 4}\,{n \choose n/2}\,{n^{1/2} \over 2^{n}}\bracks{1 + \,\mrm{erf}\pars{n - 2k \over \root{2}\root{n}}} \quad \mbox{as}\ n \to \infty \end{align} where $\ds{\,\mrm{erf}\pars{z} \equiv {2 \over \root{\pi}}\int_{0}^{z}\expo{-x^{2}}\,\dd x}$ is the Error Function.
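Numerically, Stirling's approximation gives $\binom{n}{n/2}\,n^{1/2}/2^{n} \to \sqrt{2/\pi}$, so the closed form above reduces to $\frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{n - 2k}{\sqrt{2n}}\right)\right]$. A quick Python sketch (the function name is mine):

```python
from math import erf, sqrt

def tail_prob(n, k):
    """Erf approximation to P(#good >= k) out of n fair items."""
    # Stirling: binom(n, n/2) * sqrt(n) / 2**n -> sqrt(2/pi), so the
    # prefactor sqrt(2*pi)/4 * sqrt(2/pi) simplifies to 1/2.
    return 0.5 * (1.0 + erf((n - 2 * k) / sqrt(2.0 * n)))
```

For $k = n/4$ this tends to $1$ and for $k = 3n/4$ to $0$, matching the limit check in the comments below.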

  • 0
    Shouldn't the final result depend on $k$?2017-02-08
  • 0
    @Qudit Yes. After the $p$-shift the integral lower limit becomes $k - n/2$. We can 'refine' the integration to include some sort of $\,\mathrm{erf}$ function or/and its asymptotic behavior. Thanks.2017-02-08
  • 0
    The $2 \sqrt{\pi} / n$ approximation still seems off. For example, your result suggests that the answer should be the same for $k = n / 4$ and $k = 3 n / 4$ since $2 \sqrt{\pi} / n$ does not depend on $k$. However, the first value is close to $1$ whereas the second is close to $0$.2017-02-08
  • 0
    @Qudit Following your advice, I was checking my previous result. Indeed, I have a terrible typo: the prefactor $2^{n}$ must be ${n \choose n/2}$. I just checked $\texttt{mySol[k_,n_]}$. $$ \texttt{Limit[{mySol[n/4, n], mySol[3n/4, n]}, n -> Infinity] = {1,0}} $$ Thanks for your remarks.2017-02-09
2

One method to compute the sum directly, without losing much accuracy in finite-precision arithmetic, is to represent floating-point numbers as a pair of mantissa and exponent and to compute the binomial coefficients with the recurrence $$\binom{n}{r} = \binom{n}{r-1} \frac{n-r+1}{r}.$$
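As a sanity check, the recurrence reproduces a full row of binomial coefficients exactly in integer arithmetic (a small sketch with a made-up helper name):

```python
def binomial_row(n):
    # Build C(n, 0..n) via C(n, r) = C(n, r-1) * (n - r + 1) / r.
    # The product C(n, r-1) * (n - r + 1) is always divisible by r,
    # so integer division is exact here.
    row = [1]
    for r in range(1, n + 1):
        row.append(row[-1] * (n - r + 1) // r)
    return row
```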

Here is a simple Python implementation for $\sum\limits_{p=0}^K \binom{N}{p}$:

def binomial_sum(N, K):
    # Running term C(N, i) and running total, each stored as a
    # mantissa in [0.5, 2) times 2**exponent to avoid overflow.
    current_exponent, current_mantissa = 0, 1.0  # C(N, 0) = 1
    total_exponent, total_mantissa = 0, 1.0
    for i in range(1, K + 1):
        # C(N, i) = C(N, i-1) * (N - i + 1) / i
        current_mantissa = (N - i + 1) * current_mantissa / i
        while current_mantissa >= 2:
            current_mantissa /= 2
            current_exponent += 1
        while current_mantissa <= 0.5:
            current_mantissa *= 2
            current_exponent -= 1
        # Align exponents, then add the new term to the running total.
        total_mantissa += current_mantissa * pow(2, current_exponent - total_exponent)
        while total_mantissa >= 2:
            total_mantissa /= 2
            total_exponent += 1
    return total_exponent, total_mantissa

binomial_sum(10000000, 5000000) is found to be $1.000252313246 \times 2^{9999999}$, correct to 12 decimal places.
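The returned pair represents the sum as $\text{mantissa} \times 2^{\text{exponent}}$, so dividing by $2^N$ (a shift of the exponent) recovers the probability. A small sketch using the quoted output:

```python
# Interpreting binomial_sum's (exponent, mantissa) output as a probability:
# sum_{p=0}^{K} C(N, p) = mantissa * 2**exponent, so dividing by 2**N
# merely shifts the exponent.
exponent, mantissa = 9999999, 1.000252313246  # quoted output for N = 10**7, K = N // 2
N = 10**7
prob = mantissa * 2.0 ** (exponent - N)  # P(at most K good items), just above 1/2
```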

Of course, this is not going to be as efficient as the integral approximations, but it should work reasonably well for up to $N \approx 10^8$ or so.