3
$\begingroup$

I'm writing a test for a probabilistic data structure that I've implemented. Since its probabilistic, its performance is different every time, and in particular, the performance varies much more widely (as a percentage) when the number of items is small. I want the test to fail if the performance is so bad that it's unlikely to have been that bad by chance. Basically, I want a significance test.

In the test, I check the structure n times. Each check may fail with probability p. If the total number of failures is more than twice the standard deviation greater than the mean (n*p), I want to fail the test. Now, I know n and I can calculate p. How can I use n and p to get the standard deviation?

I know it can be easily estimated by simulation, but there are about a hundred combinations of n and p, so I would prefer a way to just calculate it. Unfortunately, I haven't found an answer, or if I have it was too complex for this layman to notice.

  • 0
    Thanks, Henry. It was poorly phrased.2011-06-10

1 Answers 1

4

This is the binomial distribution.

If you know $p$ before the test and the checks are independent then the standard deviation is $\sqrt{ n p (1-p)}$

  • 0
    Similarly to calculate mean we simply multiply p and n.2013-05-04