I'm writing a test for a probabilistic data structure that I've implemented. Since it's probabilistic, its performance is different every time, and in particular, the performance varies much more widely (as a percentage) when the number of items is small. I want the test to fail if the performance is so bad that it's unlikely to have been that bad by chance. Basically, I want a significance test.
In the test, I check the structure n times. Each check may fail with probability p. If the total number of failures exceeds the mean (n*p) by more than two standard deviations, I want the test to fail. Now, I know n and I can calculate p. How can I use n and p to get the standard deviation?
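To make the setup concrete, here is a minimal sketch of the test I have in mind (the names are mine, `check_once` stands in for the real check of my structure, and `std_dev` is exactly the quantity I'm asking how to compute from n and p):

```python
def run_significance_check(n, p, std_dev, check_once):
    """Fail if the observed number of failures is more than two standard
    deviations above the expected number, n * p.

    std_dev is the unknown I'm asking about: how do I get it from n and p?
    """
    failures = sum(check_once() for _ in range(n))
    mean = n * p
    assert failures <= mean + 2 * std_dev, (
        f"{failures} failures out of {n} checks; "
        f"worse than expected by chance (mean {mean:.1f})"
    )
```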
I know the standard deviation can easily be estimated by simulation, but there are about a hundred combinations of n and p, so I would prefer a way to just calculate it. Unfortunately, I haven't found an answer, or if I have, it was too complex for this layman to notice.
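For reference, this is the kind of simulation I mean (a throwaway sketch; the function name and trial count are mine). Running something like this for every (n, p) combination is what I'd like to avoid:

```python
import random
import statistics

def simulate_std_dev(n, p, trials=10_000):
    # Brute-force estimate: repeat the whole n-check experiment many times
    # and take the sample standard deviation of the failure counts.
    totals = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
    return statistics.stdev(totals)

print(simulate_std_dev(1000, 0.05))
```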