
I'm trying to come up with a 95% confidence interval for the click-through rate of a particular advertisement. It has $x$ clicks out of $n$ impressions so far.

What's the best way to compute this, given that I expect the click-through-rate to be small? I've been told that the "usual" methods of computing a confidence interval don't do well when the true probability $p$ is near 0.

For advertisements, the true click probability is typically in the (0, 0.02) range. I don't have an exact formula for the prior, but any reasonable approximation centered in the (0, 0.02) range would do.

Is there a nice formula, or something like

(lower, upper) = confidence_interval(x, n, prior_p, 0.95) 

out there?

Or alternately, has anyone out there used one of the "usual" confidence interval formulas in this situation, and can confirm that it produces "close enough" results?

3 Answers

2

One option is to calculate the interval from a Beta distribution: put a Beta prior on $p$, update it with the observed clicks and impressions, and read off the quantiles of the resulting Beta posterior.

For example the following R code

ci <- function(x, n, prior, conf) {
  c(qbeta((1 - conf) / 2, prior[1] + x, prior[2] + n - x),
    qbeta((1 + conf) / 2, prior[1] + x, prior[2] + n - x))
}

prior <- c(1, 99)
ci(   0,      0, prior, 0.95)
ci(  20,   2000, prior, 0.95)
ci(2000, 200000, prior, 0.95)

produces these results

> ci(   0,      0, prior, 0.95)
[1] 0.0002557027 0.0365757450
> ci(  20,   2000, prior, 0.95)
[1] 0.006203473 0.014677571
> ci(2000, 200000, prior, 0.95)
[1] 0.009568703 0.010440574
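For anyone who wants the same numbers without R, here is a rough Python sketch that uses only the standard library. It approximates the two `qbeta` quantiles by summing the Beta posterior density over a fine grid of $p$ values. The function name `beta_posterior_interval` and the `grid` parameter are made up for this illustration, and the grid size is a guess that happens to be fine enough for the three cases above. Strictly speaking, the result is a Bayesian credible interval. If SciPy is installed, `scipy.stats.beta.ppf` should be a more direct stand-in for `qbeta`.

```python
import math

def beta_posterior_interval(x, n, prior=(1.0, 99.0), conf=0.95, grid=200_000):
    """Approximate the equal-tailed interval of Beta(prior[0] + x, prior[1] + n - x).

    Illustrative sketch only: the quantiles come from a midpoint-rule sum
    over `grid` cells of (0, 1), not from an exact inverse CDF.
    """
    a = prior[0] + x
    b = prior[1] + n - x
    # Midpoints of `grid` equal-width cells covering (0, 1).
    ps = [(i + 0.5) / grid for i in range(grid)]
    # Unnormalised log-density; logs keep large n from underflowing.
    logs = [(a - 1) * math.log(p) + (b - 1) * math.log1p(-p) for p in ps]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)

    def quantile(q):
        target = q * total
        running = 0.0
        for p, w in zip(ps, weights):
            if running + w >= target:
                # Interpolate linearly inside the cell that crosses the target.
                frac = (target - running) / w
                return p - 0.5 / grid + frac / grid
            running += w
        return 1.0

    return quantile((1 - conf) / 2), quantile((1 + conf) / 2)

print(beta_posterior_interval(0, 0))
print(beta_posterior_interval(20, 2000))
print(beta_posterior_interval(2000, 200000))
```

The three calls mirror the R calls above and should agree with the R output to several decimal places.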
  • 0
    Okay, I looked at the qbeta.c file inside the R source code - it's pretty gnarly :) There's also a Python binding out there for R. So I could just use that, or worst-case translate the C to Python. 2011-03-26
1

Perhaps this will help:

R code

require(LearnBayes)

q1 <- list(p=0.025, x=1e-10) # can't use zero for this function
q2 <- list(p=0.975, x=0.02)
beta.select(q1,q2)

This returns the shape parameters a and b of a beta prior whose central 95% interval runs from (almost) 0 to 0.02. Since the quantity you're estimating is a probability, a beta distribution seems like the most natural choice of prior.

0

You can always calculate lower and upper numerically. I'm not sure how the confidence interval is usually defined (Bayesian? likelihood?), but surely you can scan over $p$ and determine what you want numerically.
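To make "scan over $p$" concrete, here is a minimal Python sketch under one particular assumption: that the interval wanted is the frequentist Clopper-Pearson ("exact") one. That definition is my choice for illustration. The lower limit is the $p$ at which seeing $x$ or more clicks has probability $\alpha/2$, the upper limit is the $p$ at which seeing $x$ or fewer clicks has probability $\alpha/2$, and both are found by bisection instead of a literal scan. The helper names `binom_cdf`, `solve_decreasing` and `exact_interval` are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0 or p >= 1.0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def solve_decreasing(f, target, iters=60):
    """Bisection on [0, 1] for a function f that decreases in p: find f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(x, n, conf=0.95):
    """Clopper-Pearson style limits for x successes in n trials (sketch)."""
    alpha = 1 - conf
    # Lower limit: P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

print(exact_interval(20, 2000))
print(exact_interval(0, 2000))  # no clicks: the lower limit is exactly 0
```

The cost grows with $x$ because each CDF evaluation sums $x + 1$ terms, so for very large click counts a library routine would be the better tool.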

Alternatively, you can try the Poisson approximation, which works if $np$ is small. If $np$ is large then the normal approximation should work, and this is probably the assumption underlying the usual formula, whatever it is.
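As a quick illustration of why the normal approximation needs $np$ to be reasonably large, here is a small Python sketch of the textbook normal-approximation (Wald) interval, $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$ with $\hat{p} = x/n$. I am assuming this is the "usual formula" being referred to. The function name `normal_approx_interval` and the clipping to $[0, 1]$ are my own additions.

```python
import math
from statistics import NormalDist

def normal_approx_interval(x, n, conf=0.95):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), clipped to [0, 1]."""
    p_hat = x / n
    z = NormalDist().inv_cdf((1 + conf) / 2)  # about 1.96 for conf = 0.95
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

print(normal_approx_interval(20, 2000))  # 20 clicks: a usable interval around 0.01
print(normal_approx_interval(0, 500))    # no clicks yet: collapses to (0.0, 0.0)
print(normal_approx_interval(2, 500))    # few clicks: lower limit is clipped at 0
```

With zero clicks the estimated standard error is zero, so the interval has no width at all, which is one concrete way the approximation breaks down for small click counts.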

  • 0
    Thanks for the numerical integration idea. As far as Poisson, that seems like just another approximation that I'd rather avoid. I want something that's as exact as possible. 2011-03-26