
Let $X$ be a random variable with an exponential distribution and $Y$ a random variable with a uniform distribution. For each of them, I repeated the following experiment 5000 times:

I used R to generate a random sample of size $n = 5$ of the random variable $X$ (and $Y$), with parameter $\gamma = 2.0$ (for $Y$ I used min = 0 and max = 4). I then calculated the 95% confidence interval for $\mu$ from this sample.

For $X$, the relative frequency of intervals containing $\mu$ was close to 80%, but for $Y$ it was close to 95%. My question is: why do the confidence intervals behave better for the uniform distribution when $n$ is small?

  • You could just do the integrations, or look them up, e.g. at Wikipedia for an [exponential distribution](http://en.wikipedia.org/wiki/Exponential_distribution) or a [uniform distribution](http://en.wikipedia.org/wiki/Uniform_distribution_%28continuous%29). (2012-09-03)

1 Answer


If you look at whether your confidence intervals lie below $2$ or above $2$, then of those from the exponential distribution about 11.3% are too low, 88.3% include the population mean of $2$, and 0.4% are too high, while with a normal distribution you would get about 2.5%, 95% and 2.5% respectively. You could use the R code below to simulate this.

The principal cause of this is that the exponential distribution is too skewed for a sample size of only $5$. The sample mean from an exponential distribution has a Gamma distribution which, largely (but not exclusively) because of its skewness, is poorly modelled by a Student's $t$ distribution.
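To put a rough number on that skewness, here is a short sketch using standard facts about the Gamma distribution (these are not taken from the simulation itself). If $X_1,\dots,X_n$ are independent exponential variables with mean $\mu$, then

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \sim \text{Gamma}\left(\text{shape}=n,\ \text{scale}=\frac{\mu}{n}\right), \qquad \text{skewness}(\bar X) = \frac{2}{\sqrt{n}}.$$

For $n = 5$ this gives a skewness of $2/\sqrt{5} \approx 0.89$, whereas the sample mean of a symmetric distribution such as the normal or the uniform has skewness $0$.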

    cases    <- 5000
    ss       <- 5
    popmean  <- 2
    #sampledata <- matrix(rnorm(cases*ss, mean=popmean, sd=1), ncol=ss)   #normal
    #sampledata <- matrix(runif(cases*ss, min=0, max=2*popmean), ncol=ss) #uniform
    sampledata <- matrix(rexp(cases*ss, rate=1/popmean), ncol=ss)        #exp
    samplemean  <- rowMeans(sampledata)
    samplese    <- sqrt((rowSums(sampledata^2)/ss - samplemean^2)/(ss-1))
    confintfactor <- qt(0.975, df=ss-1)               # about 2.776 for ss=5
    toolow  <- sum( samplemean + confintfactor*samplese  < popmean) / cases
    toohigh <- sum( samplemean - confintfactor*samplese  > popmean) / cases
    c(toolow, 1-toolow-toohigh, toohigh)              # would like 0.025 0.95 0.025

For the uniform distribution, about 3.3% of the confidence intervals are too low, 93.4% include the population mean of $2$, and 3.3% are too high. Here too the Student's $t$ distribution is not a perfect model, but it is better than before: in particular there is no skewness.
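As a cross-check, the same comparison can be sketched with R's built-in `t.test`, which returns the usual Student's $t$ interval. The helper name `coverage`, the seed, and the reading of the exponential parameter as a mean of $2$ (i.e. `rate = 1/2`) are assumptions made for this illustration; the exact figures vary slightly from run to run.

```r
# Hypothetical helper: estimate how often the nominal 95% t-interval
# for the mean actually contains the true population mean.
coverage <- function(rdist, popmean, n = 5, reps = 5000) {
  hits <- replicate(reps, {
    ci <- t.test(rdist(n))$conf.int   # default conf.level is 0.95
    ci[1] <= popmean && popmean <= ci[2]
  })
  mean(hits)                          # proportion of intervals covering popmean
}

set.seed(1)  # arbitrary seed, only for reproducibility

# Assumption: exponential with mean 2 (rate = 1/2), uniform on [0, 4]
cov_exp  <- coverage(function(n) rexp(n, rate = 1/2), popmean = 2)
cov_unif <- coverage(function(n) runif(n, min = 0, max = 4), popmean = 2)

print(c(exponential = cov_exp, uniform = cov_unif))
```

Passing the sampler in as a function keeps the helper independent of the distribution, so other cases (for example `function(n) rnorm(n, mean = 2)`) can be tried without changing it.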