2

It's been a while since I took a statistics course, but this question came to mind the other day.

Let's suppose that I am looking at Salary data, but the only data provided is the quartiles. For example:

Q1 = 25th percentile = 40 000

Q2 = 50th percentile = 70 000

Q3 = 75th percentile = 100 000

Assuming that we have a normal distribution and the above information, is it possible to calculate any given percentile? If so, how?

Any help would be appreciated. Thanks!

  • 0
    @GEdgar: Believe it or not, however, people DO model such data with Gaussian distributions. 2011-09-26

4 Answers

4

The Gaussian random variable must be centered at $Q_2$, and its first and third quartiles must be at $Q_1$ and $Q_3$ respectively. The first and third quartiles of a Gaussian random variable with mean $m$ and variance $\sigma^2$ are at approximately $m-0.68\sigma$ and $m+0.68\sigma$ respectively (the constant is closer to $0.6745$; it is rounded to $0.68$ here). Hence $m=Q_2$ and $\sigma=(Q_2-Q_1)/0.68=(Q_3-Q_2)/0.68$, which is consistent because both differences equal $30\,000$ in the question.

Edit: About $5.6\%$ of this distribution falls on the negative part of the real axis. This is usually considered an acceptable trade-off between plausibility (since all the data should be nonnegative) and practicability (since Gaussian models are so convenient).
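
As a rough sketch of this recipe in R, assuming the quartiles from the question and using `qnorm(0.75)` in place of the rounded constant $0.68$ (the variable names and the `percentile` helper are illustrative, not part of the answer):

```r
# Quartiles from the question
q1 <- 40000
q2 <- 70000
q3 <- 100000

# Standard normal 75th percentile (about 0.6745; the answer rounds it to 0.68)
z75 <- qnorm(0.75)

# Fitted normal: mean at the median, sd from the interquartile range
m <- q2
sigma <- (q3 - q1) / (2 * z75)

# Hypothetical helper: value at probability p under the fitted normal
percentile <- function(p) qnorm(p, mean = m, sd = sigma)

percentile(c(0.10, 0.90))        # 10th and 90th percentiles
p_negative <- pnorm(0, mean = m, sd = sigma)  # share of the fit below zero
```

With the unrounded constant, the share below zero comes out slightly higher than the $5.6\%$ quoted above, but still under $6\%$.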

  • 0
    No problem. It is all the more satisfying to be understood. 2011-09-26
1

A normal distribution is symmetric about its median, so your data can only come from a normal distribution if the distance between Q1 and Q2 is the same as the distance between Q2 and Q3.

1

What Henning Makholm says is right. But assuming the quartiles are symmetric about the median, so that a normal model is possible, I think you need to solve the following equation for $\sigma$:

$$0.75=\int_{-\infty}^{Q_3}\frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-Q_2)^2}{2\sigma^2}}\,dx$$

You may try a numerical approximation.

After you get the variance you can easily standardize to get any quantile you want.
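
A minimal sketch of that numerical step in R, assuming the quartiles from the question. The integral is the normal CDF, so `pnorm` stands in for it and `uniroot` does the root finding; the search interval is an assumption that only needs to bracket the root. Standardizing the same equation gives the closed form $\sigma=(Q_3-Q_2)/\Phi^{-1}(0.75)$, used here as a check:

```r
q2 <- 70000   # median, taken as the mean of the normal
q3 <- 100000  # third quartile

# Normal CDF at Q3 minus the target probability 0.75
f <- function(sigma) pnorm(q3, mean = q2, sd = sigma) - 0.75

# Assumed bracketing interval: f is positive at 1 and negative at 1e6
sigma <- uniroot(f, interval = c(1, 1e6), tol = 1e-6)$root

# Closed-form check obtained by standardizing
sigma_exact <- (q3 - q2) / qnorm(0.75)

# Standardize to get any quantile, e.g. the 90th percentile
q90 <- q2 + sigma * qnorm(0.90)
```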

  • 0
    If you're trying to solve an applied problem, I'd recommend using [R](http://www.r-project.org), where you don't even have to "understand" the formulae, since you "speak" and "write" in R about means, standard deviations, shapes, etc. Also, the vector handling in R is absolutely fantastic for doing statistics. 2011-09-28
1

If you fit the quantiles to a known distribution, you can calculate any percentile with the distribution's quantile function, which is the inverse of the CDF. However, with only 3 quantiles, essentially any 3-parameter distribution can be made to fit, so you need to choose the distribution beforehand. If possible, you should get some raw data or more quantiles. See this link, which also has some handy R code for fitting quantiles to a distribution using optim() and the distribution's quantile function.

I've found that income/salary data are best fit by a generalized (aka shifted, aka 3-parameter) log-logistic distribution. The log-logistic also has the advantage of having a closed-form quantile function which is easy to calculate and easy for the optimization library to use. I ended up having to write my own shifted log-logistic quantile function after not finding exactly what I wanted in available R packages:

    # Shifted/generalized log-logistic quantile function
    # http://en.wikipedia.org/wiki/Shifted_log-logistic_distribution
    # The qllog3() function from package FAdist appears to be a different parameterization
    # location = mu, scale = sigma, shape = xi
    qshllogis <- function(p, location = 0, scale = 1, shape = 0) {
        if (shape == 0) {
            # Revert to logistic distribution
            return(qlogis(p, location, scale))
        } else {
            return(scale * ((1/p - 1)^(-shape) - 1) / shape + location)
        }
    }
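
A sketch of how such a fit might look with `optim()`, under several assumptions of mine: the quartiles are hypothetical right-skewed values (the symmetric ones in the question would push the shape toward zero), the loss is a sum of squared relative errors, and the starting values and `parscale` are rough guesses. The quantile function is repeated so the block runs on its own:

```r
# Quantile function from the answer, repeated so the sketch is self-contained
qshllogis <- function(p, location = 0, scale = 1, shape = 0) {
  if (shape == 0) {
    return(qlogis(p, location, scale))
  }
  scale * ((1 / p - 1)^(-shape) - 1) / shape + location
}

# Hypothetical right-skewed quartiles (not the question's symmetric ones)
p <- c(0.25, 0.50, 0.75)
q <- c(40000, 70000, 115000)

# Sum of squared relative errors between fitted and observed quantiles
loss <- function(par) {
  fitted <- qshllogis(p, location = par[1], scale = par[2], shape = par[3])
  sum(((fitted - q) / q)^2)
}

# Starting values and parscale are rough guesses, not tuned choices
start <- c(q[2], (q[3] - q[1]) / 2, 0.1)
fit <- optim(start, loss, method = "Nelder-Mead",
             control = list(parscale = c(q[2], q[2], 1), maxit = 2000))

# Any percentile of the fitted distribution, e.g. the 90th
q90 <- qshllogis(0.90, fit$par[1], fit$par[2], fit$par[3])
```

For this particular distribution and these three probabilities the fit can also be checked by hand: the quantile function at $p=0.5$ equals the location, and the ratio $(Q_3-Q_2)/(Q_2-Q_1)$ equals $3^{\xi}$, so the shape is $\log\big((Q_3-Q_2)/(Q_2-Q_1)\big)/\log 3$.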