3

I'm confused about how to find the standard deviation of a weighted combination of scores. For example, consider an academic setting with four assessments, as follows:

(30pts weight)
Quiz 1 - Avg: 75%, Std: +/- 35%
Quiz 2 - Avg: 39.6%, Std: +/- 25.4%
Quiz 3 - Avg: 69.4%, Std: +/- 22.4%

(20pts weight)
Midterm - Avg: 63.62%, Std: +/- 13.09%

(50pts total)
Overall - Avg: 62%, Std: +/- 10.89%

Could one then calculate the standard deviation of the course average by treating the standard deviations like the averages and simply weighting them? That's how I got the standard deviation above for the overall grade.

  • 0
    The latter, since I can compute the class average, but I am not given the standard deviation of the class average. 2011-11-16

2 Answers

1

Since you are combining scores using the formula $Y = \sum_k \omega_k X_k$, linearity of expectation and the Bienaymé formula give, assuming the performances on the individual tests are uncorrelated: $ \mathbb{E}(Y) = \sum_k \omega_k \mathbb{E}(X_k) \qquad \mathbb{V}(Y) = \sum_k \omega_k^2 \mathbb{V}(X_k) $

I get a combined average of $(61.83 \pm 9.88) \%$.



As MikeSpivey and MichaelLugo rightly point out, the assumption of zero correlation poorly reflects the actual state of affairs. So assume that $\mathbb{Cor}(X_i,X_j) = \rho_{ij}$. Then $ \mathbb{V}(Y) = \sum_{i,j} \omega_i \omega_j \mathbb{Cov}(X_i, X_j) = \sum_k \omega_k^2 \mathbb{V}(X_k) + 2 \sum_{i < j} \omega_i \omega_j \rho_{ij} \sqrt{\mathbb{V}(X_i)\mathbb{V}(X_j) } $ Since the correlation is clearly positive, the total variance is bigger in the correlated case. I did some computations assuming equal correlation between tests.
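
As a sketch of what the equal-correlation case looks like, assume $\rho_{ij} = \rho$ for all $i \neq j$ and write $\sigma_k = \sqrt{\mathbb{V}(X_k)}$ (a shorthand introduced here, not used elsewhere in the thread). Each pair $i \neq j$ appears twice in the double sum $\sum_{i,j} \omega_i \omega_j \mathbb{Cov}(X_i, X_j)$, so $ \mathbb{V}(Y) = \sum_k \omega_k^2 \sigma_k^2 + 2 \rho \sum_{i < j} \omega_i \omega_j \sigma_i \sigma_j = (1 - \rho) \sum_k \omega_k^2 \sigma_k^2 + \rho \left( \sum_k \omega_k \sigma_k \right)^2 $ Under this assumption the variance moves linearly in $\rho$ from the uncorrelated value at $\rho = 0$ to the square of the weighted sum of the standard deviations at $\rho = 1$.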


  • 0
    @MichaelLugo: That is true. However, the point I was trying to make is that assuming correlated test scores is *so* common that it even serves as the model in a *canonical* example used to describe a certain statistical concept. :) 2011-11-18
1

Since we can't assume that the assessment scores are uncorrelated, there's no way to calculate the standard deviation of the student course averages solely with the information in the question. (Added: However, we can obtain bounds. See below.)

Provided you have the right additional information, though, there are two ways you could do the calculation.

  1. If you know the covariances or correlations of the assessment scores, you could use the formula Sasha mentions in his answer.
  2. If you know the course average for each student, you could apply the usual standard deviation formula $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2},$ where the $x_i$ are the $n$ individual student course averages and $\bar{x}$ is the overall course average ($\bar{x} = 62\%$, in your example). (The fact that the $x_i$ are weighted averages is irrelevant; you just want the standard deviation of a set of $n$ data values, and how those values are obtained does not affect the standard deviation calculation.)

I'm guessing that (2) is easier. If it were me, I would already have each student's course average in a spreadsheet. It would take about 5 seconds to tell the spreadsheet to find their standard deviation.
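
For readers without a spreadsheet, option (2) might look roughly like the following in Python. This is only a sketch: the six course averages are hypothetical values made up for illustration, not data from the question.

```python
import statistics

# Hypothetical course averages (in percent) for six students; illustrative only.
course_averages = [48.0, 55.5, 62.0, 66.5, 71.0, 69.0]

# Overall course average (x-bar in the formula for s).
mean = statistics.mean(course_averages)

# statistics.stdev uses the n - 1 denominator, matching the formula for s.
s = statistics.stdev(course_averages)

print(f"mean = {mean:.2f}%, s = {s:.2f}%")
```

How each student's course average was formed (weighted or not) plays no role here; the code only sees the list of $n$ values.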


(Added, based on comment of OP below.)

Let's rewrite the general formula for the variance of a weighted sum of variables as $\text{Var}(Y) = {\bf w}^T C {\bf w},$ where ${\bf w} = (0.2(0.35), 0.2(0.254), 0.2(0.224), 0.4(0.1309))$ (i.e., the column vector of weighted standard deviations), and $C$ is the correlation matrix (i.e., $C_{ij} = \rho_{ij}$).

If the information in the question is all that you have available, you can use this formula to obtain bounds on the standard deviation of the overall course average. Since students who do well on one assessment tend to do well on others, and students who do poorly on one assessment tend to do poorly on others, we can assume positive correlations. So we get a lower bound on $s_Y$ by taking $\rho_{ij} = 0$ for all $i \neq j$ and an upper bound by taking $\rho_{ij} = 1$ for all $i,j$. (We have $\rho_{ii} = 1$ for all $i$ in any situation, since a variable has a correlation of $1$ with itself.)

These calculations are straightforward; I get $11.1 \% \leq s_Y \leq 21.8\%$. Note that the lower bound is the one with the assumption of independence, and the upper bound is the one obtained by taking a weighted average of the four assessments, as the OP notes in the comments on Sasha's answer. (I get a slightly different answer for the independence assumption than Sasha; I think he misread the data.)
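
The two bounds can be reproduced with a short Python sketch of the ${\bf w}^T C {\bf w}$ formula. The function names `std_weighted_sum` and `equal_corr_matrix` are made up for this illustration, and the weights assume each quiz counts for 10 of the 50 points and the midterm for 20.

```python
import math

# Weights as fractions of the 50-point total (assumed: 10 points per quiz,
# 20 points for the midterm).
weights = [0.2, 0.2, 0.2, 0.4]
# Reported standard deviations from the question, in percent.
stds = [35.0, 25.4, 22.4, 13.09]


def std_weighted_sum(weights, stds, corr):
    """Standard deviation of Y = sum_k w_k X_k, computed as sqrt(w^T C w)."""
    n = len(weights)
    var = sum(
        weights[i] * stds[i] * corr[i][j] * weights[j] * stds[j]
        for i in range(n)
        for j in range(n)
    )
    return math.sqrt(var)


def equal_corr_matrix(n, rho):
    """Correlation matrix with 1 on the diagonal and rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


lower = std_weighted_sum(weights, stds, equal_corr_matrix(4, 0.0))  # rho = 0
upper = std_weighted_sum(weights, stds, equal_corr_matrix(4, 1.0))  # rho = 1
print(f"{lower:.1f}% <= s_Y <= {upper:.1f}%")  # prints 11.1% <= s_Y <= 21.8%
```

Passing any other correlation matrix to `std_weighted_sum` gives the corresponding value between these two bounds, provided all correlations are nonnegative.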

  • 0
    @eWizardII: If you don't have access to the individual scores (and you would need them to calculate the covariances, too, if you don't already know those), then the best you can do is make an estimate. You could also get an upper bound by using $\rho_{ij} = 1$ in the formula in Sasha's answer. (He doesn't have the formula quite right, though; look at the Wikipedia formula linked to in my answer instead, with the relation $\text{Cov}(X_i,X_j) = \rho_{ij} \sqrt{V(X_i) V(X_j)}$.) 2011-11-17