
I was looking for a variance formula suited to a large sample whose values come from a limited set of integers. I couldn't find one, probably because I never took statistics and don't know the proper terminology to search for. Long story short, I set out to write my own and came up with two candidate equations.

Before I show the equations, I want to explain my goals.

1) I want one streamlined equation for large amounts of data that is easy to calculate even if the data is spread across multiple files. For example, the monthly electricity usage of households in America, rounded to the nearest kWh.

2) If the equation works, it can be used for non-integer data as well. For example, say you had a data set that is accurate to the first decimal place. The expression inside the parentheses would change from $i- \bar x$ to $i(0.1)- \bar x$, with $i$ being 10 times the raw value, so if your raw data point is 3.2 then the corresponding $i$ is 32.

Equation 1: $$\sigma^2=\frac{1}{\sum_{z\in x} z} \sum_{i=0}^n x(i-\bar x)^2$$

Equation 2: $$\sigma^2=\frac{1}{\sum x \forall i} \sum_{i=0}^n x(i-\bar x)^2$$

So, \fgeA isn't working to get the "for all" symbol, as you can clearly see. Anyway, if someone could tell me whether one, both, or neither of these equations is correct, I would greatly appreciate it.

Hey, they fixed my equation, thank you.
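For concreteness, here is a minimal Python sketch of what the equations above seem to be aiming at. It rests on an assumption that the question does not state outright: that $x$ stands for the number of times the integer value $i$ occurs, so the denominator is the total number of observations and $\bar x$ is the count-weighted mean. The names `merge_counts`, `variance_from_counts`, and `scale` are made up for illustration. The result is the population variance (dividing by $N$, not $N-1$).

```python
from collections import Counter

def merge_counts(chunks):
    """Combine per-file {value: count} tallies into a single Counter."""
    total = Counter()
    for chunk in chunks:
        total.update(chunk)  # adds counts for matching values
    return total

def variance_from_counts(counts, scale=1.0):
    """Population variance of data stored as {integer value i: count}.

    `scale` converts the stored integer back to the raw value,
    e.g. scale=0.1 when i is 10 times a reading with one decimal place.
    """
    n = sum(counts.values())  # total number of observations
    if n == 0:
        raise ValueError("no observations")
    mean = sum(c * i * scale for i, c in counts.items()) / n
    return sum(c * (i * scale - mean) ** 2 for i, c in counts.items()) / n

# Two hypothetical "files", each already tallied into counts.
file_a = Counter([1, 2, 2, 3])
file_b = Counter([2, 3, 3, 4])
counts = merge_counts([file_a, file_b])
print(variance_from_counts(counts))
```

Because only the tallies are kept, each file can be counted separately and the tallies added afterwards, which is the multi-file property asked for in goal 1.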

1 Answer


Look in the section on "Online algorithm" (the "numerically stable algorithm") in this discussion of Algorithms for calculating variance:

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
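As a rough illustration of that section, here is a minimal Python sketch of the online (Welford-style) update, plus the pairwise merge that the same article describes for combining partial results, which is what makes it usable across multiple files. The function names and the `(count, mean, m2)` tuple layout are my own choices, not taken from the article. Here `m2` is the running sum of squared differences from the current mean, so the population variance is `m2 / count`.

```python
def update(state, x):
    """Fold one new value x into the running (count, mean, m2) state."""
    count, mean, m2 = state
    count += 1
    delta = x - mean           # distance from the old mean
    mean += delta / count
    m2 += delta * (x - mean)   # uses the old and the new mean
    return count, mean, m2

def merge(a, b):
    """Combine two partial states, e.g. one per file."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2

def population_variance(state):
    count, _, m2 = state
    if count == 0:
        raise ValueError("no observations")
    return m2 / count

def summarize(values):
    """Run the online update over an iterable, one value at a time."""
    state = (0, 0.0, 0.0)
    for v in values:
        state = update(state, v)
    return state

# Two hypothetical files processed independently, then merged.
part_a = summarize([1, 2, 2, 3])
part_b = summarize([2, 3, 3, 4])
combined = merge(part_a, part_b)
print(population_variance(combined))
```

Each file only needs to be read once, and only three numbers per file have to be kept before merging.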

  • So, I think I'm a bit in over my head. But in the article you sent me to, what does M stand for? I'm fairly confident about puzzling out the rest on my own. (2017-02-08)