
Suppose I have two yes/no poll questions with the following results:

Poll 1

  • Yes: $200$
  • No: $100$

Poll 2

  • Yes: $2$
  • No: $1$

Both polls have a $2/3$-$1/3$ (roughly $67\%$-$33\%$) split, but getting them to a $50\%$-$50\%$ split is much harder with the first poll than with the second. The first would need $100$ new "No"s, as opposed to $1$ new "No" for the second, a difference of two orders of magnitude. Is there a mathematical/statistical name for this "resistance to change"?
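The count described above can be checked with a minimal sketch. The function name `extra_nos_to_tie` is hypothetical; it assumes that only new "No" votes are added and that none of the existing votes change.

```python
def extra_nos_to_tie(yes: int, no: int) -> int:
    """Number of additional "No" votes needed to bring a poll to a 50%-50% split."""
    # Each new "No" closes the Yes-No gap by exactly one vote,
    # so the answer is the gap itself (zero if "No" already leads or ties).
    return max(yes - no, 0)


polls = {"Poll 1": (200, 100), "Poll 2": (2, 1)}
for name, (yes, no) in polls.items():
    share = yes / (yes + no)
    print(f"{name}: Yes share = {share:.3f}, extra No votes to tie = {extra_nos_to_tie(yes, no)}")
```

Because both polls have a Yes share of $2/3$, the gap $Y - N$ equals $n/3$, where $n$ is the number of respondents. The "resistance" therefore grows in direct proportion to the sample size, which is where the factor of $100$ comes from.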

2 Answers


Sometimes a result is said to be "robust" if it holds despite small changes in input.

  • I didn't really want to edit the original question to add this, but do you also know where I can find ways to calculate the degree of robustness? (2011-06-15)
  • @El'endia Starman: Sorry, I can't help you. I was using "robust" in a qualitative sense, not a quantitative one, and I don't know how to put a number to it. Maybe a web search for "robustness measure" or a similar term will turn something up. (2011-06-15)
  • @El'endia Starman: There is a lot of information on **robust statistics**. As usual, one can start with Wikipedia, then look for books on robust statistics or, more generally, non-parametric statistics. However, robust statistics, it seems to me, is not what the problem as described is about. Robust statistics (in part) tells you what to do when you do an experiment $4$ times and get the results $11.3$, $12.1$, $9.7$, and $444.4$. Do you average the results? Of course not; you throw the "outlier" $444.4$ away. Well, it is not always that obvious, hence robust statistics. (2011-06-15)

Most of what you are trying to get at in the supplied example is captured by the notion of sample variance, or its square root, the sample standard deviation.

In both cases, the sample mean of the "yes" responses simplifies to $\frac{2}{3}$. But in the case of the poll in which your sample size was $300$, the estimate $2/3$ is a far more reliable indicator of the beliefs of the general population than the poll based on a sample size of $3$!
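To make the reliability comparison concrete, here is a minimal sketch computing the usual estimated standard error of a sample proportion, $\sqrt{\hat{p}(1-\hat{p})/n}$, for both polls. The function name `standard_error` is just an illustrative choice.

```python
import math


def standard_error(p_hat: float, n: int) -> float:
    """Estimated standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


p_hat = 2 / 3  # both polls have the same sample proportion of "yes"
for n in (300, 3):
    print(f"n = {n:3d}: standard error = {standard_error(p_hat, n):.4f}")
```

The only difference between the two polls is $n$. Since the standard error scales as $1/\sqrt{n}$, the $100$-fold larger sample gives a $10$-fold smaller standard error (about $0.027$ versus $0.27$).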

The notion of sample variance (or standard deviation) is the usual way of capturing this notion of reliability numerically. It is the mathematics behind the often-heard phrase "the result is accurate to $\pm 3$ percent $19$ times out of $20$."
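The "$\pm 3$ percent, $19$ times out of $20$" phrase corresponds to a $95\%$ confidence margin of error. The sketch below assumes the standard normal-approximation interval $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$ with $z \approx 1.96$, and the worst case $p = 0.5$ when solving for the sample size. The function names are illustrative.

```python
import math

Z_95 = 1.96  # standard normal quantile for "19 times out of 20" (95% confidence)


def margin_of_error(p_hat: float, n: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def sample_size_for(margin: float, z: float = Z_95, p: float = 0.5) -> int:
    """Smallest n whose margin of error is at most `margin` (p = 0.5 is the worst case)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)


print(f"Poll 1 (n=300): ±{margin_of_error(2 / 3, 300):.1%}")
print(f"Poll 2 (n=3):   ±{margin_of_error(2 / 3, 3):.1%}")
print(f"n needed for ±3%: {sample_size_for(0.03)}")
```

Poll 1 gets a margin of roughly $\pm 5.3$ percentage points, and a $\pm 3\%$ margin needs about $1068$ respondents. The normal approximation is poor for a sample as small as $n = 3$, so the Poll 2 figure (about $\pm 53$ points) only indicates that the estimate is nearly uninformative.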

  • Right, I know about standard deviation. However, I don't see a way to calculate it from just those numbers. (2011-06-15)
  • @El'endia Starman: The formulas are in the link. If $\hat{p}$ is the sample mean (the proportion of "yes" answers) and $n$ is the sample size, the variance of $\hat{p}$ is estimated by $\hat{p}(1-\hat{p})/n$. Sometimes one divides by $n-1$ instead; it makes no big difference for realistic sample sizes. (2011-06-15)
  • Well, the problem is that I only have one data point, which makes all those formulas useless. If it helps, think of these numbers as the ratings for a bookseller on, say, Amazon. Someone with $1000$ positive votes and $10$ negative is much more likely to be reliable than someone with $100$ positive and $1$ negative. (2011-06-15)
  • @El'endia Starman: Your Amazon example seems to involve plenty of data. Robustness issues would arise in this setting because a smaller data set may not only have a larger sample variance in the standard sense, but may also be the result of manipulation (the book's author has $100$ devoted relatives). (2011-06-15)
  • So how would I calculate the sample variance with one data point ($\frac{1000}{1010}$ or $\frac{100}{101}$)? (2011-06-15)
  • @El'endia Starman: In the first instance, you do not have $1$ data point; you have a sample of size $n=1010$, and the $\hat{p}$ I mentioned is $1000/1010$. For the variance of $\hat{p}$, calculate $\hat{p}(1-\hat{p})/1010$; for the standard deviation, take the square root of that. By the way, for *small* failure rates, good approximations for the probabilities that various numbers of people will hate the book can be found using the *Poisson approximation* to the binomial. (2011-06-15)
  • @El'endia Starman: I should mention that where we are dealing with small probabilities of failure, or fairly rare diseases, estimates of **ratios** are of greater practical importance than estimates of differences. Differences are what the classical estimation theory for the binomial deals with. For ratios, one is really asking about estimates of a logarithm. Of course, there is plenty of literature on this specialized problem also. (2011-06-15)
  • @user6312: Ah, thanks very much! :) (2011-06-15)
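The calculation suggested in the comments for the Amazon example can be sketched as follows. "Seller A" ($1000$ positive out of $1010$) and "Seller B" ($100$ positive out of $101$) are labels introduced here for illustration. The second half compares exact binomial probabilities with the Poisson approximation mentioned above. The number of future buyers `m = 100` is a hypothetical value, and treating Seller A's observed negative share as the true failure rate is a simplifying assumption.

```python
import math


def standard_error(successes: int, n: int) -> float:
    """Estimated standard error of the proportion p_hat = successes / n."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sellers from the comments: positive votes out of total votes.
se_a = standard_error(1000, 1010)
se_b = standard_error(100, 101)
print(f"Seller A: p_hat = {1000 / 1010:.4f}, SE = {se_a:.4f}")
print(f"Seller B: p_hat = {100 / 101:.4f}, SE = {se_b:.4f}")


def binomial_pmf(k: int, n: int, q: float) -> float:
    """Exact probability of k failures in n independent trials with failure rate q."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of k events with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


# Poisson approximation: for a small failure rate q, Binomial(m, q) is close to Poisson(m * q).
q = 10 / 1010  # assumption: Seller A's observed share of negative votes is the true rate
m = 100        # hypothetical number of future buyers
for k in range(4):
    print(f"k={k}: binomial {binomial_pmf(k, m, q):.4f}  poisson {poisson_pmf(k, m * q):.4f}")
```

Both sellers have the same $\hat{p} = 100/101$, but Seller B's sample is $10$ times smaller, so its standard error is $\sqrt{10} \approx 3.16$ times larger (about $0.0099$ versus $0.0031$). The binomial and Poisson columns agree to within a few thousandths. `math.comb` requires Python 3.8 or later.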