I have runtimes for requests on a web server. Sometimes events occur that cause the runtimes to skyrocket (we've all seen the occasionally slow web page), and sometimes they plummet due to terminated connections and other events. I am trying to come up with a consistent method for throwing away spurious events so that I can evaluate performance more consistently.
I am trying Chauvenet's criterion, and in some cases it claims that all of my data points are outliers. How can this be? Take the following numbers, for instance:
[30.0, 38.0, 40.0, 43.0, 45.0, 48.0, 48.0, 51.0, 60.0, 62.0, 69.0, 74.0, 78.0, 80.0, 83.0, 84.0, 86.0, 86.0, 86.0, 87.0, 92.0, 101.0, 103.0, 108.0, 108.0, 109.0, 113.0, 113.0, 114.0, 119.0, 123.0, 127.0, 128.0, 130.0, 131.0, 133.0, 138.0, 139.0, 140.0, 148.0, 149.0, 150.0, 150.0, 164.0, 171.0, 177.0, 180.0, 182.0, 191.0, 200.0, 204.0, 205.0, 208.0, 210.0, 227.0, 238.0, 244.0, 249.0, 279.0, 360.0, 378.0, 394.0, 403.0, 489.0, 532.0, 533.0, 545.0, 569.0, 589.0, 761.0, 794.0, 1014.0, 1393.0]
That's 73 values, with a mean of 222.29 and a standard deviation of 236.87. For the value 227, Chauvenet's criterion would have me calculate its probability according to a normal distribution (0.001684, if my math is correct). That number times 73 is 0.123, which is less than 0.5, so 227 is flagged as an outlier. What am I doing wrong here? Is there a better approach that I should be taking?