
I have a series of observations made against time. Given perfect observations, the trend should be reasonably linear. Since I expect some of the measurements to be erroneous, I wish to exclude those points from further calculations.

The time values are not evenly spaced either.

How would I go about identifying which values fall significantly outside of the norm here?

Below is an extract of the table:

Time (hrs)   Observation
21.516       15.071
21.568       11.555
21.614       12.601
21.675       14.194
21.731       11.308
21.787       11.968
21.842       14.383
23.493        3.269
23.537        5.917
23.581        5.982
23.625        5.696
23.669       10.297
23.713        9.599
23.756        4.074
  • One approach is to use Least Squares to find a line of best fit, then subtract the observed values from the values approximated by the line of best fit. (2012-11-11)

1 Answer


Assuming you are looking for an automated method: this is a big topic, but here are three cheapo methods off the top of my head:

(1) Get a linear least-squares fit for the line (in MATLAB that's three characters of code, A\b) and look at all the points that lie a large distance from this line: they are the outliers. However, since the fitted line is itself pulled towards the outliers, it isn't out of the question that this results in very bad identification of outliers...
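A minimal sketch of this first approach in Python/NumPy (the data arrays are copied from the table in the question; the factor of 2 in the cutoff is my own arbitrary choice, not part of the answer):

    import numpy as np

    # Data from the table in the question (time in hours, observation).
    t = np.array([21.516, 21.568, 21.614, 21.675, 21.731, 21.787, 21.842,
                  23.493, 23.537, 23.581, 23.625, 23.669, 23.713, 23.756])
    y = np.array([15.071, 11.555, 12.601, 14.194, 11.308, 11.968, 14.383,
                   3.269,  5.917,  5.982,  5.696, 10.297,  9.599,  4.074])

    # Ordinary least-squares line y ~ a*t + b (the equivalent of MATLAB's A\b).
    a, b = np.polyfit(t, y, deg=1)

    # Residuals: observed minus fitted values.
    resid = y - (a * t + b)

    # Flag points whose residual is large relative to the residual spread.
    # The factor of 2 is an arbitrary threshold; adjust it to your data.
    outliers = np.abs(resid) > 2 * np.std(resid)
    print("suspected outliers at indices:", np.where(outliers)[0])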

(2) Same as above, but use a least absolute deviations fit (minimizing the absolute value of the residuals rather than their square). That's similar, but just slightly less sensitive to outliers.
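One simple (if not the fastest) way to get a least absolute deviations line is to minimize the L1 loss numerically; the sketch below reuses the t and y arrays from the first snippet, and the MAD-based cutoff is again my own arbitrary choice:

    import numpy as np
    from scipy.optimize import minimize

    # Least absolute deviations: minimize the sum of |residuals| instead of
    # their squares.  t and y are the arrays from the first snippet.
    def l1_loss(params, t, y):
        a, b = params
        return np.sum(np.abs(y - (a * t + b)))

    # Start from the ordinary least-squares solution and refine under the L1 loss.
    a0, b0 = np.polyfit(t, y, deg=1)
    res = minimize(l1_loss, x0=[a0, b0], args=(t, y), method="Nelder-Mead")
    a_lad, b_lad = res.x

    resid = y - (a_lad * t + b_lad)

    # Robust spread estimate from the median absolute deviation (MAD);
    # the factor 1.4826 scales MAD to be comparable to a standard deviation.
    mad = np.median(np.abs(resid - np.median(resid)))
    outliers = np.abs(resid) > 2 * 1.4826 * mad
    print("suspected outliers at indices:", np.where(outliers)[0])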

(3) The best way to avoid the influence of outliers is to do a RANSAC fit, as is very common in computer vision for example; see the Wikipedia article on RANSAC.
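A sketch using scikit-learn's RANSACRegressor, which runs the RANSAC loop (repeated fits on random minimal subsets, keeping the model with the largest consensus set) and exposes the inlier/outlier split directly; t and y are again assumed to be the arrays from the first snippet:

    import numpy as np
    from sklearn.linear_model import RANSACRegressor

    # t and y are the arrays from the first snippet.
    X = t.reshape(-1, 1)          # scikit-learn expects a 2-D feature matrix
    ransac = RANSACRegressor(random_state=0)
    ransac.fit(X, y)

    # Points outside the consensus set are the suspected outliers.
    outliers = ~ransac.inlier_mask_
    print("suspected outliers at indices:", np.where(outliers)[0])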

  • Thanks for that Peter. I've used option 2 and the results are pretty clear to me now. (2012-11-11)