4

I heard an interesting fact a while ago about how people draw a line through a cloud of points on a scatter plot. Usually, when calculating a line of best fit, we minimise the sum of squared residuals. But if you draw the line by eye (to the best of your ability), you likely won't draw a least-squares fit. I believe the implied exponent is slightly lower than two. This means we try to draw the line close to many points and worry less about a few extreme outliers.

Does anyone know what sort of error term we use intuitively? I realise the true objective function we minimise subconsciously probably has a more complex form, but this is just to get an indication for the form of the error penalties relative to least squares.

I'm writing a chapter for my thesis about regression analysis and I want to mention the effect of using different exponents on the error term. I thought it would be nice to throw this bit of information in, too!

Thanks.

Edit 1: I've just posted this plot so that my Facebook friends can try drawing a line of best fit by eye. I have asked them to send me either a picture of the graph with their line, or just the extreme y-values of the line. It would be great if we could all have a go! Link: http://www.freeimagehosting.net/ujx9h

Sorry for the annoying hosting site. Suggestions for a good image host are welcome.

Edit 2: (Following discussion in comments.) For simplicity's sake, I am restricting the investigation to error functions that are a sum of a fixed power of the absolute residuals, i.e. $\sum_i |r_i|^p$ for some exponent $p$.
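To make the restricted family concrete: for data $(x_i, y_i)$ and a line $y = ax + b$, the objective is $\sum_i |y_i - (a x_i + b)|^p$, with $p = 2$ giving least squares and $p = 1$ least absolute deviations. The following is only a rough sketch of how such a fit could be computed, not a reference implementation. The function name `fit_line_lp`, the choice of iteratively reweighted least squares (IRLS) as the solver, and the toy data are all assumptions made for illustration, and the scheme is only meant for $1 \le p \le 2$.

```python
def fit_line_lp(xs, ys, p=2.0, iters=200, eps=1e-8):
    """Approximately minimise sum(|y - (a*x + b)|**p) over slope a and intercept b.

    Hypothetical helper using iteratively reweighted least squares (IRLS);
    intended for 1 <= p <= 2.
    """
    w = [1.0] * len(xs)  # equal weights: the first pass is ordinary least squares
    a = b = 0.0
    for _ in range(iters):
        # Closed-form weighted least-squares line for the current weights.
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        a = sxy / sxx
        b = my - a * mx
        # Reweight: |r|**p == |r|**(p - 2) * r**2, so use w = |r|**(p - 2).
        # eps keeps the weight finite when a residual is (nearly) zero.
        w = [max(abs(y - (a * x + b)), eps) ** (p - 2.0) for x, y in zip(xs, ys)]
    return a, b


# Made-up data: ten points on y = 2x + 1, with the last one pushed up by 30.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[-1] += 30.0

for p in (2.0, 1.5, 1.0):
    slope, intercept = fit_line_lp(xs, ys, p)
    print(f"p = {p}: slope = {slope:.3f}, intercept = {intercept:.3f}")
```

On this toy data the single outlier drags the least-squares line upward, while the $p = 1$ fit stays essentially on the nine collinear points, which is the behaviour described in the question.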

  • 0
    What do you mean by "I believe the exponent is slightly lower than two."? 2012-09-29
  • 0
    Given a clean linear distribution of points with one random bump, a human would probably draw a line to fit the linearly distributed points perfectly, which is not least-anything. In the general case, you could ask a bunch of people to draw intuitive lines and deduce that term. I doubt you'll get a better answer. 2012-09-29
  • 0
    @PatrickLi, I mean that if rather than finding the line of best fit given a chosen exponent, you find the exponent given the line of best fit. 2012-09-30
  • 0
    @KarolisJuodelė, I think a plot of linear data with a bump would be a special case. Besides, what you describe is exactly what I'm saying: we would intuitively allow some points to be far away from the line in favour of getting most points close to the line (i.e. follow the linear data (many points) and ignore the bump (a few outliers)). I know I could just ask a bunch of people - that's in fact exactly what I should do. But I know someone has already done that! I just don't remember where I saw it, nor what the answer was. 2012-09-30
  • 0
    @Karolis It's not least-some-exponent, but it's least-something for an objective function that sufficiently rewards exact fits and penalizes very bad fits. To bdh_dtu: You implied in the question that humans (approximately) use an objective function with some other exponent. Karolis' example shows that we at least don't do that exactly. Of course you could collect data from people and fit an exponent to them. That would raise the slightly self-referential question which exponent to use for that fit :-) 2012-09-30
  • 0
    @joriki Indeed. That's what I'm after. We humans live in a world that is very well described by mathematics. But no-one would suggest that we actively use mathematics to go about our daily lives. For example, catching a ball involves some moderately complex ballistics - but of course we don't work out simultaneous equations. Assuming there exists a "typical" line that people draw given a set of data, and assuming there exists an exponent which would analytically produce the same line, we can say something about how our built-in error penalty function works. 2012-09-30
  • 0
    @bdh_dtu: There seems to be a misunderstanding. I was trying to say that Karolis' example shows that there does *not* exist an exponent which would analytically produce the same lines as humans. 2012-09-30
  • 0
    @joriki Ok, I see what you and Karolis are saying. But assuming we have realistic data, that situation won't arise. If the data is pre-filtered somewhat such that there exists a meaningful least sum of error^p, then I believe there will be a "natural" value for p that we tend to use when penalising residuals. It obviously requires that there is some (probably normally distributed) random noise on the data. Otherwise no "minimal error" fit makes sense. 2012-09-30
  • 0
    @bdh_dtu: I don't understand why you believe that out of all the possible objective functions we might be optimizing, it should be a least-squares-type error with a different exponent. It could be anything. I think the best you can do is choose some parametrized class of functions, such as those of least-squares type with different exponents, and fit the parameters (in that case the exponent) to data obtained from humans. 2012-09-30
  • 0
    I guess one should consider the span the eye covers in such an experiment. I guess the human eye tends to focus on the center of the view. On a different note, it would be interesting (at least to me) to see what happens when people are given a set of scatter plots and asked to draw the line on each, and then, after a while, are given the same set again. Do you think the lines will be approximately the same? Well, it is just a thought on a very interesting subject! 2012-09-30
  • 2
    The only question asked here is, "Does anyone know what sort of error term we use intuitively?" This is, in my opinion, not a Mathematics question. Rather, it's a question to be answered by searching the psychology literature (or by posting it somewhere where people familiar with the psychology literature hang out). 2012-09-30
  • 1
    Mosteller, F., Siegel, A. F., Trapido, E. and Youtz, C. (2006) Fitting Straight Lines by Eye, in Exploring Data Tables, Trends, and Shapes (eds D. C. Hoaglin, F. Mosteller and J. W. Tukey), John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9781118150702.ch6 2012-09-30
  • 1
    @GerryMyerson Thanks for the link! That pretty much is what I was looking for. And you're right: it's more a question for mathematical modelling and cognitive science. If you post that as an answer, I'll accept it. 2012-09-30
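The suggestion made in the comments above, to pick a parametrised class of objective functions and fit the parameter to lines obtained from humans, could be sketched roughly as below. Everything here is an assumption for illustration: the helper names `fit_line_lp` and `estimate_exponent`, the IRLS solver, the candidate grid restricted to $1 \le p \le 2$, the made-up data, and the "drawn" line standing in for a human's answer. In particular, the measure of closeness between two lines (mean squared vertical gap at the data's x-values) is itself an arbitrary choice, which is exactly the self-referential issue raised in the comments.

```python
def fit_line_lp(xs, ys, p=2.0, iters=200, eps=1e-8):
    """Hypothetical IRLS fit of y = a*x + b minimising sum(|residual|**p), 1 <= p <= 2."""
    w = [1.0] * len(xs)
    a = b = 0.0
    for _ in range(iters):
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        a = sxy / sxx
        b = my - a * mx
        # w = |r|**(p - 2), clamped so zero residuals do not give infinite weight.
        w = [max(abs(y - (a * x + b)), eps) ** (p - 2.0) for x, y in zip(xs, ys)]
    return a, b


def estimate_exponent(xs, ys, drawn_slope, drawn_intercept, candidates):
    """Return (p, gap) for the candidate p whose fitted line is closest to the drawn line.

    Closeness is the mean squared vertical gap at the data's x-values,
    which is an assumption of this sketch, not an established choice.
    """
    best_p, best_gap = None, float("inf")
    for p in candidates:
        a, b = fit_line_lp(xs, ys, p)
        gap = sum(
            ((a * x + b) - (drawn_slope * x + drawn_intercept)) ** 2 for x in xs
        ) / len(xs)
        if gap < best_gap:
            best_p, best_gap = p, gap
    return best_p, best_gap


# Made-up data: ten points on y = 2x + 1, with the last one pushed up by 30.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[-1] += 30.0

# Hypothetical hand-drawn line (not real survey data).
candidates = [1.0, 1.25, 1.5, 1.75, 2.0]
p_hat, gap = estimate_exponent(xs, ys, 2.3, 0.5, candidates)
print(f"closest candidate exponent: {p_hat} (mean squared gap {gap:.4f})")
```

With lines from many people, one could repeat this per person and look at the spread of the estimated exponents. As noted in the comments, nothing guarantees that any exponent reproduces human-drawn lines exactly.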

1 Answer

2

Mosteller, F., Siegel, A. F., Trapido, E. and Youtz, C. (2006) Fitting Straight Lines by Eye, in Exploring Data Tables, Trends, and Shapes (eds D. C. Hoaglin, F. Mosteller and J. W. Tukey), John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9781118150702.ch6

  • 0
    Figure 6-3 shows a comparison of least squares, least absolute and fit-by-eye for a set of 15 points, where the fit-by-eye is between the two others. 2012-10-01