2

I have a data set $D = \{ (x_i, y_i) \}_{i=1}^n$ for a regression problem. When I plot the data, it looks like there is an underlying parabola (a second-order polynomial, which is still a linear model in its coefficients) plus some outliers.

I want to design an approach using a probabilistic model with a latent binary variable $z_i \in \{ 0,1 \}$ for each data point, indicating whether that point is an outlier or not.

Currently I have no idea how to proceed. What would the parameters be in this case, and how would they be optimized? Would Expectation Maximization be a suitable approach?

  • 0
    Are there known outliers in your data? Why isn't using the $\frac{3}{2}\,\mathrm{IQR}$ cutoff sufficient? 2012-06-16

1 Answer

1

My recommendation is to use robust regression. It is simpler than fitting a latent-variable model, and it automatically downweights the outliers.
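
To make this concrete, here is a minimal sketch of one common form of robust regression: a quadratic fit with Huber weights, computed by iteratively reweighted least squares. The function name `huber_irls_quadratic`, the tuning constant `k=1.345`, the cutoff of 3 robust scale units, and the synthetic data are all illustrative assumptions, not something taken from the question.

```python
import numpy as np

def huber_irls_quadratic(x, y, k=1.345, n_iter=50, tol=1e-10):
    """Fit y ~ b0 + b1*x + b2*x^2 with Huber weights via IRLS (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x**2])   # quadratic design matrix
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # start from ordinary least squares

    def robust_scale(r):
        # Median absolute deviation, rescaled to match sigma for Gaussian noise.
        return max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)

    for _ in range(n_iter):
        r = y - X @ beta
        u = np.abs(r) / robust_scale(r)
        # Huber weights: 1 for small residuals, k/|u| for large ones.
        w = np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break

    r = y - X @ beta
    return beta, r, robust_scale(r)

# Synthetic example (assumed data): a parabola with noise and four shifted points.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.2, size=x.size)
outlier_idx = np.array([5, 20, 41, 55])
y[outlier_idx] += 15.0

beta, resid, scale = huber_irls_quadratic(x, y)
# Points whose residual exceeds 3 robust scale units are outlier candidates.
flagged = np.flatnonzero(np.abs(resid) > 3.0 * scale)
print("coefficients:", beta)
print("flagged indices:", flagged)
```

The weights play a role loosely similar to the latent indicator in the question: points with large residuals get little influence on the fit, and the residuals from the final fit can be thresholded if an explicit outlier label is wanted.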

  • 0
    @MichaelHardy Mahoni said that the data follow a quadratic function except for a few outliers, which indicates that the outliers were hurting the fit. But if you want to detect the outliers, the points with the largest residuals from a robust regression fit are the outlier candidates. 2012-06-16