
I have a set of data $(x_i,y_i)$ for which I computed the joint probability density function (see Fig. 1). I would like to find the "best fitting line" that describes the distribution. In other words, I would like the dashed line in the figure.

[Fig. 1]

One important point is that both variables are affected by error, so the estimator should be "symmetric": it should return the same line whether it is applied to $f(x,y)$ or to $f(y,x)$ (mirrored, of course).

I tried several estimators, but couldn't find what I'm looking for. Both Ordinary Least Squares and Generalised Least Squares return a result that depends on which variable is treated as the "observed variable" and which as the "explanatory variable", whereas in my case both readings are affected by error.
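
The dependence on the regression direction can be seen in a minimal sketch. The data values and the helper `ols_slope` below are hypothetical, chosen only for illustration:

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of the least-squares line of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical noisy data, for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.3, 0.8, 2.4, 2.6, 4.5, 4.6]

# Errors measured along y: regress y on x
slope_y_on_x = ols_slope(x, y)
# Errors measured along x: regress x on y, then invert the slope
# so that both lines are expressed in the same (x, y) plane
slope_x_on_y = 1.0 / ols_slope(y, x)

print(slope_y_on_x, slope_x_on_y)
```

The ratio of the two slopes equals the squared sample correlation, so the two lines agree only when the points are exactly collinear.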

Looking around, I found this equation $$y(x) = \operatorname{sgn}(\rho) \frac{\sigma_y}{\sigma_x}(x-\mu_x)+\mu_y$$ on the Wikipedia page "Multivariate normal distribution", which relates it to the best linear unbiased predictor for a bivariate normal distribution (the predictor itself has $\rho$ in place of $\operatorname{sgn}(\rho)$). This is indeed symmetric in $x$ and $y$, but it returns the dark-red line in the figure: not bad, but not what I would expect. I think this is because the distribution is far from normal.
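
A minimal sketch of this line computed from sample moments, taking the slope as $\sigma_y/\sigma_x$ so that it has the units of $y/x$. The function name `symmetric_line` and the data are hypothetical. This construction is also known as reduced major axis (geometric mean) regression.

```python
from statistics import mean, pstdev

def symmetric_line(x, y):
    """Return (slope, intercept) of y = sgn(rho) * (s_y / s_x) * (x - m_x) + m_y."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    # The sum of cross products has the same sign as the correlation rho
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = (cov > 0) - (cov < 0)
    slope = sign * sy / sx
    return slope, my - slope * mx

# Hypothetical noisy data, for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.3, 0.8, 2.4, 2.6, 4.5, 4.6]

slope_xy, icpt_xy = symmetric_line(x, y)   # line y = slope_xy * x + icpt_xy
slope_yx, icpt_yx = symmetric_line(y, x)   # line x = slope_yx * y + icpt_yx

print(slope_xy, icpt_xy)
print(slope_yx, icpt_yx)
```

Swapping the roles of the variables gives the reciprocal slope, i.e. the same line mirrored about $y=x$, which is the symmetry asked for here.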

Do you have any suggestion to find a good estimator for this problem?

Thank you very much

Luca

1 Answer


The labels "explanatory" and "observed" apply to data from controlled trials. If you are talking about two variables in an uncontrolled "experiment", the terms are not necessary: you are basically assuming that both are random variables, and your aim is to estimate a conditional expectation. If you want a straight line like the one in the figure you attached, then you are basically assuming that the conditional expectation of $Y$ given $X=x$ (and/or vice versa) has a linear form, i.e., $$ E[Y|X=x]=\beta_0 + \beta_1x. $$ For a bivariate normal distribution it takes a special form related to the one you have written; otherwise WLS/OLS/FGLS (which one depends on the assumptions and on the residual diagnostics) will still be BLUE, provided that the conditional mean is indeed a linear function. Note that the BLUE property does not require normality, or even independent errors. Thus, unless you have a good reason not to use linear least squares, it should be fit for the task.
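
For reference, the bivariate normal case can be written out explicitly. This is the standard result, stated in the notation of the question:

$$ E[Y|X=x]=\mu_y+\rho\frac{\sigma_y}{\sigma_x}(x-\mu_x), \qquad E[X|Y=y]=\mu_x+\rho\frac{\sigma_x}{\sigma_y}(y-\mu_y). $$

Drawn in the same $(x,y)$ plane, these two lines have slopes $\rho\,\sigma_y/\sigma_x$ and $\sigma_y/(\rho\,\sigma_x)$, so they coincide only when $|\rho|=1$. The line with $\operatorname{sgn}(\rho)$ in place of $\rho$ has slope $\operatorname{sgn}(\rho)\,\sigma_y/\sigma_x$, the geometric mean of the two, and therefore lies between them.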

  • What bothers me about OLS/WLS and (I am not sure) FGLS is that, as far as I know, the error is evaluated along either x or y. A single outlier can have a strong influence on the outcome of OLS (and partly of WLS), depending on the direction along which the error is measured. 2017-02-06
  • Check out robust regression: https://en.wikipedia.org/wiki/Robust_regression 2017-02-06
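
As one illustration of the robust-regression idea raised in the comments, here is a minimal sketch of the Theil-Sen estimator, which takes the median of all pairwise slopes. This is only one of several robust estimators and not necessarily the one the commenter had in mind. The function name `theil_sen` and the data are hypothetical, and the estimator is not claimed to be symmetric in $x$ and $y$.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Return (slope, intercept) using the median of all pairwise slopes."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]  # skip vertical pairs
    slope = median(slopes)
    # Intercept: median of the residual offsets at the fitted slope
    intercept = median(b - slope * a for a, b in zip(x, y))
    return slope, intercept

# Hypothetical data lying close to y = x, with one gross outlier at the end
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.1, 0.9, 2.1, 2.9, 4.1, 5.0, 30.0]

slope, intercept = theil_sen(x, y)
print(slope, intercept)
```

Because the median ignores the few extreme pairwise slopes produced by the outlier, the fitted slope stays close to that of the bulk of the data.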