Say I have the following functions:
- $ f(x) = A\sin(Bx) $
- $ g(x) = M_1x $
- $ h(x) = M_2x $; where $M_2 \approx 0$ and $M_1 > 1000 M_2$
- $ z(x) = C $
- $ e(x) \sim N(0,\sigma) $
- $ m_g(x) = f(x) + z(x) - g(x) + e(x) $
- $ m_h(x) = f(x) + z(x) - h(x) + e(x) $
where the range of $ h(x) $ is approximately the same as that of $ f(x) $, but both are much smaller than the range of $ g(x) $ over the same $x$ interval.
If I generate $n$ samples of $m_g(x)$ and $m_h(x)$, using the same $x$ locations and the same $e(x)$ values for both, and regress each data set to get "best fit" slope estimates of $M_1$ and $M_2$, I notice that the estimate of $M_1$ is always slightly more accurate (the difference shows up at the 4th-5th decimal place). That is, regressing the much "steeper" $m_g(x)$ data recovers its true slope ($M_1$ in this case) more accurately, even when all other factors are kept the same.
Why is this?
As mentioned, the difference in accuracy between the two is very small, but the $M_1$ estimate always ends up more accurate (i.e. the error between the regressed slope and the simulated, "known" slope value is smaller).
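For concreteness, here is a minimal Python/NumPy sketch of the kind of simulation I am describing (the constants below are placeholders, not the values in my spreadsheet):

```python
import numpy as np

# Placeholder constants, not my actual spreadsheet values.
A, B = 1.0, 2.0          # f(x) = A*sin(B*x)
M1, M2 = 5.0, 0.001      # slopes of g(x) and h(x); M1 > 1000*M2
C = 3.0                  # z(x) = C
sigma = 0.1              # standard deviation of the noise e(x)
n = 200

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, n)
e = rng.normal(0.0, sigma, size=n)   # the SAME noise values are used for both series

f = A * np.sin(B * x)
m_g = f + C - M1 * x + e             # m_g(x) = f(x) + z(x) - g(x) + e(x)
m_h = f + C - M2 * x + e             # m_h(x) = f(x) + z(x) - h(x) + e(x)

# Ordinary least-squares straight-line fits. Because g and h enter with a
# minus sign, the fitted slopes estimate -M1 and -M2, so negate them.
slope_g, _ = np.polyfit(x, m_g, 1)
slope_h, _ = np.polyfit(x, m_h, 1)
M1_hat, M2_hat = -slope_g, -slope_h

print("error in M1 estimate:", abs(M1_hat - M1))
print("error in M2 estimate:", abs(M2_hat - M2))
```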
I would like to know why this happens mathematically (that's why I'm here) and how each function contributes: for example, what would happen if $f(x)$ or $\sigma$ were scaled up, or if $f(x)$ changed shape?
I have a spreadsheet (LibreCalc) with my "simulated" data if anyone wants to see it.
Thanks in advance for the insights!