Given a set of points $(x_i, y_i)$, least-squares linear regression finds the linear function $L$ that minimizes $\sum_i \varepsilon(y_i, L(x_i))$, where $\varepsilon(y, y') = (y-y')^2$ is the squared error between the actual value $y_i$ and the value $L(x_i)$ predicted from $x_i$.
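(For concreteness, here is the standard derivation I have in mind: with $L(x) = mx + b$, setting the partial derivatives of $\sum_i (y_i - mx_i - b)^2$ with respect to $m$ and $b$ to zero gives the closed form
$$m = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}, \qquad b = \bar y - m\bar x,$$
where $\bar x$ and $\bar y$ are the means of the $x_i$ and $y_i$.)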
Suppose I want to do the same, but I want to use the following penalty function in place of $\varepsilon$: $$\epsilon(y, L(x)) = \begin{cases} y - L(x) & \text{if } y > L(x) + 1 \\ (y - L(x))^2 & \text{if } y \le L(x) + 1 \end{cases}$$
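In code, the penalty I have in mind looks like this (just a sketch; `penalty` is an illustrative name, and the second argument is the fitted value $L(x)$). Note that the two branches agree at the crossover, since $y = L(x) + 1$ gives a penalty of $1$ either way, so $\epsilon$ is continuous:

```python
def penalty(y, lx):
    """Asymmetric penalty epsilon(y, L(x)): linear when the line undershoots
    the point by more than 1, ordinary squared error otherwise."""
    r = y - lx           # residual; positive means the line passes below the point
    if r > 1:
        return r         # cheaper linear penalty for undershooting by more than 1
    return r * r         # usual squared error everywhere else

# Both branches give 1 at the boundary y = L(x) + 1, so the penalty is continuous.
assert penalty(3.0, 2.0) == 1.0
```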
If the fitted line passes a certain distance below the actual data point, I want to penalize it much less severely than if the line passes the same distance above the data point. (The threshold for using the cheaper penalty function is $y > L(x)+1$ rather than $y > L(x)$ so that undershooting by some amount $\delta < 1$ is not penalized more than overshooting by the same amount; we don't want to overpenalize $L$ for undershooting by too little! For example, with the threshold at $y > L(x)$, undershooting by $\delta = \frac12$ would cost $\frac12$ under the linear penalty, while overshooting by $\frac12$ would cost only $\frac14$ under the quadratic one.)
I could probably write a computer program to solve this numerically, using a hill-climbing algorithm or something similar. But I would like to know if there are any analytic approaches. If I try to follow the approach that works for the usual penalty function, I get stuck early on: because $\epsilon$ is defined piecewise, and which piece applies depends on $m$ and $b$ themselves, there seems to be no way to expand the expression $\epsilon(y_i, mx_i+b)$ algebraically.
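For what it's worth, here is the kind of numerical approach I have in mind: a sketch using SciPy's derivative-free Nelder-Mead minimizer on the total penalty (the data and the starting point are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def total_penalty(params, xs, ys):
    """Sum of the asymmetric penalty over all points for L(x) = m*x + b."""
    m, b = params
    r = ys - (m * xs + b)                     # residuals; positive = line below point
    return np.sum(np.where(r > 1, r, r * r))  # linear past the threshold, squared otherwise

# Made-up example data.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.2, 1.1, 1.9, 3.2, 3.8])

# Nelder-Mead needs no derivatives, so the piecewise definition is no obstacle.
result = minimize(total_penalty, x0=[1.0, 0.0], args=(xs, ys), method="Nelder-Mead")
m, b = result.x
print(f"m = {m:.4f}, b = {b:.4f}, total penalty = {result.fun:.4f}")
```

One thing I notice while writing this: the slope of $\epsilon$ drops from $2$ to $1$ at the crossover, so $\epsilon$ is not convex, and presumably a local method like this could in principle get stuck in a local minimum.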
I expect that the problem has been studied with different penalty functions, and I would be glad for a reference to a good textbook.