Take the example of trying to fit a regression line:
$$ y = b + mx $$ to some data.
Method #1
You can do this by taking the partial derivatives of the error function:
$$ z = \frac{1}{2} \sum_i \big(f(x_i) - y_i\big)^2, \quad f(x) = b + mx $$ with respect to b and m, and then setting these equations to zero to find the stationary points.
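For this particular error function, setting both partial derivatives to zero gives two linear equations (the normal equations), which can be solved directly. A minimal sketch of Method #1, using made-up toy data for illustration:

```python
import numpy as np

# Toy data (assumed, for illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# dz/dm = sum((b + m*x_i - y_i) * x_i) = 0
# dz/db = sum( b + m*x_i - y_i )       = 0
# Solving this 2x2 linear system yields the familiar closed form:
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()
print(b, m)
```

This works in one shot here precisely because z is quadratic in b and m, so the stationarity conditions are linear and have a unique solution.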
Method #2
Use the gradient descent algorithm to find a local minimum.
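For comparison, a minimal gradient descent sketch on the same (assumed) toy data, iterating on the same partial derivatives instead of solving them in closed form; the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

# Same toy data as above (assumed, for illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

b, m = 0.0, 0.0
lr = 0.02  # learning rate (chosen by hand)
for _ in range(20000):
    r = b + m * x - y           # residuals f(x_i) - y_i
    b -= lr * r.sum()           # step along -dz/db = -sum(r)
    m -= lr * (r * x).sum()     # step along -dz/dm = -sum(r * x_i)
print(b, m)  # converges toward the same b, m as the closed form
```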
Question:
Method one 'seems' superior. Why couldn't you find all the stationary points along the b and m axes (with z, the error, plotted vertically) and just pick the values of b and m that give the minimum? Why can't you do this and avoid the iterative process associated with gradient descent?
Where is my intuition going wrong?