In machine learning, a very common training algorithm for neural networks is the gradient descent rule. I understand that it is an iterative process of adjusting each of the weights based on its partial derivative. Why could we not simply take the partial derivative with respect to each weight, set up a set of linear equations, and solve them? Is it the computational cost?
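For concreteness, here is a minimal sketch of the iterative update being described, on a made-up least-squares problem (the data, learning rate, and loss are purely illustrative assumptions, not from the question):

```python
import numpy as np

# Made-up data for a tiny least-squares problem: find w so that X @ w ~ y.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

w = np.zeros(2)   # weights to be learned
lr = 0.01         # learning rate (step size)

for _ in range(1000):
    grad = X.T @ (X @ w - y)   # partial derivatives of 0.5 * ||X @ w - y||^2
    w -= lr * grad             # adjust each weight opposite its partial derivative
```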
Gradient descent rule
-
I'm not even sure that using the gradient descent method is valid for such a non-linear system (a cascade of linear and step functions) from a mathematical point of view: http://math.stackexchange.com/questions/1580425/is-it-legal-to-use-gradient-descend-method-in-neural-networks – 2015-12-17
1 Answer
If you do that, you'll get a system of non-linear rather than linear equations.
Setting the derivatives to zero is a common strategy for solving optimization problems, but it then leads to finding a root of a nonlinear system of equations. This can be done using Newton's method (and its generalizations), but it will generally involve dense matrix computations.
The dense matrix computations are the issue. Just setting up and solving Newton's equations is costly: forming the matrix takes O(n^2) storage (not counting the cost of computing its entries), and solving the resulting linear system is O(n^3), where n is the number of weights.
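To make that cost concrete, here is a rough sketch (my own illustration, not part of the original answer) contrasting one gradient-descent step with one Newton step on the gradient; `grad` and `hess` are assumed callables returning the gradient vector and the dense n-by-n Hessian:

```python
import numpy as np

def gradient_step(w, grad, lr=0.01):
    # O(n) work and storage per step: move along the negative gradient.
    return w - lr * grad(w)

def newton_step(w, grad, hess):
    # Root-finding on the gradient: solve H @ delta = -g for the step delta.
    # Just storing H is O(n^2); the dense solve below is O(n^3).
    g = grad(w)
    H = hess(w)
    delta = np.linalg.solve(H, -g)
    return w + delta
```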
Another issue in the NN context is online algorithms vs. batch algorithms. In that context it's much more common to use stochastic gradient descent (SGD) than standard (batch) gradient descent.
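As a rough sketch of the batch vs. online distinction (again my own illustration; `grad_i(w, i)` is a hypothetical function returning the gradient of the loss on example i):

```python
import numpy as np

def batch_gd(w, grad_i, n_examples, lr=0.01, epochs=100):
    # Batch gradient descent: one update per pass, using the average gradient.
    for _ in range(epochs):
        g = sum(grad_i(w, i) for i in range(n_examples)) / n_examples
        w = w - lr * g
    return w

def sgd(w, grad_i, n_examples, lr=0.01, epochs=100, seed=0):
    # Stochastic gradient descent: update after every (shuffled) example.
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(n_examples):
            w = w - lr * grad_i(w, i)
    return w
```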
-
But the perceptron is linear, isn't it? – 2012-05-06