
My question is at the bottom. (Most of the description comes from Christopher Bishop's Neural Networks for Pattern Recognition.)

Let $w$ be the weight vector of the neural network and $E$ the error function.

According to the Robbins–Monro result, the sequence $w_{kj}^{(r+1)}=w_{kj}^{(r)}-\eta\left.\frac{\partial E}{\partial w_{kj}}\right|_{w^{(r)}}$ converges to a limit at which $\frac{\partial E}{\partial w_{kj}}=0.$
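
For concreteness, here is a minimal sketch (not from Bishop's book) of this batch update on a toy quadratic error of my own choosing; the error function, step size, and minimiser are illustrative assumptions only:

```python
# Batch update w <- w - eta * dE/dw on a toy error E(w) = 0.5 * ||w - w_star||^2.
import numpy as np

def batch_gradient_descent(grad_E, w0, eta=0.1, n_steps=100):
    """Iterate the batch rule; for suitable eta and a well-behaved E,
    the iterates approach a point where grad_E(w) = 0."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        w = w - eta * grad_E(w)   # gradient of the full error E at the current w
    return w

w_star = np.array([1.0, -2.0])        # hypothetical minimiser of the toy error
grad_E = lambda w: w - w_star         # gradient of E(w) = 0.5 * ||w - w_star||^2
print(batch_gradient_descent(grad_E, np.zeros(2)))   # close to w_star
```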

In general the error function is given by a sum of terms, each of which is calculated using one of the patterns from the training set, so that $E=\sum_n E^n(w)$. In applications we then update the weight vector using one pattern at a time: $w_{kj}^{(r+1)}=w_{kj}^{(r)}-\eta\frac{\partial E^n}{\partial w_{kj}}$
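
Again as a rough illustration (my own toy example, not from the book), here is the sequential, per-pattern version of the update, where each $E^n$ is a simple quadratic term whose sum is minimised at the mean of the patterns:

```python
# Sequential update w <- w - eta * dE^n/dw, cycling through the patterns.
# Toy choice: E^n(w) = 0.5 * ||w - x_n||^2, so E = sum_n E^n is minimised
# at the mean of the x_n.
import numpy as np

def sequential_gradient_descent(patterns, w0, eta=0.05, n_epochs=200):
    """Update w using the gradient of one pattern's error term at a time."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_epochs):
        for x_n in patterns:
            grad_En = w - x_n          # gradient of E^n(w) = 0.5 * ||w - x_n||^2
            w = w - eta * grad_En
    return w

rng = np.random.default_rng(0)
patterns = rng.normal(size=(50, 2))
print(sequential_gradient_descent(patterns, np.zeros(2)))  # near the pattern mean
print(patterns.mean(axis=0))
```

With a fixed $\eta$ the iterates only hover near the minimiser; the Robbins–Monro conditions concern a step size that decreases over time.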

My question is: why does the algorithm converge when we use the last formula? Once we use it to update $w$, the value of $w$ changes, and I cannot prove the convergence using $\frac{\partial E}{\partial w_{kj}}=\sum_n \frac{\partial E^n}{\partial w_{kj}}$

  • I don't know anything about neural networks, but this looks like a multidimensional Newton's method; see http://en.wikipedia.org/wiki/Kantorovich_theorem. Two common strategies to prove convergence of methods like this are 1) show the iteration is a contraction mapping, or 2) find some quantity that is zero at the true solution and decreases by at least a fixed fraction every iteration. It will be difficult to help much more without knowing more properties of $E$. (2012-04-08)
