
I would appreciate some help on the following problem: I'm taking Hinton's coursera class on Neural Nets and I'm not sure I understand the step highlighted in the picture (see below).

[Image: lecture slide with the highlighted step]

Background:

  • $i$ indexes the hidden layer
  • $j$ indexes the top layer
  • Neurons use the logistic (sigmoid) function as their activation

What I understand:
The chain rule allows you to "break down" the partial derivative and introduce a term that is helpful for your calculation:

$$\frac{\partial E}{\partial y_i} \;=\; \sum_j \frac{dz_j}{dy_i}\,\frac{\partial E}{\partial z_j} \;=\; \sum_j w_{ij}\,\frac{\partial E}{\partial z_j}$$

What I don't understand:
Where does this formula come from? In other words, what is the proof that the left-hand term can be broken down into a sum over the 3 units of the top layer (in this case)?
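Spelling the step out for the three top-layer units in the picture (my own notation, assuming the standard logit $z_j = \sum_i w_{ij} y_i + b_j$, so that $\partial z_j / \partial y_i = w_{ij}$):

$$\frac{\partial E}{\partial y_i} = \frac{\partial E}{\partial z_1}\frac{\partial z_1}{\partial y_i} + \frac{\partial E}{\partial z_2}\frac{\partial z_2}{\partial y_i} + \frac{\partial E}{\partial z_3}\frac{\partial z_3}{\partial y_i} = \sum_{j=1}^{3} w_{ij}\,\frac{\partial E}{\partial z_j}$$

This three-term expansion is exactly the step I cannot justify.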

Thanks for your help

Link to class: https://www.coursera.org/learn/neural-networks/lecture/gcNo6/the-backpropagation-algorithm-12-min

  • 0
    It's the chain rule for partial derivatives. Essentially, the derivative in the direction of a given vector is the sum of the derivatives of the components of the vector, weighted appropriately. (2017-01-30)
  • 0
    Hi ConMan. Thanks for your comment. I understand the chain rule. Can you explain the steps in between? The chain rule only allows you to "slide in" one $z_j$. Why the sum over $j$? (2017-01-30)
  • 0
    It's a specific version of the chain rule for functions of multiple variables (or equivalently, for functions of vectors). The following link from Khan Academy shows a demonstration (although it looks at the total derivative of the main function, and it's only applied to a 2-variable case): https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/multivariable-chain-rule/v/multivariable-chain-rule (2017-01-30)
  • 0
    Thanks ConMan. It's starting to make sense... Quick question: I see that your link refers to an ordinary derivative (whereas I had a partial derivative). How do you reconcile the two? (2017-01-30)
  • 1
    @ConMan It is called the [total derivative](https://en.wikipedia.org/wiki/Total_derivative): if $g(t) = f(x(t),y(t))$ then $\frac{dg}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}$ (2017-01-30)
  • 1
    @GuillaumeG - I suggest you read up on the Jacobian matrix and the chain rule in higher dimensions. Essentially, the ordinary derivative applies when you've got a single underlying variable that you're differentiating with respect to, and the partial derivatives apply when you're only differentiating with respect to one variable out of many. The Wikipedia article on the chain rule, particularly the section on higher dimensions, explains it a bit better: https://en.wikipedia.org/wiki/Chain_rule#Higher_dimensions (2017-01-30)
  • 0
    Ok, perfect. Thanks @ConMan and user1952009. (2017-01-30)
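The total derivative identity quoted in the comments can be verified numerically. Below is a minimal sketch with hypothetical example functions $f(x,y) = x^2 y$, $x(t) = \sin t$, $y(t) = t^2$ (my own choices, not from the course), comparing the chain-rule value of $\frac{dg}{dt}$ against a finite-difference estimate:

```python
import math

# Hypothetical example: f(x, y) = x^2 * y with x(t) = sin(t), y(t) = t^2
def f(x, y):
    return x * x * y

def x_of(t):
    return math.sin(t)

def y_of(t):
    return t * t

def g(t):
    return f(x_of(t), y_of(t))

t = 0.7

# Partials of f and ordinary derivatives of x(t), y(t)
df_dx = 2 * x_of(t) * y_of(t)   # ∂f/∂x = 2xy
df_dy = x_of(t) ** 2            # ∂f/∂y = x^2
dx_dt = math.cos(t)
dy_dt = 2 * t

# Total derivative: sum over the intermediate variables x and y,
# each partial weighted by how fast that intermediate moves with t
chain_rule = df_dx * dx_dt + df_dy * dy_dt

# Independent check via a central finite difference of g
h = 1e-6
numeric = (g(t + h) - g(t - h)) / (2 * h)

print(abs(chain_rule - numeric) < 1e-6)  # True
```

The two values agree up to discretization error; the sum over intermediate variables here is the same mechanism that produces the sum over $j$ in the backpropagation step.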

0 Answers