
I understand the basics of how dual numbers work, as well as how they are used for automatic differentiation, as described here: Dual Numbers & Automatic Differentiation

I was wondering: how would you extend this concept to get partial derivatives of a function? Basically, I have a multivariable function, and I'd like to calculate its value and gradient for a specific input.

I started off by looking at how multiplication of dual numbers is derived for a function of a single variable of the form $y=f(x)$. (Note that the $\epsilon^2$ term in the last step becomes 0, which makes that term disappear:)

$(a+b\epsilon)*(c+d\epsilon) = \\ ac+(bc+ad)\epsilon+bd\epsilon^2 = \\ ac + (bc+ad)\epsilon$
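That multiplication rule is all you need for single-variable forward-mode AD. As a minimal sketch (my own illustration; the class name `Dual` and the example function are assumptions, not from the post), overloading `+` and `*` with this rule propagates the derivative alongside the value:

```python
# Minimal single-variable forward-mode AD with dual numbers a + b*eps,
# using the rule eps^2 = 0 from the multiplication above.
class Dual:
    def __init__(self, real, dual=0.0):
        self.real = real   # function value
        self.dual = dual   # derivative coefficient

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    def __mul__(self, other):
        # (a + b*eps)*(c + d*eps) = ac + (bc + ad)*eps, since eps^2 = 0
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real * other.real,
                    self.dual * other.real + self.real * other.dual)

# f(x) = x*x + x, so f'(x) = 2x + 1; seed x with dual part 1
x = Dual(3.0, 1.0)
y = x * x + x
# y.real == 12.0 (value), y.dual == 7.0 (derivative)
```

Seeding the input's dual part with 1 is what makes the $\epsilon$ coefficient of the output come out as $f'(x)$.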

That made me think that maybe I could just have an $\epsilon$ defined per variable in a $z=f(x,y)$ function, so I gave it a shot. (Note that the $x^2$ and $y^2$ terms disappear below for the same reason as above):

$ x=\epsilon_x \\ y=\epsilon_y \\ (a+bx+cy)*(d+ex+fy)= \\ ad+(ae+bd)x+(af+cd)y+(bf+ce)xy+bex^2+cfy^2= \\ ad+(ae+bd)x+(af+cd)y+(bf+ce)xy $

This looks pretty good except for the $xy$ term, which I have no idea how to account for in the gradient, or how to interpret.

Can anyone help me out towards understanding how to do multivariable automatic differentiation?

  • I think there is a rule also for products of different epsilons, or at least it would have to be defined. (2017-02-17)
  • That would make a lot of sense. I wonder what the rule might be? Hrm... (2017-02-17)
  • Actually... It seems like the $x$ term is correct on its own and the $y$ term is correct on its own. Maybe the rule is that two different epsilons multiplied together are also zero? (2017-02-17)
  • Perhaps, I can see that being the case. (2017-02-18)
  • Doing some tests, it seems like this is the case, but it makes me wonder if there are some cases I'm missing. (2017-02-18)
  • Ok, I think I've figured it out, but am trying to figure out how to word it correctly for an answer (if nobody beats me to it). Essentially, you could do multivariable AD by having a dual number for each variable you want the (partial) derivative of, and dealing with these dual numbers individually, in isolation from each other. The work done for the real part of the dual numbers would be duplicated, though, so you can just do that real-part calculation once, then do each of the dual parts individually. This works because partial derivatives treat the other variables as constants. (2017-02-18)
  • Sounds interesting. I've done auto diff before, but we didn't use dual numbers to do so. We used it to solve a constrained linear optimization problem in 16 dimensions. (2017-02-18)
  • That sounds on topic. What did you use for AD, matrices? (2017-02-18)
  • It's back in my notes that are buried somewhere on numerical analysis; if I find them I'll let you know. I've tagged the question as a favorite so I can return to it. (2017-02-18)
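The approach converged on in these comments (one dual coefficient per variable, with every product of epsilons vanishing, and the real-part work shared) can be sketched in Python. The class name `MultiDual` and the example function are my own illustration, not from the thread:

```python
# One dual coefficient per input variable, with eps_i * eps_j = 0 for
# all i, j (including i == j), as suggested in the comments.
class MultiDual:
    def __init__(self, real, grad):
        self.real = real          # shared function value
        self.grad = list(grad)    # one dual coefficient per variable

    def __add__(self, other):
        return MultiDual(self.real + other.real,
                         [b + d for b, d in zip(self.grad, other.grad)])

    def __mul__(self, other):
        # (a + sum b_i eps_i)(c + sum d_i eps_i)
        #   = ac + sum (b_i c + a d_i) eps_i   (all epsilon products vanish)
        return MultiDual(self.real * other.real,
                         [b * other.real + self.real * d
                          for b, d in zip(self.grad, other.grad)])

# z = f(x, y) = x*y at (x, y) = (2, 5); gradient is (y, x) = (5, 2)
x = MultiDual(2.0, [1.0, 0.0])   # seed dual part for x
y = MultiDual(5.0, [0.0, 1.0])   # seed dual part for y
z = x * y
# z.real == 10.0, z.grad == [5.0, 2.0]
```

Each input gets a 1 in its own slot of the seed vector and 0 elsewhere, which is exactly the "treat the other variables as constants" behavior of partial derivatives.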

1 Answer


I find the notation somewhat confusing.

Let's restate. Say the function is $f(x,y)=xy$. If you want $df/dt$ (written $f'$), you'd evaluate $f(x+x'\epsilon,\ y+y'\epsilon)$. You'd get the stated result $f+f'\epsilon = xy + (xy'+x'y)\epsilon$.
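A quick numeric check of this directional form (my own sketch; the `(real, dual)` tuple representation and the helper `mul` are assumptions, not from the answer):

```python
# Seed both inputs' dual parts at once to get the directional derivative
# x'*df/dx + y'*df/dy in a single pass.
def mul(p, q):
    # dual multiply: p, q are (real, dual) pairs with eps^2 = 0
    return (p[0] * q[0], p[0] * q[1] + p[1] * q[0])

def f(x, y):
    # f(x, y) = x*y, expressed in dual arithmetic
    return mul(x, y)

# at (x, y) = (3, 4) with direction (x', y') = (1, 2):
val, ddt = f((3.0, 1.0), (4.0, 2.0))
# val == 12.0, ddt == x'*y + y'*x == 1*4 + 2*3 == 10.0
```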

Now, the gradient, which is what it sounds like you're trying to evaluate, is just a vector of partial derivatives.

Take $f(x,y)=xy$ again. You could handle each variable separately, passing a dual part only for $x$ and then only for $y$. If you pass both at once, you get the equation you list: the real value plus three different $\epsilon$ terms. The two single-$\epsilon$ terms are the components of the gradient. The cross term, on the other hand, is the mixed derivative with respect to $x$ and $y$, which is of no interest for the gradient, so $\epsilon_x\epsilon_y$ can be cancelled just like $\epsilon_x^2$ and $\epsilon_y^2$.

If you look at what you get out of the full expansion without cancelling $\epsilon^2$ anywhere, the extra terms are just higher derivatives with respect to the same variable or to several variables. They only cancel because we decided to restrict ourselves to the first derivative. You could keep track of all of them and only cancel once you reach terms of even larger power (e.g. any product of three $\epsilon$'s) and get the various kinds of higher-order derivatives.
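As a sketch of that idea (my own code, not from the answer, with the assumed truncation $\epsilon^3 = 0$), a number carrying coefficients for $1$, $\epsilon$, and $\epsilon^2$ recovers $f$, $f'$, and $f''/2$ in one pass:

```python
# Truncated Taylor arithmetic: a + b*eps + c*eps^2 with eps^3 = 0.
# The coefficients track f, f', and f''/2 respectively.
class Dual2:
    def __init__(self, a, b=0.0, c=0.0):
        self.a, self.b, self.c = a, b, c

    def __add__(self, other):
        return Dual2(self.a + other.a, self.b + other.b, self.c + other.c)

    def __mul__(self, other):
        # (a1 + b1 e + c1 e^2)(a2 + b2 e + c2 e^2), dropping e^3 and higher
        return Dual2(self.a * other.a,
                     self.a * other.b + self.b * other.a,
                     self.a * other.c + self.b * other.b + self.c * other.a)

# f(x) = x^3 at x = 2: f = 8, f' = 12, f'' = 12 (so the e^2 coefficient is 6)
x = Dual2(2.0, 1.0, 0.0)
y = x * x * x
# y.a == 8.0, y.b == 12.0, y.c == 6.0  (i.e. f''/2)
```

The $\epsilon^2$ coefficient comes out as $f''/2$ rather than $f''$ because it is a Taylor coefficient; multiply by $2!$ to recover the second derivative.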

Now this deserves more thorough derivation (probably just the standard doing math with d/dx_i tricks) and better presentation (mathy text) but it's late and I'm typing on a phone. :)

Thanks, Adrian