I understand the basics of how dual numbers work, as well as how they are used for automatic differentiation, as described here: Dual Numbers & Automatic Differentiation
I was wondering, how would you extend this concept to get the partial derivatives of a function? Basically, I have a multivariable function, and I'd like to calculate its value and gradient at a specific input.
I started off by looking at how multiplication of dual numbers is derived for a function of a single variable of the form $y=f(x)$. (Note that the $\epsilon^2$ term disappears in the last step because $\epsilon^2 = 0$ by definition):
$$
\begin{aligned}
(a+b\epsilon)(c+d\epsilon) &= ac+(bc+ad)\epsilon+bd\epsilon^2 \\
&= ac+(bc+ad)\epsilon
\end{aligned}
$$
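To make the single-variable rule concrete, here is a minimal Python sketch of it. The class name `Dual` and its fields are my own choice, not from any library. If the input is seeded with an $\epsilon$-coefficient of 1, the $\epsilon$ part of the result is $f'(x)$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number a + b*eps, where eps**2 == 0 (hypothetical helper class)."""
    real: float  # a: the ordinary value
    eps: float   # b: the coefficient of eps

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.real + other.real, self.eps + other.eps)

    def __mul__(self, other: "Dual") -> "Dual":
        # (a + b eps)(c + d eps) = ac + (bc + ad) eps; the bd eps^2 term is dropped
        a, b = self.real, self.eps
        c, d = other.real, other.eps
        return Dual(a * c, b * c + a * d)


# Seed x = 3 with eps-coefficient 1, then evaluate f(x) = x*x + x.
x = Dual(3.0, 1.0)
fx = x * x + x
print(fx)  # Dual(real=12.0, eps=7.0): f(3) = 12 and f'(3) = 2*3 + 1 = 7
```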
That made me think that maybe I could just have an $\epsilon$ defined per variable in a $z=f(x,y)$ function, so I gave it a shot. (Note that the $x^2$ and $y^2$ terms disappear below for the same reason as above):
$$
\begin{aligned}
x &= \epsilon_x, \quad y = \epsilon_y \\
(a+bx+cy)(d+ex+fy) &= ad+(ae+bd)x+(af+cd)y+(bf+ce)xy+bex^2+cfy^2 \\
&= ad+(ae+bd)x+(af+cd)y+(bf+ce)xy
\end{aligned}
$$
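The expansion above can be mechanized in Python to experiment with it. This is only a sketch: the class `Dual2` and its field names are hypothetical. It stores the constant, $\epsilon_x$, $\epsilon_y$, and $\epsilon_x\epsilon_y$ coefficients separately and drops only the terms containing $\epsilon_x^2$ or $\epsilon_y^2$. When both inputs have a zero $xy$ part, the product reduces to $ad+(ae+bd)x+(af+cd)y+(bf+ce)xy$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual2:
    """a + b*ex + c*ey + g*ex*ey with ex**2 == ey**2 == 0, but ex*ey kept."""
    re: float   # constant part
    ex: float   # coefficient of eps_x
    ey: float   # coefficient of eps_y
    exy: float  # coefficient of eps_x*eps_y (the term in question)

    def __add__(self, other: "Dual2") -> "Dual2":
        return Dual2(self.re + other.re, self.ex + other.ex,
                     self.ey + other.ey, self.exy + other.exy)

    def __mul__(self, other: "Dual2") -> "Dual2":
        a, b, c, g = self.re, self.ex, self.ey, self.exy
        d, e, f, h = other.re, other.ex, other.ey, other.exy
        # Expand the product and discard every term containing ex**2 or ey**2.
        return Dual2(
            a * d,
            a * e + b * d,
            a * f + c * d,
            a * h + g * d + b * f + c * e,
        )


# Seed each variable with coefficient 1 on its own epsilon, then evaluate z = x*x*y.
x = Dual2(3.0, 1.0, 0.0, 0.0)
y = Dual2(5.0, 0.0, 1.0, 0.0)
z = x * x * y
print(z)  # Dual2(re=45.0, ex=30.0, ey=9.0, exy=6.0)
```

In this run, the `ex` and `ey` parts (30 and 9) match $\partial z/\partial x = 2xy$ and $\partial z/\partial y = x^2$ at $(3, 5)$, and the extra $xy$ coefficient comes out as 6.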
This looks pretty good except for the $xy$ term, which I have no idea how to account for in the gradient, or how to interpret.
Can anyone help me out towards understanding how to do multivariable automatic differentiation?