I am trying to understand the Jacobian as it relates to the transformation of random variables. The nuts and bolts, however, are buried in calculus.
Now, I have been reading this paper here, and I have some 'nested' questions so please bear with me.
Question 1) (Page 1): It says that:
"If h is differentiable, the approximation $h(x + dx) \approx h(x) + h'(x)dx$".
Why is this the case? I have 'accepted' it but I would like to know why it's true, although it does not affect question (2) directly for me.
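For what it's worth, I did try a quick numeric sanity check of the approximation. This is only a sketch under my own assumptions: the function `h` below (a sine) and the point `x = 1.0` are made up by me for illustration and are not taken from the paper.

```python
import math

# Hypothetical example function (my own choice, not from the paper).
def h(x):
    return math.sin(x)

# Its exact derivative.
def h_prime(x):
    return math.cos(x)

x = 1.0
errors = []
for dx in (1e-1, 1e-2, 1e-3):
    exact = h(x + dx)                 # true value at the shifted point
    approx = h(x) + h_prime(x) * dx   # first-order approximation
    errors.append(abs(exact - approx))
    print(f"dx={dx:g}  exact={exact:.8f}  approx={approx:.8f}  error={errors[-1]:.2e}")
```

In this example the error shrinks much faster than `dx` itself as `dx` gets smaller, so the approximation certainly looks right numerically. What I am after is the reason it holds.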
Question 2) (Page 2) (this is my main question): Towards the bottom, the author uses a, b, c and d as placeholders for partial derivatives. I follow everything up to the point where he puts a, b, c and d into that mini table with the arrows. I am convinced that in the table he has mixed up b with c. For example, he has written:
$ (x_1 + dx_1, x_2) \rightarrow (y_1 + a, y_2 + b) $
I am convinced that the 'b' should actually be a 'c', going by his own definition. So what I think it should read instead is:
$ (x_1 + dx_1, x_2) \rightarrow (y_1 + a, y_2 + c) $
My reasoning is as follows. For the first part, ($y_1 + a$), we can get it as so:
$ h_1(x_1 + dx_1, x_2) \approx h_1(x_1,x_2) + \frac{\partial h_1(x_1,x_2)}{\partial x_1} dx_1 + 0 $
This yields, by his own definition:
$ h_1(x_1 + dx_1, x_2) \approx y_1 + a $
So far so good. So similarly, for the second coordinate of the transformed point, I reason as such:
$ h_2(x_1 + dx_1, x_2) \approx h_2(x_1,x_2) + \frac{\partial h_2(x_1,x_2)}{\partial x_1} dx_1 + 0 $
So this then, by his own definitions, must be:
$ h_2(x_1 + dx_1, x_2) \approx y_2 + c $
So why is it written as 'b' instead? b involves the partial derivative of $h_1$ w.r.t. $x_2$, whereas here we are actually differentiating $h_2$ w.r.t. $x_1$. $x_1$ is the only thing that is varying, and the second coordinate of the transformed point is given by $h_2$. So what gives? Is this a typo in the paper, or have I missed something completely?
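To check my reasoning numerically, here is a minimal sketch. The maps `h1` and `h2`, the point `(2, 3)` and the step sizes are all invented by me for illustration. I am also assuming the placeholders mean a = ∂h1/∂x1 · dx1, b = ∂h1/∂x2 · dx2, c = ∂h2/∂x1 · dx1 and d = ∂h2/∂x2 · dx2, which is how I read the paper's definitions.

```python
import math

# Hypothetical transformation (my own example, not the paper's).
def h1(x1, x2):
    return x1**2 * x2

def h2(x1, x2):
    return math.sin(x1) + x2**3

x1, x2 = 2.0, 3.0
dx1, dx2 = 1e-3, 1e-3

# Placeholders as I understand the paper's definitions (assumed).
a = (2 * x1 * x2) * dx1    # dh1/dx1 * dx1
b = (x1**2) * dx2          # dh1/dx2 * dx2
c = math.cos(x1) * dx1     # dh2/dx1 * dx1
d = (3 * x2**2) * dx2      # dh2/dx2 * dx2

# Move only x1 and measure the actual change in each output coordinate.
dy1 = h1(x1 + dx1, x2) - h1(x1, x2)
dy2 = h2(x1 + dx1, x2) - h2(x1, x2)

print(f"actual change in y1: {dy1:.6f}   a = {a:.6f}")
print(f"actual change in y2: {dy2:.6f}   b = {b:.6f}   c = {c:.6f}")
```

With these made-up functions, the change in the second coordinate tracks c and is nowhere near b, which is what makes me suspect the table.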
Some context: I currently understand the random variable $y_1$ to be created as a linear combination of the r.v.'s $x_1$ and $x_2$, through transformation $h_1$. Similarly, r.v. $y_2$ is created as a linear combo of r.v.'s $x_1$ and $x_2$, through transformation $h_2$.