
I believe my trouble finding the Jacobian of the following function highlights a gap in my understanding of some concept. I was hoping someone could provide specific advice about this problem, or general advice about computing Jacobians.

Consider the mapping $h : \mathbb{R}^n \rightarrow \mathbb{R}^n$, where the domain consists of length-$n$ column vectors and the range of length-$n$ row vectors (transposed vectors, if you like). The function is $$h(x) = \frac{\eta v' + (M x)'}{(\eta + u'x)^2},$$ where the constants $v$ and $u$ are column vectors, $\eta$ is a scalar, and $M$ is a square matrix.

As far as I know, the quotient rule for vectors is $$\nabla\left(\frac{f}{g}\right) = \frac{g\nabla f - f \nabla g}{g^2},$$ and here \begin{align*} \nabla f &= M'\\ \nabla g &= 2(\eta + u'x) u'. \end{align*} Putting it all together, I get $$\nabla h = \frac{(\eta + u'x)^2 M' - [\eta v' + (M x)']\, 2(\eta + u'x) u'}{(\eta + u'x)^4}.$$ This expression is clearly not right; to see why, evaluate the Jacobian at $x = \mathbf{0}$: $$\nabla h(0) = \frac{\eta^2 M' - 2\eta^2 v' u'}{\eta^4}.$$ The resulting expression should be an $n \times n$ matrix, but in the second term we have two (row) vectors multiplied by one another. It seems likely there should be some sort of outer product here, but I'm not sure where my math is going wrong.
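For concreteness, here is the finite-difference check I'm comparing candidate answers against (a rough NumPy sketch; I drop the row/column distinction, use the convention $J_{ij} = \partial h_i / \partial x_j$, and `fd_jacobian` is just my helper name):

```python
import numpy as np

def h(x, eta, v, u, M):
    # h(x) = (eta*v + M x) / (eta + u'x)^2, as a plain length-n array.
    return (eta * v + M @ x) / (eta + u @ x) ** 2

def fd_jacobian(f, x, eps=1e-6):
    # Forward-difference Jacobian: J[i, j] ~= d f_i / d x_j.
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros(x.size)
        dx[j] = eps
        J[:, j] = (f(x + dx) - fx) / eps
    return J

rng = np.random.default_rng(0)
n = 4
eta = 2.0
v, u = rng.normal(size=n), rng.normal(size=n)
M = rng.normal(size=(n, n))

J_fd = fd_jacobian(lambda x: h(x, eta, v, u, M), np.zeros(n))
# Guess with an outer product v u' in place of the ill-formed v'u':
J_guess = (eta**2 * M - 2 * eta**2 * np.outer(v, u)) / eta**4
print(np.abs(J_fd - J_guess).max())  # ~1e-6, consistent with the guess
```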

Any help you can provide is greatly appreciated.

  • What is a *vector quotient*? (2011-01-19)
  • Gradients are only defined for functions mapping vectors to _scalars_. The object you're looking for is called a Jacobian: http://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant (2011-01-19)
  • @Rasmus I simply meant the quotient rule when taking derivatives with respect to a vector. (2011-01-19)
  • @Qiaochu Yuan: My apologies, I got confused with nomenclature, as this expression was arrived at by computing a gradient. I'm now trying to apply the same technique to this expression to compute the Hessian. Does the fact that I am essentially computing a vector of gradients (thus the matrix) imply that the technique to compute it should be different? (2011-01-19)
  • You also need to think harder about matrix multiplication and transposes. You implicitly used the relation that for $f(x) = (Mx)^T$, $\nabla f = M^T$, which is incorrect. Recalling that $\nabla_y f(x) = \lim_{h\to 0} \frac1h (f(x + hy) - f(x))$, where $y$ is a vector, you see that $\nabla_y f = (My)^T$. The key fact is that $(Mx)^T \neq M^T x$; rather, $(Mx)^T = x^T M^T$. In fact, why do you insist on having the output of the function $h$ be a row vector? This, I think, is the spot that is really tripping you up. (2011-01-19)
  • @Willie Wong: $h(x)$ is the gradient of a function. I am trying to compute the Hessian. Despite the error you have pointed out (which I will attempt to remedy), it seems there is still an unrelated problem in the second term. Can you elaborate on how $h$ being a row vector makes the process more difficult? (2011-01-19)
  • It doesn't make it more difficult. But the way you try to accommodate that fact in your notation is, I suspect, the cause of the confusion. Now, in your question you wrote that you expect the final answer to be a matrix: that is incorrect. If you think geometrically (since you take $h(x)$ to be a row vector because it is the gradient of a function), the Hessian takes as input a point $x$ and two (column) vectors $y, z$ and outputs a scalar, and hence, despite the usual nomenclature, is not a matrix (at least not one interpreted as a linear transformation). (2011-01-19)
  • (In other words, the Hessian of a function is a $(0,2)$ [tensor field](http://en.wikipedia.org/wiki/Tensor), eating up two vectors and spitting out a scalar, while a linear transformation matrix is a $(1,1)$ tensor, which takes a vector as input and outputs a vector.) For your purposes, rather than all this, it is perhaps simplest to just consider $n$-tuples of numbers. I personally think that if, instead of trying to force the matrix notation, which is not natural in this context, you use some sort of index notation, the answer would be a lot clearer. (2011-01-19)
  • @RandomGuy: if you use `$$...stuff...$$` instead of `$..stuff..$`, the result is a displayed math equation; it is centered on its own line, and the symbols are generally larger (much better for fractions, for example); it also avoids having to make new paragraphs. (2011-01-20)
  • @Arturo Magidin: Thanks for the tip. I will do that in the future. (2011-01-20)
  • Note that if $h$ were instead $h:\mathbb{R}^n\to\mathbb{R}$, your problem would solve easily; you should also edit the OP accordingly. (2014-01-10)

1 Answer

My suggestion for minimizing notational confusion is to focus on the directional derivative in the direction of some fixed vector $w$. This derivative $D_w h$ is always a function of the same nature as $h$: here, it will also eat column vectors and spit out row vectors. For the numerator the computation is easy, since it is linear in $x$: $$D_w(\eta v' + (M x)') = (Mw)' \tag1$$ while for $(\eta + u'x)^{-2}$ the chain rule gives $$D_w((\eta + u'x)^{-2}) = -2 (\eta + u'x)^{-3} D_w(\eta + u'x) = -2 (\eta + u'x)^{-3} (u'w) \tag2$$ Combining (1) and (2) by the product rule, $$D_w h = -2 (\eta + u'x)^{-3} (u'w) (\eta v' + (M x)') + (\eta + u'x)^{-2}(Mw)' \tag3 $$ As promised, the right-hand side of (3) is a row vector. It depends linearly on $w$ and thus defines a $(1,1)$ tensor field.
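
If it helps, here is a quick numerical sanity check of (3), written as a plain NumPy sketch (the row/column distinction is dropped for 1-D arrays, and the specific names and test values are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
eta = 2.0
v, u = rng.normal(size=n), rng.normal(size=n)
M = rng.normal(size=(n, n))

def h(x):
    # h(x) = (eta*v + M x) / (eta + u'x)^2
    return (eta * v + M @ x) / (eta + u @ x) ** 2

def D_w_h(x, w):
    # Formula (3): directional derivative of h at x in the direction w.
    s = eta + u @ x
    return -2 * s**-3 * (u @ w) * (eta * v + M @ x) + s**-2 * (M @ w)

x = 0.1 * rng.normal(size=n)   # keep eta + u'x safely away from 0
w = rng.normal(size=n)

eps = 1e-6
fd = (h(x + eps * w) - h(x)) / eps     # finite-difference approximation
print(np.abs(fd - D_w_h(x, w)).max())  # ~1e-6, matching (3)

# Since (3) is linear in w, feeding in the standard basis vectors
# recovers the columns of the matrix that represents D h at x.
J = np.column_stack([D_w_h(x, e) for e in np.eye(n)])
```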