15

For the quadratic form $X^TAX$, with $X\in\mathbb{R}^n$ and $A\in\mathbb{R}^{n \times n}$ (which expands to $\sum_{i=1}^n\sum_{j=1}^n A_{ij}x_ix_j$), I tried to take the derivative with respect to $X$ ($\nabla_X\, X^TAX$) and ended up with the following:

The $k^{th}$ element of the derivative can be represented as

$\nabla_{X_k}X^TAX=\left[\sum_{i=1}^n(A_{ik}x_k+A_{ki})x_i\right] + A_{kk}x_k(1-x_k)$

Does this result look right? Is there an alternative form?

I'm trying to get to the $\mu_0$ of Gaussian Discriminant Analysis by maximizing the log-likelihood, and I need to take the derivative of a quadratic form. Either the result I mentioned above is wrong (it shouldn't be, because I went over my arithmetic several times), or the form I arrived at is not terribly useful for my problem (because I'm unable to proceed).

I can give more details about the problem, or the steps I took to arrive at the above result, but I didn't want to clutter the question to start off. Please let me know if more details are necessary.

Any link to related material is also much appreciated.

4 Answers

36

Let $Q(x) = x^T A x$. Then expanding $Q(x+h)-Q(x)$ and dropping the higher-order term, we get $DQ(x)(h) = x^TAh+h^TAx = x^TAh+x^TA^Th = x^T(A+A^T)h$, or, more typically, $\frac{\partial Q(x)}{\partial x} = x^T(A+A^T)$.

Notice that the derivative with respect to a column vector is a row vector!
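This result is easy to sanity-check numerically. The sketch below (using NumPy, with a randomly chosen, deliberately non-symmetric $A$; all variable names are mine) compares the gradient $(A+A^T)x$ against central finite differences of $Q$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # deliberately non-symmetric
x = rng.standard_normal(n)

def Q(v):
    """Quadratic form Q(v) = v^T A v."""
    return v @ A @ v

# Analytic gradient from the answer: (A + A^T) x
grad_analytic = (A + A.T) @ x

# Central finite differences along each coordinate direction
eps = 1e-6
grad_fd = np.array([(Q(x + eps * e) - Q(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

print(np.max(np.abs(grad_fd - grad_analytic)))  # should be tiny (finite-difference error)
```

Note that for a non-symmetric $A$ the naive guess $2Ax$ would fail this check; only $(A+A^T)x$ matches.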

  • 0
    Could you comment on the difference expansion, please? 2014-02-06
  • 1
    What do you mean? Just compute $Q(x+h)-Q(x)$ explicitly. The only term missing above is $h^T A h$, and we have $|h^T A h| \le \|A\| \|h \|^2$, so the term is $O(\|h\|^2)$. 2014-02-06
  • 0
    Can't see how $(x+h)^T A (x+h)$ would be obvious; I was hoping to avoid opening the matrix. Any hint? 2014-02-07
  • 0
    I still don't understand what you are asking. Computing the derivative is much like computing the derivative of $x \mapsto x^2$ from first principles. I don't understand what you mean by 'opening the matrix'. 2014-02-07
  • 1
    I don't see how I can expand $(x+h)^T A (x+h)$ so trivially. I mean literally: why does $(x+h)^T A (x+h) = x^T A x + h^TAx+x^TAh + h^T A h$, and how can you see that so quickly? It just looks like a messy summation to me. 2014-02-07
  • 3
    There is no need to explicitly compute the sums. Matrix multiplication is associative and distributive, so we can treat matrices like 'numbers' in this regard. We have $A(x+h) = Ax + Ah$, $(x+h)^TA = (x^T +h^T) A = x^TA + h^T A$, etc. 2014-02-07
  • 0
    How did you get $x^TA^Th$ from $h^TAx$? 2016-07-07
  • 0
    In general, $(AB)^T=B^T A^T$. 2016-07-07
  • 1
    And for a scalar, $x^T = x$. 2016-07-07
  • 0
    @copper.hat: thanks! 2016-07-11
  • 0
    "The derivative with respect to a column vector is a row vector!" -- this is of course assuming you're using [numerator layout](https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions). 2016-10-14
  • 0
    @YiboYang: The above notation is fairly standard, I believe. 2016-10-14
  • 0
    @copper.hat, thank you for your answers all over Math Stack Exchange; I have used many of them. Now I am wondering, how can you say that $\vert h^TAh \vert \le \Vert A \Vert \, \Vert h \Vert^2$? Is it somehow related to Cauchy-Schwarz? I tried to derive it from Cauchy-Schwarz but was unable to, because the L2-norm of a matrix is the largest singular value, and this threw me off a bit. Furthermore, how is the fact that it is of order $\Vert h \Vert^2$ enough to justify removing it? 2017-09-12
  • 1
    @Sother: Cauchy-Schwarz gives $|\langle h, Ah \rangle | \le \|h\| \|Ah\|$ and (if we use the Euclidean norm) we have $\|Ah\| \le \|A\| \|h\|$. 2017-09-12
  • 0
    Thanks. Also, I do not know if you saw my additional question (I edited it in later): how is the fact that it is of order $\Vert h \Vert^2$ enough to justify removing it? 2017-09-13
  • 0
    Also, for what you are referring to when you say $\Vert A h \Vert \le \Vert A \Vert \, \Vert h\Vert$, you actually do not need the Euclidean norm: *Holder's Inequality* allows that property to hold for **any** norm! I just did not know that you could mix matrices and vectors when using Cauchy-Schwarz, as we have done here with matrix $A$ and vector $h$. 2017-09-13
  • 1
    @Sother: The inequality $\|Ax\| \le \|A\| \|x\|$ holds for induced norms. However, it is irrelevant here, in that it is always the case that $\|Ax\| \le K \|x\|$ for some $K$. 2017-09-13
  • 0
    What is $K$? Is it a member of the set of real numbers? What set is it a member of? 2017-09-13
  • 1
    It is a real constant. If the norm is induced, it would be the norm of $A$. 2017-09-13
1

It is easier using index notation with the Einstein summation convention (repeated dummy indices are summed). That is, we can write the $i$th component of $Ax$ as $a_{ij} x_j$, and $f({\bf x}) = x^T A x = x_i a_{ij} x_j = a_{ij} x_i x_j$. Then take the derivative of $f({\bf x})$ with respect to a component $x_k$. We find \begin{eqnarray} \partial f/\partial x_k = f_{,k} = a_{ij} x_{i,k} x_j + a_{ij} x_i x_{j,k} = a_{ij} \delta_{ik} x_j + a_{ij} x_i \delta_{jk} = a_{kj} x_j + a_{ik} x_i, \end{eqnarray} which in matrix notation is the $k$th component of ${\bf{x}}^T A + {\bf{x}}^T A^T$.
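The index manipulation above can be replayed with `numpy.einsum`, which implements exactly this summation convention (a sketch; the variable names are mine), confirming that the surviving terms $a_{kj}x_j + a_{ik}x_i$ equal the $k$th component of $(A+A^T)x$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# f = a_ij x_i x_j, written with explicit Einstein indices
f = np.einsum('ij,i,j->', A, x, x)
assert np.isclose(f, x @ A @ x)

# The two surviving delta terms: a_kj x_j + a_ik x_i
grad = np.einsum('kj,j->k', A, x) + np.einsum('ik,i->k', A, x)
assert np.allclose(grad, (A + A.T) @ x)
```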

0

$f(x) = 0.5x^\top Ax \Rightarrow Df(x) = Ax $
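As stated, this holds only for symmetric $A$: in general the gradient of $\tfrac12 x^\top Ax$ is $\tfrac12(A+A^\top)x$, which reduces to $Ax$ exactly when $A=A^\top$. A small NumPy sketch of the distinction (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))   # generic, non-symmetric
S = A + A.T                       # symmetric

x = rng.standard_normal(n)

# General rule: gradient of 0.5 x^T M x is 0.5 (M + M^T) x
grad_A = 0.5 * (A + A.T) @ x
grad_S = 0.5 * (S + S.T) @ x

print(np.allclose(grad_A, A @ x))  # generally False for non-symmetric A
print(np.allclose(grad_S, S @ x))  # True: Df(x) = Sx when S = S^T
```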

  • 0
    This is not right. @copper.hat's answer is correct. 2018-06-19
-1

I just learned a new trick for when your independent variable appears in more than one place in your formula: introduce a new (fake) parameter, which will then disappear:

$$\frac{\partial}{\partial x} y^TAx = \frac{\partial y}{\partial x}[Ax]^T+y^TA $$ The transpose was to make the vector a row vector. Nothing deep there!

Now, if $y=x$ then $$ \frac{d}{dx} x^TAx = x^TA^T+x^TA = x^T(A+A^T) \ . $$
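The fake-parameter trick can be checked numerically by splitting the two occurrences of $x$ into separate slots and differentiating each slot on its own (a finite-difference sketch; the function name `f` and other variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def f(y, z):
    """Bilinear form y^T A z, with the two occurrences of x split apart."""
    return y @ A @ z

eps = 1e-6
E = np.eye(n)
# Partial w.r.t. the first slot, second slot frozen at x: should give A x
g_first = np.array([(f(x + eps*e, x) - f(x - eps*e, x)) / (2*eps) for e in E])
# Partial w.r.t. the second slot, first slot frozen at x: should give A^T x
g_second = np.array([(f(x, x + eps*e) - f(x, x - eps*e)) / (2*eps) for e in E])

# Chain rule: the total derivative is the sum of the two slot-wise partials
total = g_first + g_second
print(np.allclose(total, (A + A.T) @ x, atol=1e-4))  # True
```

The sum of the two partials recovers $x^T(A+A^T)$, matching the accepted answer.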