
I have the following scalar-valued function of an $n\times 1$ vector $\mathbf{x}$: $f(\mathbf{x})=\operatorname{Tr}[(\mathbf{A}+\operatorname{diag}(\mathbf{x}))^{-1}]$, where $\operatorname{diag}(\mathbf{x})$ is the diagonal matrix with the entries of $\mathbf{x}$ on its diagonal and $\mathbf{A}$ is an $n\times n$ matrix (assume that $\mathbf{A}+\operatorname{diag}(\mathbf{x})$ is invertible). Since the trace can be written as a sum of quadratic forms over orthonormal basis vectors $\mathbf{e}_i$, we can also write $f(\mathbf{x})=\sum_{i=1}^n\mathbf{e}_i^T(\mathbf{A}+\operatorname{diag}(\mathbf{x}))^{-1}\mathbf{e}_i$.
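For concreteness, here is a minimal NumPy sketch of this setup (the dimension, seed, and random data are arbitrary), confirming numerically that the trace and the sum of quadratic forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
Minv = np.linalg.inv(A + np.diag(x))           # assume A + diag(x) is invertible

f_trace = np.trace(Minv)                       # Tr[(A + diag(x))^{-1}]
f_quad = sum(e @ Minv @ e for e in np.eye(n))  # sum_i e_i^T (A + diag(x))^{-1} e_i
print(np.isclose(f_trace, f_quad))             # expect: True
```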

I am interested in $\frac{\partial f}{\partial x_i}$. Is there a way to express it in terms of $\mathbf{A}$, $\mathbf{x}$ and $\mathbf{e}_i$?

4 Answers


Here is what you can try: using the definition of the partial derivative, consider $f(x+t e_i)-f(x)$ for some $t \in \mathbb{R}$. We have $ f(x+t e_i) = \text{Tr}\big((A + \text{diag}(x) + t e_i e_i^T)^{-1}\big).$ You can now apply the matrix inversion lemma (http://en.wikipedia.org/wiki/Woodbury_matrix_identity): isolating $f(x)$ in the expansion, the desired partial derivative is not hard to read off from the leftover term.

EDIT: Let me elaborate some more. Let $\Gamma = A + \text{diag}(x)$. Apply the Woodbury identity with $A = \Gamma$, $U = t e_i$, $C = 1$ (the scalar $1$) and $V = e_i^T$. You get $ f(x + te_i) = \text{Tr} \Big( \Gamma^{-1} - t \frac{\Gamma^{-1} e_i e_i^T \Gamma^{-1}}{1 + t e_i^T \Gamma^{-1} e_i} \Big). $ Taking the trace term by term (and using the invariance of the trace under cyclic permutations), $ f(x + te_i) = f(x) + \frac{-t \,e_i^T \Gamma^{-2} e_i}{1+ t \,e_i^T \Gamma^{-1} e_i}. $ Hence, $ \frac{1}{t} [ f(x + te_i) - f(x) ] \to - e_i^T \Gamma^{-2} e_i $ as $t \to 0$, which is the desired partial derivative (if I haven't made any mistake).
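A quick finite-difference sanity check of this formula; this is just an illustrative NumPy sketch (random data, and the index $i$ and step size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, i, t = 5, 2, 1e-6
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: np.trace(np.linalg.inv(A + np.diag(v)))
e_i = np.eye(n)[i]
Ginv = np.linalg.inv(A + np.diag(x))

fd = (f(x + t * e_i) - f(x - t * e_i)) / (2 * t)  # central finite difference
exact = -e_i @ Ginv @ Ginv @ e_i                  # -e_i^T Gamma^{-2} e_i
print(np.isclose(fd, exact, rtol=1e-4))           # expect: True
```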

EDIT2: Let me also add this for the general problem. You can avoid the Woodbury identity and instead use the von Neumann series $(I-B)^{-1} = I + B +B^2 + \dots$, valid for $\|B\| < 1$. Writing $\Gamma + t e_i e_i^T = \Gamma(I + t \Gamma^{-1} e_i e_i^T)$ and using $(CD)^{-1}= D^{-1} C^{-1}$, for $t$ small enough we have \begin{align*} f(x + te_i) &= \text{Tr} [( I + t \Gamma^{-1} e_i e_i^T)^{-1} \Gamma^{-1} ]\\ &=\text{Tr} \big[ \big\{ I - t \Gamma^{-1} e_i e_i^T + o(t)\big\} \Gamma^{-1} \big] \\ &= \text{Tr} \big[ \Gamma^{-1} - t \Gamma^{-1} e_i e_i^T \Gamma^{-1} + o(t)\big] \\ &= f(x) - t (e_i^T \Gamma^{-2} e_i) + o(t), \end{align*} which is the desired result. This approach is more general and more straightforward, and it produces the entire Taylor expansion of the function. (The previous approach using Woodbury can also give you this: you end up with a function of the single variable $t$ as the leftover, which can be expanded as a usual Taylor series.)
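Note that the Woodbury step above is exact for any finite $t$ with $1 + t\, e_i^T \Gamma^{-1} e_i \neq 0$, not only in the limit $t \to 0$. A small NumPy sketch (arbitrary random data and $t$) checking that exactness:

```python
import numpy as np

rng = np.random.default_rng(2)
n, i, t = 5, 1, 0.7  # a non-infinitesimal t
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
Gamma = A + np.diag(x)
Ginv = np.linalg.inv(Gamma)
e_i = np.eye(n)[i]

# assumes 1 + t * e_i^T Gamma^{-1} e_i != 0 so the rank-one update is invertible
lhs = np.trace(np.linalg.inv(Gamma + t * np.outer(e_i, e_i)))
rhs = np.trace(Ginv) - t * (e_i @ Ginv @ Ginv @ e_i) / (1 + t * (e_i @ Ginv @ e_i))
print(np.isclose(lhs, rhs))  # expect: True
```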

  • It was really hard to decide which answer to accept here. In the end, the generalization using the von Neumann expansion sealed it. Appreciate a good lesson! (2012-06-10)

$\def\e{{\bf e}} \def\D{\mathrm{diag}(\x)} \def\x{{\bf x}} \def\A{{\bf A}} \def\M{{\bf M}} \def\Mi{{\bf M}^{-1}} \def\P{{\bf P}_i} \def\id{\mathbb{I}} \def\tr{\mathrm{Tr}\,}$ Let $\M = \A + \D$. Note that $d(\M\Mi) = d \id = {\bf 0}$. Thus, $d\Mi = -\Mi d \M \Mi$, and so $\begin{eqnarray*} d \tr \Mi &=& d \sum \e_i^T \Mi \e_i \\ &=& \sum \e_i^T d \Mi \e_i \\ &=& -\sum \e_i^T \Mi d \M \Mi \e_i. \end{eqnarray*}$ If $\A$ is a function of $\x$, this is about as far as we can go: $\begin{eqnarray*} \frac{\partial}{\partial x_i} \tr \Mi &=& -\sum_j \e_j^T \Mi \frac{\partial \M}{\partial x_i} \Mi \e_j \\ &=& -\tr \left((\Mi)^2 \frac{\partial \M}{\partial x_i}\right). \end{eqnarray*}$

If $\A$ is not a function of $\x$ we have that $\P = \frac{\partial \M}{\partial x_i}$ is a projection operator. (All components of $\P$ are zero except the $ii$th component, which is $1$.) In that case we find $\begin{eqnarray*} \frac{\partial}{\partial x_i} \tr \Mi &=& -\sum_j \e_j^T \Mi \P \Mi \e_j \\ &=& -\tr \left((\Mi)^2 \P\right) \\ &=& -(\Mi)^2_{ii} \\ &=& -\sum_j \Mi_{ij}\Mi_{ji}. \end{eqnarray*}$

In terms of $\A$, $\x$, and $\e_i$, $\begin{eqnarray*} \frac{\partial}{\partial x_i} f(\x) &=& -\sum_j \e_j^T (\A + \D)^{-1} \frac{\partial \D}{\partial x_i} (\A + \D)^{-1} \e_j \\ &=& -\sum_j \e_j^T (\A + \D)^{-1} \P (\A + \D)^{-1} \e_j. \end{eqnarray*}$ As noted by @passerby51 we could find an expression for $(\A + \D)^{-1}$ in terms of $\A^{-1}$ and $\D^{-1}$, but we will stop here.

Addendum: We made no assumptions about the basis $\e_i$. If $\e_i$ is the natural basis the above implies $\begin{eqnarray*} \frac{\partial}{\partial x_i} f(\x) &=& -\e_i^T (\A + \D)^{-2} \e_i, \end{eqnarray*}$ which agrees with @passerby51's result. (Note that in this case $\P = \e_i \e_i^T$.)

By the Woodbury formula, $\begin{eqnarray*} (\A+\D)^{-1} &=& \A^{-1} - \A^{-1}(\A^{-1} + \D^{-1})^{-1} \A^{-1}. \end{eqnarray*}$ This may or may not be useful depending on the form of $\A$.
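A hedged NumPy sketch of this last identity; it additionally assumes $A$ and ${\rm diag}(x)$ are themselves invertible, and the diagonal shift keeping $A$ well conditioned is just for the demo:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # keep A safely invertible
D = np.diag(rng.uniform(1.0, 2.0, n))            # nonzero diagonal entries

Ainv, Dinv = np.linalg.inv(A), np.linalg.inv(D)
lhs = np.linalg.inv(A + D)
rhs = Ainv - Ainv @ np.linalg.inv(Ainv + Dinv) @ Ainv
print(np.allclose(lhs, rhs))  # expect: True
```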


The solution to this problem has a rather elegant form
$ \eqalign { \frac{\partial{f}}{\partial{x}} = -\text{diag}([A + \text{Diag}(x)]^{-2}) } $

Let's denote the operation turning a vector into a diagonal matrix by $W = {\rm Diag}(w)$.

For the reverse operation (extracting the diagonal of a matrix as a vector) let's use $w = {\rm diag}(W)$.

Next, define the third-order tensor $\beta$, whose components are $\beta_{ijk} = 1$ when $i\!=\!j\!=\!k$ and $0$ otherwise.

Now the diag/Diag operators can be expressed in a form which is more convenient for algebraic manipulation: $ \eqalign { \text{diag}(W) = \beta:W \cr \text{Diag}(w) = \beta\cdot w \cr }$ Now to derive the solution given above. Let $ \eqalign { M &= A + \beta\cdot x \cr W &= M^{-1} \cr f &= \text{tr}(W) \cr } $ Take differentials $ \eqalign { dM &= \beta\cdot dx = dM^T \cr dW &= -W\cdot dM\cdot W \cr df &= d[\text{tr}(W)] \cr &= \text{tr}(dW) \cr &= -\text{tr}(W\cdot dM\cdot W) \cr &= -\text{tr}(W^2\cdot dM) \cr &= -W^2 : dM^T \cr &= -W^2 : \beta\cdot dx \cr } $ So the derivative of $f$ is $ \eqalign { \frac {\partial f} {\partial x} &= -W^2 : \beta \cr &= -\text{diag}(W^2) \cr &= -\text{diag}([A + \text{Diag}(x)]^{-2}) \cr } $ If you truly desire the vector components, take the dot product with $e_k$ $ \eqalign { \frac {\partial f} {\partial x_k} &= -e_k\cdot\text{diag}([A + \text{Diag}(x)]^{-2}) \cr } $
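For a concrete check of this $\beta$ calculus, here is a NumPy sketch (arbitrary dimension and random data) that builds $\beta$ explicitly and verifies the two contraction identities together with the final contraction $-W^2 : \beta = -{\rm diag}(W^2)$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
beta = np.zeros((n, n, n))
for i in range(n):
    beta[i, i, i] = 1.0  # beta_ijk = 1 iff i = j = k

w = rng.standard_normal(n)
W = rng.standard_normal((n, n))

Diag_w = np.einsum('ijk,k->ij', beta, w)  # beta . w  (builds Diag(w))
diag_W = np.einsum('ijk,jk->i', beta, W)  # beta : W  (extracts diag(W))
print(np.allclose(Diag_w, np.diag(w)), np.allclose(diag_W, np.diag(W)))

grad = -np.einsum('jk,jki->i', W @ W, beta)  # -W^2 : beta
print(np.allclose(grad, -np.diag(W @ W)))    # expect: True
```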


Define the matrices $\eqalign{ X &= {\rm Diag}(x) &\implies x = {\rm diag}(X) \cr Y &= A+X &\implies dY = dX = dX^T \cr }$ Write the function in terms of these new variables, then find the differential and gradient. $\eqalign{ f &= {\rm tr}(Y^{-1}) \cr df &= -Y^{-2}:dY^T \cr &= -Y^{-2}:dX \cr &= -Y^{-2}:{\rm Diag}(dx) \cr &= -{\rm diag}(Y^{-2}):dx \cr \frac{\partial f}{\partial x} &= -{\rm diag}(Y^{-2}) \cr }$ where a colon denotes the trace/Frobenius product, i.e. $\,A:B^T={\rm tr}(AB)$
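As a numerical sanity check, a minimal NumPy sketch (random data; the step size $h$ is chosen heuristically) comparing $-{\rm diag}(Y^{-2})$ with central finite differences:

```python
import numpy as np

rng = np.random.default_rng(5)
n, h = 5, 1e-5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: np.trace(np.linalg.inv(A + np.diag(v)))
Yinv = np.linalg.inv(A + np.diag(x))
grad = -np.diag(Yinv @ Yinv)  # -diag(Y^{-2})

fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.allclose(grad, fd, rtol=1e-4))  # expect: True
```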