3
$\begingroup$

I'm new here, so "Hi" to everyone :D

I got the following problem. I have the matrices $A$, $B$, $C$, $X$ and $Y$. All matrices are square (say n-by-n). In particular: - $A$ is full rank - $B$ is symmetric and (semi)definite positive; - $C$ is diagonal and definite positive; - $Y$ is diagonal and definite positive; - $X$ is diagonal ($X = \operatorname{diag}\{x_1, \ldots,x_n\}$) and it is the unknown matrix;

Then I have the following function: $f(X) = (A(B+X^{T}YX)^{-1}A^{T} + C)^{-1}$ (it may seem dumb to write $X^{T}$ since it is diagonal, but I think this is the best way to write it).

I would like to evaluate the derivative of the trace of $f(X)$ with respect to each $x_i$.

Any idea?

  • 0
    You can refer to "Old and New Matrix Algebra Useful for Statistics" which is a good article about matrix calculus.2012-12-03

2 Answers 2

1

If we perturb an invertible matrix $M$ by a small $\Delta M$, the first-order change in $M^{-1}$ is given by $\Delta (M^{-1}) := (M+\Delta M)^{-1} - M^{-1} = -M^{-1} (\Delta M) M^{-1} + O(\|\Delta M\|^2)$. Now, consider $f(X) = (A(B+X^{T}YX)^{-1}A^{T} + C)^{-1}$. \begin{align} \Delta f(X) =&\Delta\left((A(B+X^{T}YX)^{-1}A^{T} + C)^{-1}\right)\\ \approx&-f(X)\ \Delta\left(A(B+X^{T}YX)^{-1}A^{T} + C\right) f(X)\\ =&-f(X)A \Delta\left((B+X^{T}YX)^{-1}\right) A^{T}f(X)\\ \approx&f(X)A(B+X^{T}YX)^{-1} \Delta\left(B+X^{T}YX\right) (B+X^{T}YX)^{-1}A^{T}f(X)\\ \approx&f(X)A(B+X^{T}YX)^{-1} \left((\Delta X)^{T}YX + X^TY\Delta X\right) (B+X^{T}YX)^{-1}A^{T}f(X). \end{align} Therefore \begin{align} \Delta\, \mathrm{trace}f(X) \approx&\mathrm{trace}\, f(X)A(B+X^{T}YX)^{-1} \left((\Delta X)^{T}YX + X^TY\Delta X\right) (B+X^{T}YX)^{-1}A^{T}f(X)\\ =&2\,\mathrm{trace}\, (\Delta X)^{T}YX (B+X^{T}YX)^{-1}A^{T}f(X)^2A(B+X^{T}YX)^{-1} \end{align} and in turn $ \frac{d\mathrm{trace}f(X)}{dX} = 2YX (B+X^{T}YX)^{-1}A^{T}f(X)^2A(B+X^{T}YX)^{-1}. $ This is the formula for a general square matrix $X$. For a diagonal $X$, simply take the diagonal of the above derivative.

  • 0
    Well, thank you for your precious help :D2012-12-04
1

Just use the chain rule, which states that $ \frac{\partial g(\bf{U})}{\partial X_{i,j}} = \operatorname{trace}\left(\left(\frac{\partial g(\bf{U})}{\partial \bf{U}}\right)^T\frac{\partial \bf{U}}{\partial X_{i,j}}\right). $

In your case, let $\bf{U} = A(B+X^{T}YX)^{-1}A^{T} + C$ and then $g(\bf{U}) = \operatorname{trace}(U^{-1})$. First, observe that $ \frac{\partial g(\bf{U})}{\partial \bf{U}} = -\bf{U}^{-T}U^{-T}. $ Next, we evaluate $\frac{\partial \bf{U}}{\partial X_{i,j}}$. \begin{align} \frac{\partial \bf{U}}{\partial X_{i,j}} &= \frac{\partial}{\partial X_{i,j}}(A(B+X^{T}YX)^{-1}A^{T} + C)\\ & = \frac{\partial}{\partial X_{i,j}}(A(B+X^{T}YX)^{-1}A^T )\\ & = A\left(\frac{\partial}{\partial X_{i,j}}(B+X^{T}YX\right)^{-1})A^{T}\\ & = -A(B+X^{T}YX)^{-1}\left(\frac{\partial}{\partial X_{i,j}}(B+X^{T}YX)\right)(B+X^{T}YX)^{-1}A^T \\ & = -A(B+X^{T}YX)^{-1}(X^TYJ^{ij} + J^{ji}YX)(B+X^T YX)^{-1}A^T \end{align} in which $(J^{ij})_{kl} = \delta_{ik} \delta_{jl}$, where $\delta$ denotes the Kronecker delta function. Note that in the above calculation I used \begin{align} &\frac{\partial}{\partial X_{i,j}}X^{T}YX = X^TYJ^{ij} + J^{ji}YX\\ &\frac{\partial}{\partial X_{i,j}}(Z(X))^{-1} = Z^{-1}\left(\frac{\partial}{\partial X_{i,j}}Z\right)Z^{-1} \end{align}

  • 0
    Last question... assuming that X is an arbitrary rectangular matrix m-by-n, B is m-by-m and all other matrices are n-by-n, is still possible to do all calculation you posted?2012-12-03