
I'm trying to find $\frac{\partial}{\partial \lambda}y^T \left(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1}\right)^{-1}y$, where $y \in \mathbb{R}^n$ is fixed, $\lambda \neq 0$ is a real scalar and $K_{\theta}^{-1}$ is a known symmetric, positive definite matrix. Here's what I did so far:
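A numerical reference is handy for checking any candidate closed form. The sketch below is plain Python (no NumPy, so it is self-contained) restricted to $n = 2$. The matrix `K` (playing the role of $K_\theta$), the vector `y`, and the values of $\lambda$ and $\sigma^2$ are hypothetical test data. It evaluates $f(\lambda) = y^T(\sigma^2 I + \lambda^{-1}K_\theta^{-1})^{-1}y$ and estimates $\partial f/\partial\lambda$ with a central difference.

```python
# Numerical reference for f(lam) = y^T (sigma2*I + K^{-1}/lam)^{-1} y,
# restricted to 2x2 matrices so no external libraries are needed.

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad(y, A):
    """Quadratic form y^T A y for a 2-vector y."""
    return sum(y[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

def f(lam, y, K, sigma2):
    """y^T (sigma2*I + lam^{-1} K^{-1})^{-1} y."""
    Kinv = inv2(K)
    M = [[sigma2 * (1.0 if i == j else 0.0) + Kinv[i][j] / lam for j in range(2)]
         for i in range(2)]
    return quad(y, inv2(M))

def df_numeric(lam, y, K, sigma2, h=1e-6):
    """Central finite-difference estimate of df/dlam."""
    return (f(lam + h, y, K, sigma2) - f(lam - h, y, K, sigma2)) / (2 * h)

# Hypothetical test data: K symmetric positive definite.
K = [[2.0, 0.5], [0.5, 1.0]]
y = [1.0, -2.0]
print(f(0.7, y, K, 0.3), df_numeric(0.7, y, K, 0.3))
```

With `K` equal to the identity, $f(\lambda) = \|y\|^2\lambda/(\lambda\sigma^2+1)$ and $f'(\lambda) = \|y\|^2/(\lambda\sigma^2+1)^2$, which gives a simple scalar sanity check of both functions.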

The expression is a scalar, so it equals its own trace (tr denotes the trace), and by the cyclic property of the trace
$\frac{\partial}{\partial \lambda}y^T (\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}y = \frac{\partial}{\partial \lambda}\text{tr}\left(y^T (\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}y\right) = \frac{\partial}{\partial \lambda}\text{tr}\left(yy^T(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1} \right) = \text{tr}\left(yy^T\frac{\partial}{\partial \lambda}(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}\right)$

For any invertible matrix $M(\alpha)$ whose entries are differentiable in $\alpha \in \mathbb{R}$, it holds that $\frac{d}{d\alpha} M(\alpha)^{-1} = -M(\alpha)^{-1}\left(\frac{d}{d\alpha} M(\alpha)\right) M(\alpha)^{-1}$. Here $\frac{\partial}{\partial \lambda}(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1}) = -\lambda^{-2}K_{\theta}^{-1}$, so we have $\text{tr}\left(yy^T\frac{\partial}{\partial \lambda}(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}\right) = \text{tr}\left[yy^T (\sigma^2 I + \lambda^{-1} K_{\theta}^{-1})^{-1}(\lambda^{-2} K_{\theta}^{-1}) (\sigma^2 I + \lambda^{-1} K_{\theta}^{-1})^{-1}\right]$

I can simplify this to $\text{tr}\left[yy^T(\lambda\sigma^2 K_{\theta} + I)^{-1}(\lambda\sigma^2 I + K_\theta^{-1})^{-1}\right] = \text{tr}\left[yy^T(\lambda^2\sigma^4 K_{\theta} + 2\lambda\sigma^2 I + K_{\theta}^{-1})^{-1}\right]$, but this is where I'm stuck: I can't analyse this expression analytically (or can I?). Is there any way to simplify it? I tried to apply the Woodbury matrix identity to the latter matrix, but without success so far. Any help would be greatly appreciated.
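Two facts used in this kind of derivation can be sanity-checked numerically. The plain-Python sketch below uses a hypothetical matrix `B` and a hypothetical matrix function `M(alpha)`. It checks that a scalar $y^TBy$ equals $\operatorname{tr}(yy^TB)$ (with the outer product $yy^T$, not the scalar $y^Ty$), and that $\frac{d}{d\alpha}M(\alpha)^{-1} = -M^{-1}\frac{dM}{d\alpha}M^{-1}$, including the minus sign, by comparison with a central difference.

```python
# Sanity checks, on hypothetical 2x2 data, of two facts used in the derivation.

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(A):
    return A[0][0] + A[1][1]

# (1) Cyclic property: y^T B y = tr(y y^T B) uses the outer product y y^T.
y = [1.0, -2.0]
B = [[3.0, 1.0], [0.5, 2.0]]
quad_form = sum(y[i] * B[i][j] * y[j] for i in range(2) for j in range(2))
outer = [[y[i] * y[j] for j in range(2)] for i in range(2)]
via_trace = trace(matmul(outer, B))
scalar_times_trace = sum(v * v for v in y) * trace(B)  # (y^T y) tr(B): a different quantity

# (2) Derivative of an inverse: d/dalpha M^{-1} = -M^{-1} M'(alpha) M^{-1}.
def M(alpha):
    return [[2.0 + alpha, alpha ** 2], [alpha, 3.0]]

def dM(alpha):
    """Entrywise derivative of M(alpha)."""
    return [[1.0, 2.0 * alpha], [1.0, 0.0]]

alpha, h = 0.4, 1e-6
Minv = inv2(M(alpha))
Mplus, Mminus = inv2(M(alpha + h)), inv2(M(alpha - h))
numeric = [[(Mplus[i][j] - Mminus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]
formula = [[-v for v in row] for row in matmul(matmul(Minv, dM(alpha)), Minv)]
max_err = max(abs(numeric[i][j] - formula[i][j]) for i in range(2) for j in range(2))

print(quad_form, via_trace, scalar_times_trace, max_err)
```

Here `quad_form` and `via_trace` both equal 8, while `scalar_times_trace` is 25, so replacing $yy^T$ by $y^Ty$ changes the result; dropping the minus sign in `formula` would make `max_err` of the same order as the derivative itself.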

2 Answers


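Write $K$ for $K_\theta$ and $\delta$ for $\sigma$. Since $K$ is symmetric, it is orthogonally diagonalizable: $K=ADA^T$ with $A$ orthogonal ($AA^T = A^TA = I$) and $D=\operatorname{diag}(d_1, \ldots, d_n)$, where $d_i > 0$ because $K$ is positive definite.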

So $(\delta^2 I + \lambda^{-1} K^{-1})^{-1} = A (\delta^2 I + \lambda^{-1} D^{-1})^{-1} A^T = A\,\operatorname{diag}\left( \ldots , \frac{\lambda d_i}{ \lambda d_i\delta^2 +1 } , \ldots \right) A^T$
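This diagonal formula can be checked numerically. The sketch below assumes $n = 2$, where a symmetric matrix can be diagonalized in closed form by a rotation; `K`, `lam` ($\lambda$) and `delta2` ($\delta^2$) are hypothetical test values. It compares $(\delta^2 I + \lambda^{-1}K^{-1})^{-1}$ with $A\operatorname{diag}\bigl(\lambda d_i/(\lambda d_i\delta^2+1)\bigr)A^T$.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def diag(v):
    return [[v[0], 0.0], [0.0, v[1]]]

def eig_sym2(K):
    """Rotation A and eigenvalues d with K = A diag(d) A^T, for symmetric 2x2 K."""
    a, b, c = K[0][0], K[0][1], K[1][1]
    theta = 0.5 * math.atan2(2 * b, a - c)  # angle that zeroes the off-diagonal entry
    cs, sn = math.cos(theta), math.sin(theta)
    A = [[cs, -sn], [sn, cs]]
    d = (a * cs * cs + 2 * b * sn * cs + c * sn * sn,
         a * sn * sn - 2 * b * sn * cs + c * cs * cs)
    return A, d

def max_abs_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))

# Hypothetical test data (K symmetric positive definite).
K = [[2.0, 0.5], [0.5, 1.0]]
lam, delta2 = 0.7, 0.3

A, d = eig_sym2(K)
recon_err = max_abs_diff(matmul(matmul(A, diag(d)), transpose(A)), K)

Kinv = inv2(K)
lhs = inv2([[delta2 * (1.0 if i == j else 0.0) + Kinv[i][j] / lam for j in range(2)]
            for i in range(2)])
rhs = matmul(matmul(A, diag([lam * di / (lam * di * delta2 + 1) for di in d])), transpose(A))
print(recon_err, max_abs_diff(lhs, rhs))
```

For general $n$ one would use a symmetric eigensolver instead of the $2\times 2$ rotation; the rotation is used here only to keep the sketch dependency-free.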

So $\frac{\partial }{\partial \lambda} \frac{\lambda d_i}{ \lambda d_i\delta^2 +1 } =\frac{ d_i}{ (\lambda d_i\delta^2 +1)^2 } $
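For completeness, the quotient rule gives this derivative in one step:

$$\frac{\partial }{\partial \lambda} \frac{\lambda d_i}{ \lambda d_i\delta^2 +1 } = \frac{d_i(\lambda d_i\delta^2+1) - \lambda d_i\cdot d_i\delta^2}{(\lambda d_i\delta^2+1)^2} = \frac{d_i}{(\lambda d_i\delta^2+1)^2}.$$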


Since $A$ does not depend on $\lambda$, $\frac{\partial }{\partial \lambda} y^T (\delta^2 I + \lambda^{-1} K^{-1})^{-1} y = y^T A\,\operatorname{diag}\left(\ldots, \frac{ d_i}{ (\lambda d_i\delta^2 +1)^2 } , \ldots \right) A^T y$

Let $T = A\,\operatorname{diag}\left(\ldots, \frac{ d_i}{ (\lambda d_i\delta^2 +1)^2 } , \ldots \right) A^T$. Then $T^{-1}=A\,\operatorname{diag}\left(\ldots, \frac{ \lambda^2 d_i^2 \delta^4 + 2\lambda d_i\delta^2 + 1}{ d_i }, \ldots \right) A^T = 2\lambda\delta^2 I + K^{-1} + \lambda^2 \delta^4 K$
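The last equality can be seen entry by entry. Each diagonal entry of $T^{-1}$ expands as

$$\frac{(\lambda d_i\delta^2+1)^2}{d_i} = \frac{\lambda^2 d_i^2 \delta^4 + 2\lambda d_i\delta^2 + 1}{d_i} = \lambda^2\delta^4\, d_i + 2\lambda\delta^2 + d_i^{-1},$$

and applying $A\operatorname{diag}(d_i)A^T = K$, $AA^T = I$ and $A\operatorname{diag}(d_i^{-1})A^T = K^{-1}$ to the three terms separately gives $\lambda^2\delta^4K + 2\lambda\delta^2 I + K^{-1}$.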

So $\frac{\partial }{\partial \lambda} y^T (\delta^2 I + \lambda^{-1} K^{-1})^{-1} y = y^T( 2\lambda\delta^2 I + K^{-1} + \lambda^2 \delta^4 K)^{-1} y $
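A quick numerical check of this closed form, as a sketch: plain Python restricted to $2\times 2$ matrices, with hypothetical values for `K` ($K$), `y`, `lam` ($\lambda$) and `delta2` ($\delta^2$, so $\delta^4$ is `delta2 ** 2`). It compares $y^T(2\lambda\delta^2 I + K^{-1} + \lambda^2\delta^4K)^{-1}y$ with a central finite difference of the original expression.

```python
def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad(y, A):
    """Quadratic form y^T A y for a 2-vector y."""
    return sum(y[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

def eye(i, j):
    return 1.0 if i == j else 0.0

def f(lam, y, K, delta2):
    """y^T (delta2*I + lam^{-1} K^{-1})^{-1} y."""
    Kinv = inv2(K)
    return quad(y, inv2([[delta2 * eye(i, j) + Kinv[i][j] / lam for j in range(2)]
                         for i in range(2)]))

def df_closed_form(lam, y, K, delta2):
    """y^T (2 lam delta^2 I + K^{-1} + lam^2 delta^4 K)^{-1} y."""
    Kinv = inv2(K)
    S = [[2 * lam * delta2 * eye(i, j) + Kinv[i][j] + lam ** 2 * delta2 ** 2 * K[i][j]
          for j in range(2)] for i in range(2)]
    return quad(y, inv2(S))

# Hypothetical test data (K symmetric positive definite).
K = [[2.0, 0.5], [0.5, 1.0]]
y = [1.0, -2.0]
lam, delta2, h = 0.7, 0.3, 1e-6

numeric = (f(lam + h, y, K, delta2) - f(lam - h, y, K, delta2)) / (2 * h)
closed = df_closed_form(lam, y, K, delta2)
print(numeric, closed)
```

The two printed values should agree to roughly six digits; the residual difference comes from the finite-difference step `h`.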

    Thank you very much. I can't believe I didn't see that. (2012-11-15)

For notational convenience, define $\eqalign{ M &= \sigma^2 I + \lambda^{-1}K_{\theta}^{-1} \cr dM &= -\lambda^{-2}K_{\theta}^{-1}d\lambda \cr }$
Then write the function in terms of $M$ and the Frobenius product $A:B = \operatorname{tr}(A^TB)$, and take its differential:

$\eqalign{ f &= y:M^{-1}y \cr &= yy^T:M^{-1} \cr\cr df &= yy^T:d(M^{-1}) \cr &= yy^T:M^{-1}\,(-dM)\,M^{-1} \cr &= yy^T:M^{-1}\,(\lambda^{-2}K_{\theta}^{-1}d\lambda)\,M^{-1} \cr &= yy^T:(\lambda^{2}MK_{\theta}M)^{-1} d\lambda \cr }$

Now we can identify the derivative and expand it:

$\eqalign{ \frac{\partial f}{\partial\lambda} &= yy^T:(\lambda^{2}MK_{\theta}M)^{-1} \cr &= y^T(\lambda^{2}MK_{\theta}M)^{-1}y \cr &= y^T\big(\lambda^{2}\sigma^{4}K_{\theta} + K_{\theta}^{-1} + 2\lambda\sigma^2I\big)^{-1}y \cr }$
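Because every step here is a rational matrix identity, it can be checked exactly rather than with floating point. The sketch below uses Python's `fractions.Fraction` on hypothetical $2\times 2$ data (`K` for $K_\theta$, `lam` for $\lambda$, `s2` for $\sigma^2$). It confirms that $\lambda^2MK_\theta M = \lambda^2\sigma^4K_\theta + K_\theta^{-1} + 2\lambda\sigma^2 I$, that its inverse equals $\lambda^{-2}M^{-1}K_\theta^{-1}M^{-1}$, and that the Frobenius product $yy^T:G$ equals $y^TGy$.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula (exact for Fractions)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def scale(s, A):
    return [[s * v for v in row] for row in A]

def frob(A, B):
    """Frobenius product A:B = tr(A^T B) = sum of entrywise products."""
    return sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))

# Hypothetical exact data: K symmetric positive definite, lam = lambda, s2 = sigma^2.
K = [[Fr(2), Fr(1, 2)], [Fr(1, 2), Fr(1)]]
y = [Fr(1), Fr(-2)]
lam, s2 = Fr(7, 10), Fr(3, 10)
I = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]

Kinv = inv2(K)
M = [[s2 * I[i][j] + Kinv[i][j] / lam for j in range(2)] for i in range(2)]
Minv = inv2(M)

lam2MKM = scale(lam ** 2, matmul(matmul(M, K), M))
expanded = [[lam ** 2 * s2 ** 2 * K[i][j] + Kinv[i][j] + 2 * lam * s2 * I[i][j]
             for j in range(2)] for i in range(2)]
sandwich = scale(1 / lam ** 2, matmul(matmul(Minv, Kinv), Minv))  # lam^{-2} M^{-1} K^{-1} M^{-1}

G = inv2(lam2MKM)
yyT = [[y[i] * y[j] for j in range(2)] for i in range(2)]
quad_G = sum(y[i] * G[i][j] * y[j] for i in range(2) for j in range(2))
print(lam2MKM == expanded, G == sandwich, frob(yyT, G) == quad_G)
```

Exact arithmetic makes the comparisons plain equalities, so no tolerance has to be chosen.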