
I'm trying to find $$\frac{\partial}{\partial \lambda}y^T \left(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1}\right)^{-1}y$$ where $y \in \mathbb{R}^n$ is fixed, $\lambda \in \mathbb{R}$ and $K_{\theta}^{-1}$ is a known symmetric, positive definite matrix. Here's what I did so far:

$$\frac{\partial}{\partial \lambda}y^T (\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}y = \frac{\partial}{\partial \lambda}\text{tr}\left(y^T (\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}y\right)$$ where tr denotes the trace. By the cyclic property of the trace, we can write $$\frac{\partial}{\partial \lambda}\text{tr}\left(y^T (\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1} y\right) = \frac{\partial}{\partial \lambda}\text{tr}\left(y y^T (\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1} \right)$$ $$ = \text{tr}\left(y y^T \frac{\partial}{\partial \lambda}(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}\right)$$

Since for any invertible matrix $M(\alpha)$ whose entries are differentiable in $\alpha \in \mathbb{R}$ it holds that $$\frac{d}{d\alpha} M(\alpha)^{-1} = -M(\alpha)^{-1}\left(\frac{d}{d\alpha} M(\alpha)\right) M(\alpha)^{-1}$$ we have $$\text{tr}\left(y y^T \frac{\partial}{\partial \lambda}(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}\right) = -\text{tr}\left[ y y^T (\sigma^2 I + \lambda^{-1} K_{\theta}^{-1})^{-1}(-\lambda^{-2} K_{\theta}^{-1}) (\sigma^2 I + \lambda^{-1} K_{\theta}^{-1})^{-1}\right]$$

I can simplify this to $$\text{tr}\left[y y^T (\lambda\sigma^2 K_{\theta} + I)^{-1}(\lambda\sigma^2 I + K_\theta^{-1})^{-1}\right]$$ $$ = y^T (\lambda^2\sigma^4 K_{\theta} + 2\lambda\sigma^2 I + K_{\theta}^{-1})^{-1} y$$ but this is where I'm stuck as I can't analyse this expression analytically (or can I?). Is there any way to simplify this expression? I tried to use the Woodbury matrix identity on the latter matrix, but with no success so far. Any help would be greatly appreciated.

2 Answers

1

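Since $K$ is symmetric positive definite, it is orthogonally diagonalizable: $K = ADA^T$, where $A$ is orthogonal and $D = \text{diag}(d_1, \ldots, d_n)$ with every $d_i > 0$. (Here $K$ stands for $K_\theta$ and $\delta$ for the question's $\sigma$.)

So $(\delta^2 I + \lambda^{-1} K^{-1})^{-1} = A (\delta^2 I + \lambda^{-1} D^{-1})^{-1} A^T = A\,\text{diag}\left(\ldots, \frac{\lambda d_i}{\lambda d_i\delta^2 + 1}, \ldots\right) A^T$.

Differentiating each diagonal entry gives $\frac{\partial}{\partial \lambda} \frac{\lambda d_i}{\lambda d_i\delta^2 + 1} = \frac{d_i}{(\lambda d_i\delta^2 + 1)^2}$.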

$\frac{\partial }{\partial \lambda} y^T (\delta^2 I + \lambda^{-1} K^{-1})^{-1} y = y^T A (\lambda^2\delta^2 D + D^{-1} +2\lambda \delta^2 I) A^T y = y^T( \lambda^2\delta^2 K + K^{-1} +2\lambda \delta^2 I) y$

Correction: the previous display is wrong (the inverse is missing, and $\lambda^2\delta^2$ should be $\lambda^2\delta^4$). The corrected computation is:

So $\frac{\partial}{\partial \lambda} y^T (\delta^2 I + \lambda^{-1} K^{-1})^{-1} y = y^T A\,\text{diag}\left(\ldots, \frac{d_i}{(\lambda d_i\delta^2 + 1)^2}, \ldots\right) A^T y$.

Let $T = A\,\text{diag}\left(\ldots, \frac{d_i}{(\lambda d_i\delta^2 + 1)^2}, \ldots\right) A^T$. Then $T^{-1} = A\,\text{diag}\left(\ldots, \frac{2\lambda d_i\delta^2 + 1 + \lambda^2 d_i^2 \delta^4}{d_i}, \ldots\right) A^T = 2\lambda\delta^2 I + K^{-1} + \lambda^2 \delta^4 K$.

So $\frac{\partial}{\partial \lambda} y^T (\delta^2 I + \lambda^{-1} K^{-1})^{-1} y = y^T T y = y^T\left(2\lambda\delta^2 I + K^{-1} + \lambda^2 \delta^4 K\right)^{-1} y$.
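
As a quick sanity check, the eigenvalue form and the matrix form of the derivative can be compared numerically. The following is only a minimal NumPy sketch: the function names, the random test matrix and the values of `lam` and `delta2` (standing for $\delta^2$) are made up for illustration and are not part of the answer.

```python
import numpy as np

def derivative_via_eig(lam, y, K, delta2):
    """Eigenvalue form: sum_i d_i * (a_i^T y)^2 / (lam * d_i * delta2 + 1)^2."""
    d, A = np.linalg.eigh(K)   # K = A @ diag(d) @ A.T with A orthogonal
    z = A.T @ y                # coordinates of y in the eigenbasis of K
    return float(np.sum(d * z**2 / (lam * d * delta2 + 1.0) ** 2))

def derivative_via_inverse(lam, y, K, delta2):
    """Matrix form: y^T (2 lam delta2 I + K^{-1} + lam^2 delta2^2 K)^{-1} y."""
    n = y.shape[0]
    B = 2.0 * lam * delta2 * np.eye(n) + np.linalg.inv(K) + lam**2 * delta2**2 * K
    return float(y @ np.linalg.solve(B, y))

# Made-up test data (assumption: any symmetric positive definite K will do).
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))
K = G @ G.T + 4.0 * np.eye(4)
y = rng.standard_normal(4)
lam, delta2 = 1.3, 0.7

via_eig = derivative_via_eig(lam, y, K, delta2)
via_inverse = derivative_via_inverse(lam, y, K, delta2)
print(via_eig, via_inverse)   # the two values should agree up to rounding
```

The eigenvalue form avoids forming $K^{-1}$ explicitly, which may be preferable when the derivative is needed for many values of $\lambda$ with the same $K$.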

  • Thank you very much for your answer. I follow every computation until the second to last equation. I have $y^T A\,\text{diag}\left(\ldots,\frac{d_i}{(1+\lambda d_i \sigma^2)^2},\ldots\right) A^Ty = y^T A D(I + 2\lambda\sigma^2 D + \lambda^2 \sigma^4 D^2)^{-1} A^Ty = y^T A (D^{-1} + 2\lambda\sigma^2 I + \lambda^2\sigma^4 D)^{-1} A^Ty$. I fail to see where the inverse went off to? Thanks in any case. 2012-11-14
  • $y^T AD( 2\lambda\delta^2 D +I + \lambda^2 \delta^4 D^2)^{-1}A^T y = y^T [A D^{-1}( 2\lambda\delta^2 D + I + \lambda^2 \delta^4 D^2) A^T]^{-1} y =y^T [A( 2\lambda\delta^2 I +D^{-1} + \lambda^2 \delta^4 D) A^T ]^{-1} y = y^T [ 2\lambda\delta^2 I + K^{-1} + \lambda^2 \delta^4 K ]^{-1}y $ 2012-11-15
  • Thank you very much. I can't believe I didn't see that. 2012-11-15
2

For notational convenience, define $$\eqalign{ M &= \sigma^2 I + \lambda^{-1}K_{\theta}^{-1} \cr dM &= -\lambda^{-2}K_{\theta}^{-1}d\lambda \cr }$$
Then write the function in terms of $M$ and the Frobenius product, $A:B = \text{tr}(A^TB)$, and take its differential, using $d(M^{-1}) = -M^{-1}\,dM\,M^{-1}$: $$\eqalign{ f &= y:M^{-1}y \cr &= yy^T:M^{-1} \cr\cr df &= yy^T:d(M^{-1}) \cr &= yy^T:M^{-1}\,(-dM)\,M^{-1} \cr &= yy^T:M^{-1}\,(\lambda^{-2}K_{\theta}^{-1}d\lambda)\,M^{-1} \cr &= yy^T:(\lambda^{2}MK_{\theta}M)^{-1} d\lambda \cr\cr }$$ Now we can identify the derivative and expand it: $$\eqalign{ \frac{\partial f}{\partial\lambda} &= yy^T:(\lambda^{2}MK_{\theta}M)^{-1} \cr &= y^T(\lambda^{2}MK_{\theta}M)^{-1}y \cr &= y^T\big(\lambda^{2}\sigma^{4}K_{\theta} + K_{\theta}^{-1} + 2\lambda\sigma^2I\big)^{-1}y \cr }$$
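
The closed form can also be checked against a central finite difference of $f$ itself. This is a rough NumPy sketch under stated assumptions: the helper names, the random positive definite `K`, and the values of `sigma2` (for $\sigma^2$), `lam` and the step `h` are all illustrative choices, not part of the answer.

```python
import numpy as np

def f(lam, y, K_inv, sigma2):
    """f(lam) = y^T (sigma2 * I + K^{-1} / lam)^{-1} y."""
    n = y.shape[0]
    M = sigma2 * np.eye(n) + K_inv / lam
    return float(y @ np.linalg.solve(M, y))

def df_closed_form(lam, y, K, K_inv, sigma2):
    """y^T (lam^2 sigma2^2 K + K^{-1} + 2 lam sigma2 I)^{-1} y."""
    n = y.shape[0]
    B = lam**2 * sigma2**2 * K + K_inv + 2.0 * lam * sigma2 * np.eye(n)
    return float(y @ np.linalg.solve(B, y))

# Made-up test data (assumption: K is symmetric positive definite).
rng = np.random.default_rng(0)
n = 5
G = rng.standard_normal((n, n))
K = G @ G.T + n * np.eye(n)
K_inv = np.linalg.inv(K)
y = rng.standard_normal(n)
sigma2, lam, h = 0.7, 1.3, 1e-5

# Central difference approximation of df/dlam at lam.
finite_diff = (f(lam + h, y, K_inv, sigma2) - f(lam - h, y, K_inv, sigma2)) / (2.0 * h)
closed_form = df_closed_form(lam, y, K, K_inv, sigma2)
print(finite_diff, closed_form)   # should agree to several significant digits
```

Since the matrix inside the inverse is positive definite for $\lambda > 0$, the derivative is positive there, so $f$ is increasing in $\lambda$ on that range.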