
I'm trying to minimize the (negative) multivariate normal log likelihood (dropping constants):

$ \log |\boldsymbol\Sigma|\,+(\mathbf{x}-\boldsymbol\mu)^{\rm T}\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)$

where

$ \Sigma_{ij} = \sigma_1 \cdot \mathcal{I}\{i = j \} + \sigma_2 \cdot \mathcal{I}\{ L_i = L_j \} + \sigma_3 \cdot f(||L_i - L_j||)$ as a function of $\{ \sigma_1, \sigma_2, \sigma_3, {\boldsymbol \mu} \}$.

The $L_i$ are known values; in the context of the problem, $L_i$ is the "location" of unit $i$. The function $f(\cdot)$ is a known, monotonically decreasing function, $\mathcal{I}\{\cdot\}$ is an indicator function, and $|| \cdot ||$ is a distance measure (say, Euclidean distance). Each of the $\sigma_i$ parameters is positive, so that $\Sigma$ is certainly positive definite.
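To make the definition concrete, here is a minimal NumPy sketch of how $\Sigma$ could be assembled from the locations. The name `build_sigma` and the choice $f(d) = e^{-d}$ are assumptions for illustration only; the actual $f$ in the problem is whatever known decreasing function applies.

```python
import numpy as np

def build_sigma(L, s1, s2, s3, f=lambda d: np.exp(-d)):
    """Dense covariance from unit locations L, shape (p, d).

    f defaults to exp(-d), which is only a placeholder (an assumption);
    substitute the known decreasing function from the problem.
    """
    L = np.asarray(L, dtype=float)
    diff = L[:, None, :] - L[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean ||L_i - L_j||
    same_loc = (dist == 0.0)                  # indicator I{L_i = L_j}
    # sigma_1 on the diagonal, sigma_2 within a location, sigma_3 * f(distance) everywhere
    return s1 * np.eye(len(L)) + s2 * same_loc + s3 * f(dist)
```

Note that units sharing a location get the same off-diagonal entry $\sigma_2 + \sigma_3 f(0)$, which is what produces the blocked structure.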

In this problem the dimension of $\Sigma$ is very large, say $1000 \times 1000$, so naively evaluating the log-likelihood is relatively computationally expensive ($\approx$ 2 seconds per evaluation). I'm thinking that, since $\Sigma$ has this blocked structure, there may be some way to exploit this fact to significantly speed up computation, but I'm having some trouble figuring out how/if this will work.
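For reference, a dense baseline that ignores the structure can be sketched with a single Cholesky factorization, which yields both the log-determinant and the quadratic form without forming $\Sigma^{-1}$. The function name `neg_loglik_dense` is hypothetical, and this is not necessarily how the 2-second evaluation above was implemented.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def neg_loglik_dense(Sigma, x, mu):
    """log|Sigma| + (x - mu)^T Sigma^{-1} (x - mu) via one Cholesky factorization."""
    r = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    c, lower = cho_factor(Sigma, lower=True)     # Sigma = L L^T
    logdet = 2.0 * np.sum(np.log(np.diag(c)))    # log|Sigma| = 2 * sum(log L_ii)
    quad = r @ cho_solve((c, lower), r)          # r^T Sigma^{-1} r, no explicit inverse
    return logdet + quad
```

The cost is still cubic in the dimension of $\Sigma$, so this only removes constant-factor overhead.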

Any tips are appreciated.

Update: I can see that $\Sigma$ can be written as

$ \Sigma = \sigma_1 {\bf I}+ {\bf C} \otimes {\bf 1} $

where ${\bf 1}$ is a matrix of $1$s, ${\bf I}$ is the identity, $\otimes$ denotes the Kronecker product, and ${\bf C}$ is an $N \times N$ matrix ($N$ being the number of "locations") with entries

$ {\bf C}_{nm} = \sigma_2 \cdot \mathcal{I} \{ n = m \} + \sigma_3 \cdot f( \delta_{nm} )$

where $\delta_{nm}$ denotes the distance between locations $n$ and $m$. I'm still not sure this helps me much. I may be back with more updates.
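One way this form might help, sketched under the assumption that every location has the same number $m$ of units and that $\mathbf{x}$ and $\boldsymbol\mu$ are ordered location by location (which is what the Kronecker form with an $m \times m$ matrix of $1$s implies): splitting each residual into its location mean and the deviation from it gives

$ \log|\Sigma| = N(m-1)\log\sigma_1 + \log|\sigma_1 {\bf I}_N + m{\bf C}| $

$ ({\bf x}-\boldsymbol\mu)^{\rm T}\Sigma^{-1}({\bf x}-\boldsymbol\mu) = \frac{1}{\sigma_1}\sum_{n,k}(r_{nk}-\bar r_n)^2 + \frac{1}{m}\,{\bf s}^{\rm T}(\sigma_1 {\bf I}_N + m{\bf C})^{-1}{\bf s} $

where $r_{nk}$ is the residual of unit $k$ at location $n$, $\bar r_n$ is the mean residual at location $n$, and $s_n = \sum_k r_{nk}$. Only an $N \times N$ matrix has to be factorized. The function name `neg_loglik_structured` and the symbol $m$ are introduced here for illustration.

```python
import numpy as np

def neg_loglik_structured(C, s1, m, x, mu):
    """Objective for Sigma = s1 * I + kron(C, ones((m, m))).

    Assumes a balanced layout: N locations with m units each, and x, mu
    ordered location by location. Unequal group sizes are not handled.
    """
    N = C.shape[0]
    R = (np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)).reshape(N, m)
    s = R.sum(axis=1)                            # per-location residual sums
    within = ((R - s[:, None] / m) ** 2).sum()   # deviations from location means

    A = s1 * np.eye(N) + m * C                   # small N x N matrix
    _, logdet_A = np.linalg.slogdet(A)
    logdet = N * (m - 1) * np.log(s1) + logdet_A
    quad = within / s1 + s @ np.linalg.solve(A, s) / m
    return logdet + quad
```

With $N$ locations of $m$ units each, the work drops from roughly $O((Nm)^3)$ to $O(N^3 + Nm)$ per evaluation. It is worth checking the result against the dense evaluation on a small case before relying on it.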

  • I wouldn't think so, because of all the between-feature correlation. Moreover, for such a large matrix, the chances of it being non positive-definite also increase with $p$, and such matrix pathologies can throw your entire system off, so blocking won't help. I would firstly convert the covariance to correlation to get rid of scale, then hunt down pathologies (zero eigenvalues, etc.). The determinant is also in the pdf, so if there is high multicollinearity, it will induce more zero (or near-zero) eigenvalues; again, more pathologies in your matrix. (2017-01-13)
