I'm trying to minimize the negative multivariate normal log-likelihood (up to additive constants and a factor of $1/2$):
$ \log |\boldsymbol\Sigma|\,+(\mathbf{x}-\boldsymbol\mu)^{\rm T}\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)$
where
$ \Sigma_{ij} = \sigma_1 \cdot \mathcal{I}\{i = j \} + \sigma_2 \cdot \mathcal{I}\{ L_i = L_j \} + \sigma_3 \cdot f(||L_i - L_j||)$, and the minimization is over $\{ \sigma_1, \sigma_2, \sigma_3, {\boldsymbol \mu} \}$.
The $L_i$ are known values: in the context of the problem, $L_i$ is the "location" of unit $i$. The function $f(\cdot)$ is a known monotonically decreasing function, $\mathcal{I}\{\cdot\}$ is an indicator function, and $|| \cdot ||$ is a distance measure (say, Euclidean distance). Each of the $\sigma_i$ parameters is positive, so $\Sigma$ is positive definite (provided $f$ is a valid covariance function, i.e. the matrix with entries $f(||L_i - L_j||)$ is positive semidefinite).
In this problem the dimension of $\Sigma$ is very large, say $1000 \times 1000$, so naively evaluating the log-likelihood is relatively computationally expensive ($\approx$ 2 seconds per evaluation). Since $\Sigma$ has this blocked structure, I suspect there is some way to exploit it to significantly speed up the computation, but I'm having trouble figuring out how, or whether, this will work.
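For reference, here is a minimal sketch of what "naively evaluating" could look like, assuming NumPy. The function names `build_sigma` and `neg_loglik_naive` are made up for illustration, and the choice $f(d) = e^{-d}$ in the usage note is only a placeholder for the known decreasing function. It builds the dense $\Sigma$ from the definition above and evaluates the objective with a single Cholesky factorization.

```python
import numpy as np

def build_sigma(loc, sigma1, sigma2, sigma3, f):
    """Dense Sigma from the entrywise definition; loc has shape (n_units, d)."""
    # Pairwise Euclidean distances ||L_i - L_j||
    dist = np.linalg.norm(loc[:, None, :] - loc[None, :, :], axis=-1)
    same_loc = (dist == 0.0)  # indicator {L_i = L_j}
    return sigma1 * np.eye(len(loc)) + sigma2 * same_loc + sigma3 * f(dist)

def neg_loglik_naive(x, mu, Sigma):
    """log|Sigma| + (x - mu)^T Sigma^{-1} (x - mu), costing O(n^3) in the number of units."""
    L = np.linalg.cholesky(Sigma)        # Sigma = L L^T
    z = np.linalg.solve(L, x - mu)       # z = L^{-1} (x - mu)
    return 2.0 * np.sum(np.log(np.diag(L))) + z @ z
```

A call would look like `neg_loglik_naive(x, mu, build_sigma(loc, s1, s2, s3, lambda d: np.exp(-d)))`, where `loc` stacks the location of every unit row by row.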
Any tips are appreciated.
Update: I can see that $\Sigma$ can be written as
$ \Sigma = \sigma_1 {\bf I}+ {\bf C} \otimes {\bf 1} $
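where ${\bf 1}$ is a square matrix of $1$s, ${\bf I}$ is the identity, $\otimes$ denotes the Kronecker product, and ${\bf C}$ is an $N \times N$ matrix, with $N$ the number of distinct "locations". (This form assumes the units are ordered by location and that every location has the same number of units.) ${\bf C}$ has the structure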
$ {\bf C}_{nm} = \sigma_2 \cdot \mathcal{I} \{ n = m \} + \sigma_3 \cdot f( \delta_{nm} )$
where $\delta_{nm}$ denotes the distance between locations $n$ and $m$. I'm still not sure this helps me much. I may be back with more updates.
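One possible way to use this form, sketched under the assumption that each location has exactly $k$ units and the units are ordered by location ($k$ and the function name `neg_loglik_structured` are introduced here for illustration only). Writing ${\bf C} \otimes {\bf 1} = {\bf U}{\bf C}{\bf U}^{\rm T}$ with ${\bf U} = {\bf I}_N \otimes {\bf 1}_k$ (here ${\bf 1}_k$ is a column of $k$ ones), the matrix determinant lemma and the Woodbury identity reduce everything to the $N \times N$ matrix ${\bf A} = \sigma_1 {\bf I} + k\,{\bf C}$:

$ \log|\Sigma| = N(k-1)\log\sigma_1 + \log|{\bf A}|, \qquad {\bf r}^{\rm T}\Sigma^{-1}{\bf r} = \frac{{\bf r}^{\rm T}{\bf r} - {\bf s}^{\rm T}{\bf s}/k}{\sigma_1} + \frac{{\bf s}^{\rm T}{\bf A}^{-1}{\bf s}}{k} $

where ${\bf r} = {\bf x} - \boldsymbol\mu$ and ${\bf s} = {\bf U}^{\rm T}{\bf r}$ holds the per-location sums of the residuals.

```python
import numpy as np

def neg_loglik_structured(x, mu, C, sigma1, k):
    """Objective for Sigma = sigma1 * I + kron(C, ones((k, k))), using only N x N algebra.

    Assumes units are ordered by location, with exactly k units per location.
    """
    N = C.shape[0]
    r = (x - mu).reshape(N, k)          # row n holds the residuals at location n
    s = r.sum(axis=1)                   # per-location residual sums, s = U^T r
    A = sigma1 * np.eye(N) + k * C      # the only matrix that gets factorized
    L = np.linalg.cholesky(A)
    logdet = N * (k - 1) * np.log(sigma1) + 2.0 * np.sum(np.log(np.diag(L)))
    z = np.linalg.solve(L, s)           # z @ z equals s^T A^{-1} s
    quad = (np.sum(r ** 2) - s @ s / k) / sigma1 + (z @ z) / k
    return logdet + quad
```

If these assumptions hold, each evaluation costs roughly $O(N^3)$ instead of $O((Nk)^3)$, and the distance matrix $f(\delta_{nm})$ can be computed once outside the optimizer since only the $\sigma$'s change between evaluations. With unequal numbers of units per location the Kronecker form no longer applies as written.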