
I would like to be shown how the curvature matrix $\mathbf{K}$ on page 7 of the paper "Regularisation Theory Applied to Neurofuzzy Modelling" (Bossley) is computed.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.3295

Any help would be greatly appreciated.

Thanks, Bruce

  • @Willie: You're right, of course; I hadn't properly understood the formula. (2011-05-05)

1 Answer


Okay, a bit of a guess. Writing $\mathbf{K} = \sum_{i = 0}^8\mathbf{k}_i\mathbf{k}_i^T$ we have, solving a system of linear equations, the following set of $\mathbf{k}_i$'s (defined up to $\pm$ sign):

$\begin{cases} \mathbf{k}_0^T &= [ 2, 0, 0, 0, 0, 0, 0, 0, 0 ] \\ \mathbf{k}_1^T &= [ 1, -1, 1, 0, 0, 0, 0, 0, 0 ] \\ \mathbf{k}_2^T &= [ 0, 0, 2, 0, 0, 0, 0, 0, 0 ] \\ \mathbf{k}_3^T &= [ 1, 0, 0, -1, 0, 0, 1, 0, 0 ] \\ \mathbf{k}_4^T &= [ 0, 1, 0, 1, -4, 1, 0, 1, 0 ] \\ \mathbf{k}_5^T &= [ 0, 0, 1, 0, 0, -1, 0, 0, 1 ] \\ \mathbf{k}_6^T &= [ 0, 0, 0, 0, 0, 0, 2, 0, 0 ] \\ \mathbf{k}_7^T &= [ 0, 0, 0, 0, 0, 0, 1, -1, 1 ] \\ \mathbf{k}_8^T &= [ 0, 0, 0, 0, 0, 0, 0, 0, 2 ] \end{cases}$

(Note that in the paper $\mathbf{K}$ should be a symmetric matrix, but there is a typo in the matrix on page 7: the element in the 3rd row, 6th column should be $-1$, like the element in the 6th row, 3rd column; the paper has $0$ in that spot instead.)
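As a sanity check, here is a short NumPy sketch (assuming, as the vectors above suggest, a row-major ordering of the nine lattice nodes) that assembles $\mathbf{K} = \sum_i \mathbf{k}_i \mathbf{k}_i^T$ and confirms both the symmetry and the misprinted entry:

```python
import numpy as np

# The nine k_i vectors listed above (each defined only up to sign,
# which does not matter since K is built from outer products).
ks = np.array([
    [ 2,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 1, -1,  1,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  2,  0,  0,  0,  0,  0,  0],
    [ 1,  0,  0, -1,  0,  0,  1,  0,  0],
    [ 0,  1,  0,  1, -4,  1,  0,  1,  0],
    [ 0,  0,  1,  0,  0, -1,  0,  0,  1],
    [ 0,  0,  0,  0,  0,  0,  2,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  1, -1,  1],
    [ 0,  0,  0,  0,  0,  0,  0,  0,  2],
])

# K = sum_i k_i k_i^T, which is the same as ks^T ks.
K = ks.T @ ks

assert np.array_equal(K, K.T)  # K must be symmetric
print(np.diag(K))              # [ 6  2  6  2 16  2  6  2  6]
print(K[2, 5])                 # -1: the entry misprinted as 0 in the paper
```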

This set is essentially unique if you require each $\mathbf{k}_i$ to be something like a second difference, in the sense that only adjacent nodes on the lattice can influence each other.

Now, the expression for $\mathbf{k}_4^T$ quite clearly arose from the second difference formula on a planar square lattice. And one clearly gets $\mathbf{k}_{\{2,6,8\}}$ from $\mathbf{k}_0$ via a symmetry transformation of the lattice (a rotation), and similarly $\mathbf{k}_{\{3,5,7\}}$ from $\mathbf{k}_1$.

However, if I were writing the second difference formula for a finite grid, I would have used

$ \mathbf{k}_0^T = [-2,1,0,1,0,0,0,0,0] $

and

$ \mathbf{k}_1^T = [1, -3, 1, 0, 1, 0, 0, 0,0] $

which would change the diagonal elements to $[ 6, 12, 6, 12, 20, 12, 6, 12, 6]$ instead of what was given.

So I am guessing that some sort of unspecified boundary condition is imposed. Interestingly, this boundary condition seems to be compatible with the one that generates the $\mathbf{K}$ matrix on page 33 of the same document, this time for the one-dimensional problem. (Normally, for the second difference on the lattice of integers, you write $\Delta^2 f(x) = f(x+1) + f(x-1) - 2 f(x)$. Taking the Cartesian product gives you the version for the planar lattice, $\Delta^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)$. But suppose that on the nonnegative integers you impose the "boundary condition" that the second difference at any interior point $x \geq 1$ is defined as above, while at the boundary $x = 0$ it is $\Delta^2 f(0) = f(0)$. Then the Cartesian product gives, on an "edge" of the planar lattice with boundary, $\Delta^2 f(0,y) = f(0,y+1) + f(0,y-1) - f(0,y)$, and at a "corner", $\Delta^2 f(0,0) = 2f(0,0)$, which agrees with the $\mathbf{k}_i$ as written above. Why this boundary condition is chosen I cannot tell you. The expression "I would've written" corresponds to the Neumann condition, where on the boundary $\Delta^2 f(0) = f(1) - f(0)$.)
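To make the guess concrete, here is a short NumPy sketch (my own reconstruction, not code from the paper) that builds the planar operator as a Kronecker-product Cartesian sum of 1D second differences. With the guessed boundary condition its rows reproduce the $\mathbf{k}_i$ listed above, and swapping in the Neumann variant reproduces the alternative diagonal $[6, 12, 6, 12, 20, 12, 6, 12, 6]$:

```python
import numpy as np

def second_difference_1d(n, boundary):
    """1D second-difference matrix on n nodes, with a choice of boundary rule."""
    D = np.zeros((n, n))
    for x in range(1, n - 1):            # interior: f(x+1) + f(x-1) - 2 f(x)
        D[x, x - 1], D[x, x], D[x, x + 1] = 1, -2, 1
    if boundary == "paper":              # guessed rule: Delta^2 f(0) = f(0)
        D[0, 0] = D[n - 1, n - 1] = 1
    elif boundary == "neumann":          # Delta^2 f(0) = f(1) - f(0)
        D[0, 0], D[0, 1] = -1, 1
        D[n - 1, n - 1], D[n - 1, n - 2] = -1, 1
    return D

n = 3
I = np.eye(n)
for bc in ("paper", "neumann"):
    D = second_difference_1d(n, bc)
    # Cartesian sum: Laplacian on the n x n lattice, nodes ordered row-major.
    # For bc == "paper", the rows of L are exactly the k_i listed earlier.
    L = np.kron(D, I) + np.kron(I, D)
    K = L.T @ L                          # K = sum_i k_i k_i^T over the rows of L
    print(bc, np.diag(K))
    # "paper"   -> [ 6  2  6  2 16  2  6  2  6]
    # "neumann" -> [ 6 12  6 12 20 12  6 12  6]
```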

Unfortunately this is as much as I can help. To say more will require me somehow reading the mind of the author.


Edit: more details, as requested by the OP.

Let me rewrite the mathematical notation, which I hope will clear up the issues a bit. The goal is to compute the "total squared curvature", which is used in some fashion to train the modelling scheme that approximates the data; we don't need to know exactly what that is. The point is that in your scheme you are given a list of basis functions $\chi_0, \chi_1, \ldots, \chi_N$ centered at a bunch of (vector) points $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_N$. The output function $f(\mathbf{x}; w_0, w_1, \ldots, w_N)$ depends on a bunch of weights, and in fact for our purpose we consider the linear case

$ f(\mathbf{x};w_0, w_1, \ldots, w_N) = \sum_{j = 0}^N w_j \chi_j(\mathbf{x}) $

We define the total curvature of the function $f$ associated to the weights $w_0, \ldots, w_N$, to be the sum

$ \text{Total curvature}(w_0, w_1, \ldots, w_N) = \sum_{k = 0}^N \sum_{i,j} \left| \frac{\partial^2}{\partial x_i \partial x_j} f(\mathbf{x}_k; w_0, w_1, \ldots, w_N ) \right|^2 $

The paper asserts that it is reasonable to ignore the off-diagonal terms in the Hessian matrix, and so the quantity you actually want to compute is

$ \text{Total curvature redux}(w_0, w_1, \ldots, w_N) = \sum_{k = 0}^N \left| \triangle f(\mathbf{x}_k; w_0, w_1, \ldots, w_N ) \right|^2 $

where $\triangle$ is the Laplacian. The important thing is that the total curvature is computed from the output function $f$, yet is a function of the weights.

Now, going back to the representation of the function $f$ as a linear combination of the functions $\chi_j$ with coefficients $w_j$, it becomes clear that the derivatives of $f$ should also depend on $w_j$ linearly: at a fixed point $\mathbf{x}$, you have

$ \triangle f(\mathbf{x}; w_0, w_1, \ldots, w_N) = \sum_{j = 0}^N w_j [\triangle \chi_j(\mathbf{x})] $

(since $\mathbf{x}$ is fixed, $\triangle \chi(\mathbf{x})$ is just some number). Or, in matrix notation, we can write

$ \triangle f(\mathbf{x}; w_0, w_1, \ldots, w_N) = \begin{bmatrix} \triangle \chi_0 (\mathbf{x}) & \cdots & \triangle \chi_N(\mathbf{x}) \end{bmatrix} \begin{bmatrix} w_0 \\ \vdots \\ w_N \end{bmatrix} $

And so

$ |\triangle f(\mathbf{x}; \ldots)|^2 = \left(\begin{bmatrix} w_0 & \cdots & w_N \end{bmatrix}\begin{bmatrix} \triangle \chi_0 (\mathbf{x}) \\ \vdots \\ \triangle \chi_N(\mathbf{x}) \end{bmatrix}\right)\cdot \left( \begin{bmatrix} \triangle \chi_0 (\mathbf{x}) & \cdots & \triangle \chi_N(\mathbf{x}) \end{bmatrix} \begin{bmatrix} w_0 \\ \vdots \\ w_N \end{bmatrix}\right) $

Using the associativity of matrix multiplication, we can multiply the middle two factors together first. To clean up the notation, we write

$ \mathbf{k}(\mathbf{x}) = \begin{bmatrix} \triangle \chi_0 (\mathbf{x}) \\ \vdots \\ \triangle \chi_N(\mathbf{x}) \end{bmatrix} \qquad \qquad \mathbf{w} = \begin{bmatrix} w_0 \\ \vdots \\ w_N \end{bmatrix} $

we have that

$ |\triangle f(\mathbf{x}; \ldots)|^2 = \mathbf{w}^T (\mathbf{k}(\mathbf{x}) \mathbf{k}(\mathbf{x})^T) \mathbf{w} $
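Here is a minimal numerical check of this identity, with arbitrary stand-in vectors (purely illustrative, not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
k = rng.standard_normal(9)    # stand-in for k(x): Laplacians of the basis functions at one point
w = rng.standard_normal(9)    # stand-in for the weight vector

lhs = (k @ w) ** 2            # |Laplacian of f at x|^2 = (k(x)^T w)^2
rhs = w @ np.outer(k, k) @ w  # w^T (k(x) k(x)^T) w
assert np.isclose(lhs, rhs)
```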

Now plug this into the formula for the "total curvature redux": summing the outer products over the evaluation points gives $\text{Total curvature redux} = \mathbf{w}^T \big( \sum_i \mathbf{k}(\mathbf{x}_i) \mathbf{k}(\mathbf{x}_i)^T \big) \mathbf{w} = \mathbf{w}^T \mathbf{K} \mathbf{w}$. You see that what is written as $\mathbf{k}_i$ is nothing more than the vector $\mathbf{k}(\mathbf{x})$ evaluated at $\mathbf{x} = \mathbf{x}_i$.
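Summing over the points is then a one-liner; the following sketch (again with arbitrary stand-in vectors) checks that $\mathbf{w}^T \mathbf{K} \mathbf{w}$ agrees with the sum of squared pointwise Laplacians:

```python
import numpy as np

rng = np.random.default_rng(1)
ks = rng.standard_normal((9, 9))  # rows: stand-ins for k_0^T, ..., k_8^T
w = rng.standard_normal(9)        # stand-in weights

K = ks.T @ ks                     # K = sum_i k_i k_i^T
assert np.isclose(w @ K @ w, np.sum((ks @ w) ** 2))
```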

In words, the $j$th entry of the vector $\mathbf{k}_i$ is "the 'curvature' (second derivative) of the basis function $\chi_j$ evaluated at the point $\mathbf{x}_i$", so that when multiplied by the weight $w_j$, it indicates "the contribution, made by the basis function $\chi_j$ at weight $w_j$, to the curvature of the output function $f$ at the point $\mathbf{x}_i$."

In the case where you are working over a discrete lattice and the set $\mathbf{x}_0, \ldots, \mathbf{x}_N$ exhausts all points in the domain, one can show that $\mathbf{k}_i$ actually is a "matrix representation" of the second difference operator at the point $\mathbf{x}_i$. In the more general case, where $\mathbf{x}_0, \ldots, \mathbf{x}_N$ do not exhaust all the points in the domain (as would be the case when working over a continuum), you would have to compute the $\mathbf{k}_i$ vectors using the method outlined above.
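For concreteness, here is a sketch of that computation in the continuum case. The paper's basis functions are B-splines; purely for illustration I substitute Gaussian bumps (my assumption, since their Laplacian has a simple closed form) and evaluate at the node points themselves:

```python
import numpy as np

# Hypothetical setup: a 3x3 grid of nodes, one Gaussian bump per node.
# (The paper uses B-spline basis functions; Gaussians are just an illustration.)
nodes = np.array([(x, y) for x in range(3) for y in range(3)], dtype=float)
sigma = 0.8

def laplacian_of_basis(j, x):
    """Laplacian of the j-th Gaussian bump chi_j, evaluated at the point x."""
    d2 = np.sum((x - nodes[j]) ** 2)
    g = np.exp(-d2 / (2 * sigma**2))
    return g * (d2 / sigma**4 - 2 / sigma**2)

def k_vec(x):
    """k(x): the vector of Laplacians of all basis functions at x."""
    return np.array([laplacian_of_basis(j, x) for j in range(len(nodes))])

# K = sum over the evaluation points x_i of the outer products k(x_i) k(x_i)^T.
K = sum(np.outer(k_vec(xi), k_vec(xi)) for xi in nodes)

w = np.ones(len(nodes))  # some weight vector
print(w @ K @ w)         # total curvature redux for these weights
```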

  • @Willie Wong: Thanks for the additional details. I now get it. I really appreciate your help. (2011-05-11)