I'd like to ask why a Gaussian kernel's Gram matrix is of full rank. Many texts and articles simply assume this is the case and refer me to a research article that is no longer available online, but I haven't been able to find a single source that explains why it holds.
Gaussian kernels: why are their Gram matrices full rank?
-
I added in a "reference-request" tag – 2012-04-11
1 Answer
Let $K:X\times X\rightarrow\mathbb{R}$ be a kernel function on the set $X$. Then $K$ is positive definite if and only if the functions $K_x:=K(\cdot,x)$, $x\in X$, are linearly independent. In particular, for every finite set of distinct points $\{x_1,\ldots ,x_n\}\subseteq X$ the matrix $\bigl(K(x_i,x_j)\bigr)_{i,j=1}^n$ then has full rank.
Thus in your case you have to show that the functions $e^{-\| x-x_k\|^2}$, $k=1,\ldots ,n$, are linearly independent over $\mathbb{R}$.
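As a numerical sanity check of this claim (not part of the proof), one can build the Gaussian-kernel Gram matrix at a few distinct points and verify that it has full rank. This is a minimal pure-Python sketch, assuming 1-D points; the helper names `gaussian_gram` and `rank` are my own, not from any library:

```python
import math

def gaussian_gram(xs):
    """Gram matrix K_ij = exp(-|x_i - x_j|^2) for 1-D points xs."""
    return [[math.exp(-(xi - xj) ** 2) for xj in xs] for xi in xs]

def rank(mat, tol=1e-12):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in mat]
    n = len(m)
    r = 0
    for col in range(n):
        # pick the row (from r down) with the largest entry in this column
        pivot = max(range(r, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue  # column is (numerically) dependent on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n):
            f = m[i][col] / m[r][col]
            for j in range(col, n):
                m[i][j] -= f * m[r][j]
        r += 1
    return r

xs = [0.0, 0.7, 1.3, 2.9]   # distinct points
K = gaussian_gram(xs)
print(rank(K))              # 4, i.e. full rank, since the points are distinct
```

With repeated points the Gram matrix has two identical rows, and the computed rank drops accordingly.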
A linear relation
$ \sum\limits_{k=1}^n a_k e^{-\|x-x_k\|^2} =0,\; a_k\in\mathbb{R}, $
can be rewritten in the form
$ 0=\sum\limits_{k=1}^n \bigl(a_k e^{-\|x_k\|^2}\bigr) e^{-\|x\|^2} e^{2\langle x,x_k\rangle}=\sum\limits_{k=1}^n b_k e^{-\|x\|^2} e^{2\langle x,x_k\rangle}, $
where $b_k:=a_k e^{-\|x_k\|^2}$. Then:
$ \sum\limits_{k=1}^n b_k e^{2\langle x,x_k\rangle}=0, $
and since this equation holds for all $x\in\mathbb{R}^m$, one may replace $x$ by $jx$ and obtain the homogeneous system
$ \sum\limits_{k=1}^n b_k \bigl(e^{2\langle x,x_k\rangle}\bigr)^j=0,\;j=0,\ldots ,n-1. $
For fixed $x$ this is a Vandermonde system in the nodes $t_k:=e^{2\langle x,x_k\rangle}$. Choosing $x$ such that the inner products $\langle x,x_k\rangle$ are pairwise distinct (possible because the points $x_k$ are distinct), the nodes $t_k$ are pairwise distinct, so the Vandermonde matrix is invertible. Consequently $b_k=0$, and thus $a_k=0$ for all $k$, as desired.
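The Vandermonde step above can be illustrated numerically: for a fixed $x$ with pairwise-distinct inner products $\langle x,x_k\rangle$, the nodes $t_k=e^{2\langle x,x_k\rangle}$ are pairwise distinct, so the Vandermonde determinant $\prod_{i<j}(t_j-t_i)$ is nonzero and the system forces $b_k=0$. A small sketch (my own illustration, assuming 1-D points and $x=1$):

```python
import math

def vandermonde_det(nodes):
    """det V for V[j][k] = nodes[k]**j equals prod over i<j of (t_j - t_i)."""
    d = 1.0
    for j in range(len(nodes)):
        for i in range(j):
            d *= nodes[j] - nodes[i]
    return d

x = 1.0                       # fixed x; with 1-D points <x, x_k> = x * x_k
xks = [0.0, 0.5, 1.5, 2.0]    # distinct x_k, hence distinct inner products
nodes = [math.exp(2 * x * xk) for xk in xks]   # t_k = e^{2<x, x_k>}
print(vandermonde_det(nodes) != 0.0)           # True: only b_k = 0 solves the system
```

If two of the $x_k$ coincided, two nodes would coincide, the determinant would vanish, and the argument would break down, which is why distinctness of the points matters.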
-
If $K$ is a kernel then it is positive definite without any additional assumptions. If for any distinct $x_1, ..., x_m$ the functions $K(., x_1), ..., K(., x_m)$ are linearly independent, then the kernel $K$ is _strictly_ positive definite. The converse is also true. – 2012-10-29
-
It seems that we use different words for the same things: positive semidefinite means $x^tAx\geq 0$ for all $x$, positive definite means $x^tAx>0$ for all $x\neq 0$. A kernel function by definition is positive semidefinite. – 2012-10-29
-
I don't get why the linear independence of the functions $K_x := K(\cdot,x)$ implies that the matrix $K(x_i,x_j)$ has full rank for **every** finite set $\{x_1,...,x_n\}$. Can you justify this statement a bit better? As far as I know, saying that the functions $K(x,x_1), ..., K(x,x_n)$ are linearly independent in $\mathbb{R}^m$ means that no linear combination of these functions (except the trivial one) is equal to zero **for all** $x$ in any interval in $\mathbb{R}^m$. Why does this imply that the matrix $K(x_i,x_j)$ has full rank? – 2017-02-15