2
$\begingroup$

I have read many of the questions already here in regards to the Frobenius norm, but they do not help me too much.

My question is, why is the Frobenius norm not considered a 'proper' norm?

In a book I am reading, (p.143), we have a square real matrix $\mathbf W$, that needs to be (iteratively) symmetrically orthogonalized. To start off, we first create a random matrix, and then normalize it as such:

$ \mathbf W = \mathbf W / ||(\mathbf W)||_{\; 1,\; or\; 2\; ,\; or \; \infty,\;\text{but not frobenius}} $

As the matrix becomes symmetrically orthogonalized during loops of the iteration, we will approach the condition that:

$ \mathbf {W^TW \approx I} $

Anyway, my book states the following:

...the norm in the normalization of $\mathbf W$ must be chosen appropriately. That is, it must be a proper norm in the space of matricies; most conventional norms have this property, except for the Frobenius norm.

Why is this the case? What makes it a 'proper' norm, vs the matrix $l_1$, $l_2$, or $l_\infty$ matrix norms? Why is $l_\text{Fro}$ a bad guy?

Thanks!

EDIT:

Here is the full text: Question is labelled in red:

enter image description here enter image description here

(Start reading from symmetric orthoginalization at near top of first page).

  • 0
    @copper.hat Ok, I have added full context. :-)2012-08-30

1 Answers 1

2

The requirement for convergence (from the article, no claim of correctness from me) seems to be that the eigenvalues of the positive semi-definite $W(1)^T W(1)$ must lie in the interval $[0,1]$.

To do this, they normalize $W(0)$ by dividing by its norm, ie, $W(1) = \frac{W(0)}{\|W(0)\|}$.

If $\| \cdot \|$ is an induced norm, the eigenvalues of $W(1)$ will satisfy $|\lambda| \leq 1$, and hence the eigenvalues of $W(1)^T W(1)$ will lie in the interval $[0,1]$ (since $\|W(1)\| = 1$, and $\|W(1)^T W(1)\| \leq \|W(1)\|^2 = 1$, and $W(1)^T W(1)$ is symmetric and positive semi-definite).

However, since the induced 2-norm satisfies $\|A\|_2 \leq \|A\|_F$, then as far as I can tell, choosing $W(1) = \frac{W(0)}{\|W(0)\|_F}$ would be fine too (since the eigenvalues of $W(1)$ will still lie in the unit disk).

It is not clear to me why the authors consider the Frobenius norm inappropriate in this case.

Addendum: I was curious to check convergence of both $W(t)^T W(t)$ and $W(t)$.

Let $p$ be the polynomial $p(x) = \frac{9}{4} x - \frac{3}{2} x^2 + \frac{1}{4} x^3$. We note that $W(t+1)^TW(t+1) = p(W(t)^TW(t))$. It follows (as was implicitly noted in the article above) that any basis that diagonalizes $W(t)^TW(t)$ will also diagonalize $W(t+1)^TW(t+1)$, hence to examine convergence, we need only look at the eigenvalues. By the spectral mapping theorem, if $\lambda_t$ is an eigenvalue of $W(t)^TW(t)$, then $\lambda_{t+1}=p(\lambda_t)$ is an eigenvalue of $W(t+1)^TW(t+1)$.

We want to show that if $\lambda_0 \in (0,1]$ is an eigenvalue of $W(0)^T W(0)$, then $\lambda_t \to 1$. We note that $p(x)=x$ has solutions $\{0,1,5\}$, and hence $p(x)>x$ when $x \in (0,1)$. Hence, if $x \in (0,1)$, then $x < p(x) < 1$. We also note that $p'(1) =0$. It follows that if $x_0 \in (0,1]$ and $x_{n+1} = p(x_n)$, then $x_n$ is a non-decreasing sequence, and $x_n \leq 1$. Hence $x_n \to \hat{x} \leq 1$, and by continuity $\hat{x} = p(\hat{x})$, hence $\hat{x} = 1$. It follows that $W(t)^TW(t) \to I$.

In fact, we can establish a rate of convergence, which will then be used to show that $W(t)$ converges. Since $p''(x) = \frac{3}{2}x -3$, we note that $|p''(x)| \leq 3$ for $x \in [0,1]$. Using a Taylor expansion, we have $|p(x)-(1+p'(1)(x-1))| = |p(x)-1| \leq \frac{1}{2} 3 |x-1|^2$. If $x \in [\frac{2}{3},1]$, then we have $|p(x)-1| \leq \frac{1}{2} |x-1|$. Consequently, if $x_0 \in [\frac{2}{3},1]$ and $x_{n+1} = p(x_n)$, we have $|x_{n}-1| \leq \frac{2}{3} \frac{1}{2^k}$. (The rate is actually quadratic, but I just want to establish convergence of $W(t)$.)

Since we have established that all eigenvalues of $W(t)^TW(t)$ converge to 1, we can assume, without loss of generality, that all eigenvalues of $W(0)^TW(0)$ lie in $[\frac{2}{3},1]$, and hence that $\|W(t)^TW(t)-I \| \leq K \frac{1}{2^t}$, for some constant $K$.

Now consider $W(t+1) = \frac{3}{2} W(t) - \frac{1}{2} W(T)W(t)^T W(t) = W(t) + \frac{1}{2} W(t) (W(t)^TW(t)-I).$

Note that since $W(t)^TW(t)$ converges, we have that $W(t)$ is bounded, hence $\|W(t)\| \leq L$, for some constant $L$. From this we obtain the estimate $\|W(t+1)-W(t) \| \leq \frac{1}{2} L K \frac{1}{2^t}.$

From this we have (assuming $n>m$): $\|W(n)-W(m)\| \leq \frac{1}{2} L K (\frac{1}{2^m}+ ... + \frac{1}{2^n}) \leq L K \frac{1}{2^m},$ so it follows that $\{W(t)\}$ is Cauchy, hence converges.

  • 0
    Thnx copperhat, I also changed the norm in my simulations from $l_\infty$ to $l_F$ and I was able to diagonalise a random matrix without the creating a black hole in the middle of the milky way. ;-)2012-08-31