67
$\begingroup$

What are nice "proofs" of true facts that are not really rigorous but give the right answer and still make sense on some level? Personally, I consider them to be guilty pleasures. Here are examples of what I have in mind:

  1. $s=\sum_{i=0}^\infty \delta^i=1/(1-\delta)$.

    "Proof:" $s=1+\delta+\delta^2+\cdots=1+\delta s$ and hence $s=1/(1-\delta)$.

  2. $(f\circ g)'(x)=f'(g(x))g'(x)$.

    "Proof:" $\displaystyle\lim\limits_{h\to 0}\frac{f(g(x+h))-f(g(x))}{h}=\lim\limits_{h\to 0}\bigg(\frac{f(g(x+h))-f(g(x))}{g(x+h)-g(x)}\frac{g(x+h)-g(x)}{h}\bigg).$

  3. $[0,1]$ is uncountable.

    "Proof:" Pick a number from $[0,1]$ randomly. Every number has the same probability. If this probability were positive, there would be finitely many such numbers such that the probability of picking one of them exceeds $1$, which cannot be. So the probability of picking each number is $0$. If $[0,1]$ were countable, the probability of picking any real number would be $0=0+0+0+\cdots$. But by picking from a uniform distribution, I will get a real number with certainty.

It might be helpful to indicate where the lapses in rigor are and why the method works anyways.

  • 0
    Other than the "gives the right answer" part, haven't you defined politics?2012-04-02

14 Answers 14

49

Cayley-Hamilton Theorem. Let $A$ be an $n\times n$ matrix, and let $f(t)$ be its characteristic polynomial. Then $f(A)=0$.

"Proof." $f(t) = \det(A-tI)$. Therefore, $f(A) = \det(A-AI) = \det(A-A) = \det(0) = 0$.


Issues. One problem may not be obvious... the equation "$f(A)=0$" is really saying that the matrix we get via the evaluation (by identifying the underlying field with the subring of scalar matrices) is the zero matrix. However, the "proof" claims to prove that $f(A)$, which is supposed to be a matrix, is equal to the value of a determinant, which is a scalar.

As to why it "gives the right answers"... well, because the theorem is true. I am reminded of what Hendrik Lenstra once said in class after presenting an idea for a proof and explaining why it didn't quite work:

The problem with incorrect proofs to correct statements is that it is hard to come up with a counterexample.

  • 0
    This is where fussier notation is useful: if we write $\mathbf 0$ instead of $0$ for the zero matrix, then it's clearer that the "proof" hasn't arrived at the right result.2015-07-26
18

$\begin{align*} x^{x^{x^{\scriptstyle\ldots}}} &= 2\\ x^{\left(x^{x^{x^{\scriptstyle\ldots}}}\right)} &= 2\\ x^2 &= 2\\ x &= \sqrt{2}\\ \end{align*}$

  • 1
    @DeathkampDrone If you replace the exponent with the right hand side, you get $x^4 = 4$. Which has the same real solutions as $x^2 = 2$. Discarding the non-real and negative solutions of $x^4 = 4$, you are also left with $x = \sqrt{2}$. So the method shows that at least one of $x^{x^{x^{x^{x^{\dotsc}}}}} = 2 \text{ and } x^{x^{x^{x^{x^{\dotsc}}}}} = 4$ has no solution. There's no problem with the technique if you only use it for "if there is a solution, then it must be ...", but if one just assumes the existence of solutions, it leads to contradictions.2014-09-11
17

This is one that comes up on math.SE from time to time.

Proposition: $[0, 1]$ has the same cardinality as $[0, 1]^2$.

"Proof 1." Write $(x, y) \in [0, 1]^2$ in their binary expansions $x = 0.x_1 x_2 x_3 ..., y = 0.y_1 y_2 y_3 ...$, and consider the map which sends $(x, y)$ to $0.x_1 y_1 x_2 y_2 x_3 y_3 ...$.

Of course the problem with this proof is that the map is not well-defined since numbers do not have unique binary expansions, e.g. we have $0.1 = 0.0111...$.

"Proof 2." Repeat Proof 1, but this time don't allow any terminating binary expansions except $0.0$; instead, when a number has two binary expansions, pick the one which ends with infinitely many ones.

But now our map isn't surjective! The restriction on binary expansions above means, for example, that we can never hit $0.011010101...$.

A neat fix is just to show that $[0, 1]$ has the same cardinality as $\{ 0, 1 \}^{\mathbb{N}}$, which is straightforward to do by Cantor-Bernstein-Schroeder, and then the above argument actually works.

  • 0
    @QiaochuYuan: Yes, it's a nice exa$m$ple of that. Tha$n$ks.2012-03-23
14

Ito's Lemma: Assume that $x(t)$ has an SDE of the type $dx(t)=\mu(t)dt+\sigma(t)dW(t)$ for $\mu$ and $\sigma$ adapted processes. Also assume that $f\in C^{1,2}$. Define $z(t):=f(t,x(t))$, then $z(t)$ obeys the following SDE: $ df(t,x(t))=\left\{\frac{\partial f}{\partial t}+\mu\frac{\partial f}{\partial x}+\frac{1}{2}\sigma^2\frac{\partial^2f}{\partial x^2}\right\}dt+\sigma\frac{\partial f}{\partial x}dW(t) $ "Proof": Taylor expand to obtain $ df=\frac{\partial f}{\partial t}dt+\frac{\partial f}{\partial x}dx+\frac{1}{2}\frac{\partial^2f}{\partial x^2}dx^2+ \frac{1}{2}\frac{\partial^2f}{\partial t^2}dt^2+\frac{\partial^2f}{\partial t\partial x}dtdx. $ Then use that $(dt)^2\approx 0$, $(dW)^2=dt$ and $dWdt\approx 0$, yielding the result.

  • 7
    unless other "proofs" in this topic, the one you provided is quite commonly used in some books on financial math to convince students that the result holds2012-03-23
12

Pick's Theorem: Suppose $P$ is a polygon whose vertices are all lattice points. If a $P$ contains a lattice point $p$ then it contains the entire square around $p$, or nearly so. If it passes through a point $p$ then half of $p$'s square is inside $P$ and half outside, or nearly so. So if $i$ is the number of lattice points inside $P$, and $b$ the number of lattice points on the boundary of $P$, the area of $P$ will be $i + \frac b2$, plus or minus some fudge factor to adjust for that fact that even a small polygon already contains several lattice points.

A minimal square has $b=4$ and $i=0$, so the fudge factor is apparently -1. And in fact the area of any lattice polygon is indeed $i + \frac b2 -1$.

  • 0
    What is "fudge factor" but a pejorative version of "corrective term"?2018-07-12
12

Let $X$ be an algebraic data type, i.e. an element of a semiring. Then the type $L$ of lists of $X$s is

$L = 1 + XL$

i.e. a list is either empty ($1$) or it is a pair consisting of an $X$ and another list ($XL$). Then we can write

$(1-X)L = 1 \quad\Rightarrow\quad L = \frac{1}{1-X} \quad\Rightarrow\quad L = 1+X+X^2+X^3+\cdots$

i.e. a list is either empty, or it has one element, or it has two elements, or it has three elements, etc. Which is true, despite that fact that we used subtraction and division in our "proof", neither of which are valid operations in a semiring.

The fact that this "proof" works is surprisingly deep, and has ties to category theory. This paper by Fiore and Leinster proves the result, showing that for some cases, proofs that 'pretend' the objects under consideration are complex numbers give valid results for semirings, monoids, rings and other classes of algebraic structure. Among other things they show that there is a bijection between the type of binary trees $T$, and the type of seven-tuples of binary trees, $T^7$, i.e. that

$T=T^7$

which follows trivially from the defining equation for binary trees holding elements of type $X$

$T = 1 + XT^2$

if you allow yourself free access to field operations.

  • 0
    Thanks for the link to the Fiore and Leinster paper, which I am now enjoying.2014-09-11
8

$\begin{align*} f(x) &= \cos(x) + i\sin(x)\\ \frac{df}{dx} &= -\sin(x) + i\cos(x) = if(x)\\ \frac{1}{f(x)} \frac{df}{dx} &= i\\ \int\frac{1}{f(x)} df &= \int i dx\\ \ln(f(x)) &= ix + C\\ f(x) &= e^{ix + C} \end{align*}$ And since $f(0) = 1$, $C = 0$, so $e^{ix} = \cos(x) + i\sin(x)$

  • 0
    @Sellerie: It is, and in general you cannot manipulate $\frac{df}{dx}$ as though it were the division of two separate terms. However, just like you can "cancel" the terms of $\frac{dy}{dt}\frac{dt}{dx} = \frac{dy}{dx}$ _(due to the chain rule)_, you can also "cancel" the $dx$ in $\frac{df}{dx}$ by integrating with respect to x _(fundamental theorem of calculus)_. The validity of these steps are proven in Analysis course.2018-09-05
8

Euler's formula for plane connected graphs states that V + F - E = 2. Cauchy gave a proof of this that is very appealing but not "rigorous." For simplicity also assume the the plane graph G is 2-connected.

Suppose the graph G is embedded into the plane and consider the quantity: Q = V + F - E. If any face other than the infinite face is not a triangle, add internal diagonals using existing vertices to make that face a triangle.

It is easy to see that as each diagonal is added we increase F and E by 1 (V stays unchanged) but since these terms in Q have opposite signs Q remains unchanged. Now starting with the edges which bound the current plane graph (all internal faces triangles) remove triangles one at a time until one gets down to a single triangle. These reductions either involve removing a single edge (and Q will be unchanged by this operation) or removing two edges of a triangle that meet at a 2-valent vertex (and Q will be unchanged since V goes down by 1, E goes down by 2 and F goes down by 1).

Since after all reductions the value of Q is the same as at the start and the result is a triangle for which V + F - E = 2 holds, we can conclude that V + F - E = 2 for any such graph.

This way of proving Euler's polyhedral formula can be made rigorous but it requires more work to do it fully correctly.

  • 2
    There is a very nice book on the Euler's formula by Imre Lakatos: ["Proofs and Refutations. The Logic of Mathematical Discovery"](http://en.wikipedia.org/wiki/Proofs_and_Refutations). The author presents there many approaches (for some reasons false) to the proof of this formula. All of this serves as an example and a basis for a talk about logic, reasoning and philosophy of mathematics. I'm sure most of you know the book, to the rest I strongly recommend it.2012-04-02
8

Δ = difference operator Δf (x) = f(x+1)-f(x)

D = differentiation

so D = 1/ʃ and Δ = 1 / Σ.

Taylor's theorem: Δ = exp(D)-1

So: $\sum = 1/(\exp(D)-1) = (1/D)(D / (\exp(D)-1)) \\ = \int (1 - (1/2) D + (1/12)D^2 - (1/720)D^4 \ldots) \\ = \int + 1/2 + (1/12)D - (1/720)D^3 + ... $ This can actually be made rigourous, at the expense of some error terms and conditions on the functions f to which it is applied: you get the Euler-Maclaurin summation formula

  • 1
    This is very close to the formal justification given [here](http://www.whim.org/nebula/math/eulermac.html). In fact, for polynomials, this formula is exact.2012-03-24
7

Fourier Inversion Theorem For every $f \in L^1(\mathbb{R})$ such that $\hat{f}(\xi)=\int_{-\infty}^\infty f(x)e^{-i x \xi}\, dx$ also is in $L^1(\mathbb{R})$ the following formula holds:

$ f(x)=\int_{-\infty}^\infty \hat{f}(\xi) e^{i x \xi}\, \frac{d \xi}{2\pi}\qquad \text{a.e.}$

"Proof" We have

$\tag{1} \int_{-\infty}^\infty \hat{f}(\xi)e^{i x \xi} \, \frac{d\xi}{2\pi}=\int_{-\infty}^\infty\frac{d\xi}{2\pi} \int_{-\infty}^\infty dy\, f(y)e^{i\xi(x-y)} =\int_{-\infty}^\infty dy\, f(y)\left( \int_{-\infty}^\infty \frac{d\xi}{2\pi} e^{i \xi(x-y)}\right) $

and

$\tag{1!} \int_{-\infty}^\infty \frac{d\xi}{2\pi} e^{i \xi(x-y)}=\delta(x-y),$ $\tag{2!} \int_{-\infty}^\infty f(y)\delta(x-y)\, dy=f(x).$

Inserting $(1!)$ and $(2!)$ into $(1)$ we conclude the "proof". ////

The problem here lies in equations marked with (!), because they are meaningless as they stand. However, they may be interpreted as placeholders for rigorous approximation processes: for example, the proof goes on fine if we start with

$\int_{-\infty}^\infty \hat{f}(\xi)\Phi(h\xi)e^{i x \xi}\, \frac{d\xi}{2\pi}, $

where $\Phi$ is a suitable kernel and $h>0$, then mimic the above reasoning and let $h \to 0$ in the end.

6

"Proof" of the Gaussian distribution.

One can derive the (univariate) Gaussian from the binomial, but I think the usual and historical approach is to begin with "basic assumptions" and go from there. (This is a typical note.) We assume a "large number of errors, of equal magnitude, equally likely to be positive or negative."

These assumptions are meta-mathematical. As one author put it, "mathematicians...think physicists have verified it and physicists...think mathematicians have proved it theoretically." $^*$

The usual proof of the Gaussian is therefore not a proof at all, but a mathematical edifice constructed on intuition about random errors. Because the intuition is informed by experience, the idea works, is plausible, but it really doesn't qualify as a proof.

$^*$ Young, Statistical Treatment of Experimental Data, p65.

  • 0
    @robjohn: It is talking about the same function, yes, and I think my argument is overstated.2014-04-24
6

Central Limit Theorem (taken from my website)

In what follows, the fourier transform used is $ \widehat{f}(\xi)=\int_{-\infty}^\infty f(x)\,e^{-i2\pi x\xi}\;\mathrm{d}x $ Suppose we have a probability density function, $\phi$, which has a mean of $0$ and a variance of $1$ (this can be achieved by translation and scaling). Then, the following are true for the fourier transform of the pdf:

  1. $\widehat{\phi}(0)=1$ (i.e. $\int_{-\infty}^\infty\phi(x)\:\mathrm{d}x=1$; $\phi$ is a probability measure)

  2. $\widehat{\phi}\vphantom{\phi}^\prime(0)=0$ (i.e. $\int_{-\infty}^\infty (-i2\pi x)\phi(x)\:\mathrm{d}x=0$; the mean of $\phi$ is $0$)

  3. $\widehat{\phi}\vphantom{\phi}^{\prime\prime}(0)=-4\pi^2$ (i.e. $\int_{-\infty}^\infty (-i2\pi x)^2\phi(x)\:\mathrm{d}x=-4\pi^2$; the variance of the pdf is $1$)

Thus, to second order, the fourier transform of the pdf looks like $ \widehat{\phi}(\xi)=1-2\pi^2\xi^2\tag{1} $ The Central Limit Theorem looks at the sum of $k$ random variates contracted by $\sqrt{k}$ and scaled by $\sqrt{k}$. This maintains the unit integral while compensating for the increased variance due to summation.

The pdf of the sum of random variates is the convolution of the pdf of the variates. The fourier transform of a convolution is the product of the fourier transforms. The fourier transform of a function contracted by $\sqrt{k}$ and scaled by $\sqrt{k}$ is the fourier transform expanded by $\sqrt{k}$.

Thus, the fourier transform of the pdf of the sum of $k$ independent trials contracted and scaled appropriately is $ \left(1-\frac{2\pi^2\xi^2}{k}\right)^k\tag{2} $ which, for large $k$ approaches $ e^{-2\pi^2\xi^2}\tag{3} $ the inverse fourier transform of which is $ \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\tag{4} $ The distribution in $(4)$ is gaussian normal distribution with mean $0$ and variance $1$.

5

Intuitive proof of equality area. enter image description here

  • 0
    On a 5x13 triangle the hypotenuse should never pass through a pair of integer coordinates.2018-12-30
0

“Proof” of Parseval’s Identity
Let $f\left(x\right)=\frac{a_0}{2}+\sum_{k=1}^{\infty}a_k \cos kx+b_k \sin kx$
Denote $T\left(x\right)= \sum_{k=1}^{\infty}a_k \cos kx+b_k \sin kx$, then
$f^2\left(x\right)=\frac{a_0^2}{4}+a_0 T\left(x\right)+T^2\left(x\right)$
$T^2\left(x\right)$ can be expressed as Cauchy Product
$T^2\left(x\right)=\sum_{k=1}^{\infty}\sum_{m=1}^{k}\left(a_m \cos mx+b_m \sin mx\right)\left(a_{k+1-m}\cos\left(k+1-m\right)x+b_{k+1-m}\sin\left(k+1-m\right)x\right)\\= \sum_{k=1}^{\infty}\sum_{m=1}^{k}a_m a_{k+1-m}\cos mx \cos\left(k+1-m\right)x +b_m b_{k+1-m}\sin mx \sin\left(k+1-m\right)x+a_{k+1-m}b_m\cos\left(k+1-m\right)x\sin mx+a_m b_{k+1-m}\cos mx \sin\left(k+1-m\right)x $
Taking integral from $-\pi$ to $\pi$ on both side, by the orthogonality of trigonometric functions, we note that all terms with product of unidentical trigonometric functions will become zero, therefore we have
$\int_{-\pi}^{\pi}T^2\left(x\right)dx=\sum_{k=1}^{\infty}\int_{-\pi}^{\pi}\left(a_k^2 \cos^2 kx+b_k^2 \sin^2 kx\right)dx\\=\pi\sum_{k=1}^{\infty}a_k^2+b_k^2$
And note that $\int_{-\pi}^{\pi}\frac{a_0^2}{4}dx=\frac{\pi a_0^2}{2}$ and $\int_{-\pi}^{\pi}a_0 T\left(x\right)dx=0$,
Thus we have finished proving the identity
$\frac{1}{\pi}\int_{-\pi}^{\pi}f^2\left(x\right)dx=\frac{a_0^2}{2}+\sum_{k=1}^{\infty}a_k^2+b_k^2$
It seems a beautiful proof, but unfortunately, it is TOTALLY WRONG. There are several fatal mistakes inside the proof.
1)Integral does NOT NECESSARILY commute with infinite sum, it is not available to directly take integral inside the sum $\sum_{k=1}^{\infty}$ without proving anything.
2)The Cauchy product of two convergent series NEED NOT to converge, unless you can prove the absolute convergence of one series, or the convergence of the Cauchy product series, which both seems impossible in this proof.
3)The Fourier series of $f\left(x\right)$ is NOT IDENTICALLY equal to $f\left(x\right)$. It even does not converge at some point EVEN IF $f$ is continuous! That means you should firstly prove the almost-everywhere convergence of Fourier series in $L^2$, which is a HUGE theorem.