23

These days, the standard way to present differential calculus is by introducing the Cauchy-Weierstrass definition of the limit. One then defines the derivative as a limit, proves results like the Leibniz and chain rules, and uses this machinery to differentiate some simple functions such as polynomials. The purpose of my question is to see what creative alternatives people can describe to this approach. The nature of the question is that there is not going to be a single best answer. I have several methods that I've collected which I'll put in as answers to my own question.

It's not reasonable to expect answers to include an entire introductory textbook treatment of differentiation, nor would anyone want to read answers that were that lengthy. A sketch is fine. Lack of rigor is fine. Well known notation and terminology can be assumed. It would be nice to develop things to the point where one can differentiate a polynomial, since that would help to illustrate how your method works and demonstrate that it's usable. For this purpose, it suffices to prove that if $n>0$ is an integer, the derivative of $x^n$ equals $0$ at $0$ and equals $n$ at $1$; the result at other nonzero values of $x$ follows by scaling. Doing this for $n=2$ is fine if the generalization to $n>2$ is obvious.

  • 0
    One may see [this answer](http://math.stackexchange.com/a/2079403/272831) for an interesting approach. (2017-01-01)

9 Answers

5

Definition:
Given a function $x(t)$, consider any point $P=(a,x(a))$ on its graph. Let the function $\ell(t)$ be a line passing through $P$. We say that $\ell$ cuts through $x$ at $P$ if there exists some real number $d>0$ such that the graph of $\ell$ is on one side of the graph of $x$ for all $a-d < t < a$, and is on the other side for all $a < t < a+d$.

Definition (Marsden):
A line $\ell$ through $P$ is said to be the line tangent to $x$ at $P$ if all lines through $P$ with slopes less than that of $\ell$ cut through $x$ in one direction, while all lines through $P$ with slopes greater than that of $\ell$ cut through $x$ in the opposite direction.

Definition:
The derivative of a function is the slope of its tangent line at a given point.

Theorem (Livshits):
The derivative of $t^k$ is $kt^{k-1}$, for $k=1, 2, 3, \ldots$

It suffices to prove that the derivative equals $0$ at $t=0$ and equals $k$ at $t=1$ (take $k \ge 2$; the case $k=1$ is immediate); the result at other nonzero values of $t$ follows by scaling. The result at $t=0$ holds for even $k$ by symmetry, and for odd $k$ by direct application of the definition.

It remains to prove the result at $t=1$. The proposed tangent line at $(1,1)$ has the equation $\ell(t)=k(t-1)+1$, so what we need to prove is that the polynomial $t^k-[k(t-1)+1]$ is greater than or equal to zero throughout some region around $t=1$. We will prove that it is $\ge 0$ for $t \ge 0$.

Suppose that $\ell$ meets the graph of $t^k$ at a point $(t,t^k)$ with $t \neq 1$. Since the slope of $\ell$ is $k$, we must have \begin{equation*} \frac{t^k-1}{t-1} = k. \end{equation*} The left-hand side is given by $Q(t)=\sum_{j=0}^{k-1}t^j$. For which $t$ do we get $Q(t)=k$? Clearly $t=1$ is a solution, since there are $k$ terms, each equal to $1$. For $t>1$, all the terms except the constant one are greater than $1$, so there can be no solution. For $0 \le t < 1$, all the terms except the constant one are nonnegative and less than $1$, so again there can be no solution. Thus, for $t \ge 0$, the polynomial $t^k-\ell(t)$ vanishes only at $t=1$; since it equals $k-1>0$ at $t=0$ and tends to $+\infty$, it is $\ge 0$ throughout $t \ge 0$. This completes the proof.
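For what it's worth, here is a quick numerical sanity check of the two claims above (illustration only; the code and names are my own):

```python
# Check that Q(t) = 1 + t + ... + t^(k-1) is below k on [0, 1) and above k
# on (1, 3), and that t^k - [k(t-1) + 1] stays nonnegative on [0, 3).
k = 5

def Q(t):
    return sum(t**j for j in range(k))

def gap(t):
    return t**k - (k * (t - 1) + 1)

samples = [i / 100 for i in range(300)]
assert all(Q(t) < k for t in samples if t < 1)
assert all(Q(t) > k for t in samples if t > 1)
assert all(gap(t) >= 0 for t in samples)
print("checks pass for k =", k)
```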


4

When I taught Number Theory I needed to speak of the derivative of a polynomial (over $\mathbb Z$ and $\mathbb Z/p \mathbb Z$). Instead of taking the derivative on $\mathbb R$ and restricting it to $\mathbb Z$, I used the following approach, which works for polynomials only (but it works in any polynomial ring).

Let $P(X)$ be a polynomial, and $a$ a point. Then by the division Theorem we have

$P(X)=(X-a)Q(X) + R \,,$

where $R$ is a constant. We define

$P'(a):= Q(a) \,. \quad (*)$

It is important, though, to point out that $Q(X) \neq P'(X)$ in general, since different points $a$ yield different $Q$'s.

The following Lemma is an immediate consequence of $(*)$:

Lemma

1) $(P_1 \pm P_2)' = P_1' \pm P_2'$,

2) $(aP)'=aP'$

3) $(a)'=0$

4) $(X^n)'=n X^{n-1}$.

Thus, one gets the general formula for the derivative of a polynomial.
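As a small computational illustration (the code and function names are mine, not part of the answer): synthetic division produces $Q$ and $R$ in one pass, and $P'(a)=Q(a)$ then follows by a second Horner evaluation.

```python
def divide_by_linear(coeffs, a):
    """Divide P(X) by (X - a); coeffs lists P from highest degree down.
    Returns (coefficients of Q, remainder R), where R = P(a)."""
    acc = 0
    q = []
    for c in coeffs:
        acc = acc * a + c
        q.append(acc)
    r = q.pop()          # the final accumulator value is R = P(a)
    return q, r

def evaluate(coeffs, a):
    """Horner evaluation of a polynomial at a."""
    acc = 0
    for c in coeffs:
        acc = acc * a + c
    return acc

def derivative_at(coeffs, a):
    """P'(a) := Q(a), following the definition (*) above."""
    q, _ = divide_by_linear(coeffs, a)
    return evaluate(q, a)

# P(X) = X^3 - 2X + 5, so P'(X) = 3X^2 - 2 and P'(2) = 10.
print(derivative_at([1, 0, -2, 5], 2))   # 10
```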

The product rule can also be proven relatively easily, and then one can actually prove that

$P(X)=P(a) + P'(a)(X-a) + \frac{P''(a)}{2!}(X-a)^2 + \cdots + \frac{P^{(n)}(a)}{n!}(X-a)^n \,,$

where $n$ is the degree of the polynomial.

It also follows from here that $a$ is a multiple root of $P(X)$ with multiplicity $k$ if and only if $P(a)=P'(a)=...=P^{(k-1)}(a)=0$ and $P^{(k)}(a) \neq 0$.

This is a purely algebraic approach: it works nicely for polynomials over any commutative ring, and it can probably be extended easily to rational functions, but not much more generally.

Note that $R=P(a)$; thus for all $x \neq a$ we have $Q(x)=\frac{P(x)-P(a)}{x-a}$, so this definition is equivalent to the standard definition over $\mathbb R$.

Also, note that $P''(a) \neq Q'(a)$ in $(*)$. In fact, differentiating $P(X)=(X-a)Q(X)+R$ twice with the product rule gives $P''(a) = 2Q'(a)$.

  • 0
    The Taylor expansion beyond the first derivative does not work in all rings: the factorials $2!,\dots,n!$ need to be invertible in the ring. You'd especially have problems over ${\mathbf Z}/(p)$ if $n \geq p$. (2013-06-30)
3

Here's a characterization of derivatives of polynomials in algebra rather than calculus, which might be . . . . um . . . . tangentially . . . . relevant.

A linear mapping from polynomials in $x$ to polynomials in $x$ is determined by the images of $1,x,x^2,x^3,\ldots$. If such a mapping takes $x^n$ to a scalar multiple of $x^{n-1}$, then it is shift-equivariant only if it takes $x^n$ to $nx^{n-1}$.

Later note: It is suggested in the comments below that some people don't know what "shift-equivariant" means. If $g(x)=f(x-c)$, then $g$ is a shift of $f$. Suppose $Tf$ is the image of $f$ under a linear mapping of the kind contemplated here. If $(Tf)(x-c)=(Tg)(x)$ for all $f$ and all scalars $c$, with $g$ as above, then $T$ is shift-equivariant. Differentiation is shift-equivariant; some linear mappings are not (e.g. multiplication by $x$).
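For the record, here is a sketch (mine, not part of the original answer) of why shift-equivariance forces the factor $n$. Write $T(x^j)=c_jx^{j-1}$ and normalize so that $c_1=1$ (otherwise one gets a constant multiple of what follows). Apply $T$ to $f(x)=x^n$ and to its shift $g(x)=(x-c)^n=x^n-ncx^{n-1}+\cdots$, and compare the coefficients of $x^{n-2}$ in $$(Tf)(x-c)=c_n(x-c)^{n-1}=c_nx^{n-1}-(n-1)c\,c_nx^{n-2}+\cdots$$ and in $$(Tg)(x)=c_nx^{n-1}-nc\,c_{n-1}x^{n-2}+\cdots.$$ Equating them for all $c$ gives $(n-1)\,c_n=n\,c_{n-1}$, so $\frac{c_n}{n}=\frac{c_{n-1}}{n-1}=\cdots=\frac{c_1}{1}=1$, i.e. $T(x^n)=nx^{n-1}$.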

  • 0
    Yes, I understand that. I'm saying that when applying this to calculus on the reals, it would be nice to have some way of proving that rather than taking it as a hypothesis. (2012-10-06)
3

If I understood the question correctly, I find it odd that no one has yet mentioned non-standard analysis. Of course I didn't create this, but I like the idea a lot better than the unintuitive $\varepsilon$-$\delta$ definition. There is an awesome book about it. Following it, consider the hyperreals, with all their nice properties, like the extension and transfer principles.

The derivative is the instantaneous rate of change of a function. For a continuous function, it can be approximated by a difference quotient with some real increment $\Delta x$, and the approximation gets better the smaller $\Delta x$ is. Also, write $y=f(x)$.

$ f'(x)\approx\frac{\Delta y}{\Delta x}= \frac{f(x+\Delta x)-f(x)}{(x+\Delta x)-x} $

Now if $\Delta x$ is infinitely small, smaller in magnitude than every positive real but not zero, the standard part of the resulting quotient will be exactly the derivative of the function. The standard part, which "zooms out" of the infinitesimals, is given by the standard part function $\text{st}$, which maps each finite hyperreal to the unique real infinitely close to it.

$ f'(x)=\text{st}\frac{\Delta f(x)}{\Delta x}=\text{st}\left(\frac{f(x+\Delta x)-f(x)}{\Delta x}\right) $

Let's take the derivative of the function $y=x^2-3x$.

$ \begin{align} y+\Delta y &= (x+\Delta x)^2-3(x+\Delta x)\\ \Delta y&=(x+\Delta x)^2-3(x+\Delta x)-(x^2-3x)=\\ &=\color{red}{x^2}+2x\Delta x+\Delta x ^2 \color{red}{-3x}-3\Delta x\color{red}{-x^2+3x}=\\ &=2x\Delta x -3\Delta x+\Delta x ^2\\ f'(x)=\text{st}\frac{\Delta y}{\Delta x}&= \text{st}(2x -3+\Delta x)=2x-3\\ \end{align} $
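This bookkeeping can be mechanized. The following is a small sketch (my construction, not from the book) using dual numbers: a formal $\epsilon$ with $\epsilon^2=0$ stands in for the infinitesimal $\Delta x$, and extracting the real component plays the role of $\text{st}$.

```python
class Dual:
    """Numbers x + d*eps with eps**2 == 0, a stand-in for a real number
    plus an infinitesimal; reading off .x plays the role of st()."""

    def __init__(self, x, d=0.0):
        self.x, self.d = x, d

    @staticmethod
    def _coerce(v):
        return v if isinstance(v, Dual) else Dual(v)

    def __add__(self, other):
        other = Dual._coerce(other)
        return Dual(self.x + other.x, self.d + other.d)
    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._coerce(other)
        return Dual(self.x - other.x, self.d - other.d)

    def __rsub__(self, other):
        return Dual._coerce(other) - self

    def __mul__(self, other):
        # (x1 + d1*eps)(x2 + d2*eps) = x1*x2 + (x1*d2 + d1*x2)*eps:
        # the d1*d2*eps**2 term is discarded, which is exactly the
        # "drop Delta x squared" step performed by hand above.
        other = Dual._coerce(other)
        return Dual(self.x * other.x, self.x * other.d + self.d * other.x)
    __rmul__ = __mul__

def derivative(f, x):
    """The eps-coefficient of f(x + eps), i.e. st(Delta y / Delta x)."""
    return f(Dual(x, 1.0)).d

# y = x^2 - 3x has y' = 2x - 3, so y'(5) = 7:
print(derivative(lambda x: x * x - 3 * x, 5.0))   # 7.0
```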

This $\Delta y$ is the change along the curve itself, though, not the change along the tangent line. To illustrate the difference, I took the following from the book:

[Figure from the book: different degrees of infinitesimals]

If we were to draw the tangent at a point using the derivative we obtained and infinitely amplify that section, the tangent would be off from the curve by an even smaller infinitesimal; we would need to amplify further to notice it. The variation along the tangent, instead of along the curve itself, is called the differential of $y$, defined by:

$ dy=f'(x)\ dx $

This is useful because it makes the derivative an actual (hyper)real fraction. One sets $dx=\Delta x$, so the conventional notation is preserved. So in this case

$ dy=(2x-3)\,dx\neq(2x-3+dx)\,dx=\Delta y, $ the difference $\Delta y - dy$ being the second-order term $\epsilon\,dx$ with $\epsilon=dx$.

An advantage of this fraction definition of the derivative is that many rules become obvious. Take the inverse function rule, for infinitesimals $dy\neq0\neq dx$:

$ \forall a\ \forall b:\{a,b\}\subset \mathbb R^*\setminus\{0\},\ \frac{a}{b}=\frac{1}{b/a}\\ \therefore\frac{dy}{dx}=\frac{1}{dx/dy} $

Many properties of derivatives are much easier to prove using this form of calculus (so I will not list them here, but every single one used in that book is also proved there). I hope this is the kind of answer you were looking for.

2

The following is meant to be an axiomatization of differential calculus of a single variable. To avoid complications, let's say that $f$, $g$, $f'$, and $g'$ are smooth functions from $\mathbb{R}$ to $\mathbb{R}$ ("smooth" being defined by the usual Cauchy-Weierstrass definition of the derivative, not by these axioms, i.e., I don't want to worry about nondifferentiable points right now). In all of these, assume the obvious quantifiers such as $\forall f \forall g$.

Axiom Z: $\exists f : f'\ne 0$

Axiom A: $(f+g)'=f'+g'$

Axiom C: $(g \circ f)'=(g'\circ f)f'$

Much of the following is my presentation of reasoning given in a post by Tom Goodwillie: https://mathoverflow.net/questions/108773/independence-of-leibniz-rule-and-locality-from-other-properties-of-the-derivative/108804#108804 Indeed, this whole answer is a shortened and cleaned-up version of what was worked out in that MO question.

Theorems:

(1) The derivative of the identity function $I$ is 1. -- Applying axiom C to $I=I\circ I=I\circ I\circ I$ shows that $I'$ is equal to either 0 or 1 everywhere. Since continuity is assumed, $I'$ has the same value everywhere. By Z and C, that value can't be 0.

(2) The derivative of a constant function is 0. -- From A and (1) we can show that the derivative of $-I$ is $-1$. Composing the constant function with $-I$ then shows that the derivative of the constant is $0$ at $0$.

(3) The derivative of $cx$, where $c$ is a constant, is $c$. -- By pre- or post-composing with a translation, we see that the derivative must be a constant $h(c)$. The function $h$ is a (ring) homomorphism of the reals with $h(1)=1$, so $h(c)=c$.

(4) The derivative of $cf$, where $c$ is a constant, is $cf'$. -- This follows from (3) and C.

(5) The derivative of an even function at 0 is 0. -- Apply axiom C to $f=f\circ(-I)$.

(6) The derivative of $s(x)=x^2$ is $2x$. -- Let $u(x)=s(x+1)-s(x)=2x+1$; then $u'=2$ by (2), (3), and A, while $u'=s'(x+1)-s'(x)$ by A and C. By (5), $s'(0)=0$, and therefore $s'(1)=2$. Precomposition with a scaling function then establishes the result for all $x$.

(7) For any functions $f$ and $g$, $(fg)'=f'g+g'f$. -- Write $2fg=(f+g)^2-f^2-g^2$ and apply A, C, and (6).
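For completeness, here is the computation behind (7), spelled out (my expansion of the cited argument). By C and (6), $(h^2)'=2hh'$ for any function $h$; differentiating $2fg=(f+g)^2-f^2-g^2$ using A and (4) then gives $$2(fg)'=2(f+g)(f'+g')-2ff'-2gg'=2(f'g+fg').$$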

2

Start with polynomials only. Given a polynomial $p(x) = \sum_{i=0}^n a_ix^i$ and a point $x_0$, assure yourself that there is another polynomial $\tilde{p}(x) = \sum_{i=0}^n b_ix^i$ such that $ p(x) = \tilde{p}(x-x_0). $

Now observe what happens if you evaluate $\tilde{p}(x-x_0)$ for an $x$ that lies close to $x_0$. The powers $(x-x_0)^i$ then decay very rapidly as $i$ grows, so $\tilde{p}(x-x_0)$ won't differ very much from $b_0 + b_1(x-x_0)$. In other words, $b_0 + b_1(x-x_0)$ is a good approximation of $p$ as long as we don't stray too far from $x_0$. Now we just have to actually find $b_0$ and $b_1$.

$b_0$ is obviously just $p(x_0)$, so what remains is to find $b_1$, i.e. the coefficient of $x$ in $p(x+x_0)$. Once you realize that expanding $(x+c)^k$ produces $k$ copies of the term $xc^{k-1}$ and that no other term contains exactly one factor of $x$, it is clear that $ b_1 = a_1 + 2a_2x_0 + 3a_3x_0^2 + 4a_4x_0^3 + \ldots $

The first-order approximation of $p$ around $x_0$ is thus $p(x_0) + (x-x_0)\sum_{i=1}^n i\,a_ix_0^{i-1}$, which makes it obvious that the slope of $p$ at $x_0$ is $ p'(x_0) = \sum_{i=1}^n i\,a_ix_0^{i-1}. $

The key ingredient (and the replacement for explicit limits) is the idea that for small values $\epsilon$, the quantities $\epsilon^2$ and higher powers are sufficiently close to zero to be ignored.
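A short computational sketch of this answer (the function names are mine): the binomial theorem produces all the $b_j$ at once, and $b_1$ agrees with the slope formula above.

```python
from math import comb

def shifted_coeffs(a, x0):
    """Given p(x) = sum_i a[i]*x**i, return the coefficients b with
    p(x) = sum_j b[j]*(x - x0)**j, i.e. the coefficients of p(x + x0),
    read off from the binomial theorem."""
    n = len(a)
    return [sum(a[i] * comb(i, j) * x0 ** (i - j) for i in range(j, n))
            for j in range(n)]

# p(x) = 1 + 2x + 4x^3 around x0 = 2: p(2) = 37 and p'(2) = 2 + 12*4 = 50.
b = shifted_coeffs([1, 2, 0, 4], 2)
print(b[0], b[1])   # 37 50
```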

1

This is not really a different approach, it's only rewriting the Cauchy-Weierstrass definition.

Let's say $f\colon \mathbb R \rightarrow \mathbb R$ is a smooth function and take $t_0 \in \mathbb R$. Then the slope $m$ of $f$ at the point $t_0$, or more precisely the tangent line $T(t)$ of $f$ at $t_0$, is uniquely characterized by the property that $|f(t) - T(t)| = o( |t-t_0|)$.

Now assume $F(t)$ is a parametrization of the graph of $f(t)$, for instance $F(t)= (t,f(t))\in \mathbb R^2$, or, if we want to be more general, let $F(t)$ be an arbitrary smooth curve in $\mathbb R^2$. Let $(I,G)$ denote a pair given by an interval $I\subseteq \mathbb R$ with $t_0\in I$ and a smooth curve $G\colon I\rightarrow \mathbb R^2$ with $G(t_0)= F(t_0)$. Then one can show that $F$ and $G$ are tangent if and only if $\|G(t)- F(t)\| = o(|t-t_0|)$, if and only if $\|G(t)- F(t)\| = o(\|G(t)-G(t_0)\|)$, and if and only if $(F(t) - F(t_0)) \sim (G(t)-G(t_0))$, where asymptotic equivalence in general normed spaces is defined using the small-$o$ notation. This describes an equivalence relation on pairs like $(I,G)$, and the $o$ notation can be used to define derivatives and differentiability in quite general spaces. What I like about this description is that it goes hand in hand with the mental image I have of the function $F$ nestling up against its tangent line at the point $t_0$. It also captures the idea of being just a smidge faster than linear, which seems to be a recurring concept in analysis (especially when it comes to summability and integrability), so why not emphasize it in differentiability.
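As a toy numerical illustration (mine, not from the answer): for $f(t)=t^2$ at $t_0=1$, the ratio $|f(t)-T(t)|/|t-t_0|$ goes to $0$ only for the correct slope $m=2$.

```python
def ratio(m, t, t0=1.0):
    """|f(t) - T(t)| / |t - t0| for f(t) = t**2 and the candidate
    tangent line T(t) = m*(t - t0) + f(t0)."""
    f = lambda s: s * s
    T = m * (t - t0) + f(t0)
    return abs(f(t) - T) / abs(t - t0)

for m in (1.5, 2.0, 2.5):
    print(m, [round(ratio(m, 1.0 + h), 6) for h in (0.1, 0.01, 0.001)])
# Only m = 2.0 drives the ratio to 0 as t -> t0, i.e. |f - T| = o(|t - t0|).
```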

0

The expression "Cauchy-Weierstrass definition of limit" is a misnomer since Cauchy never gave an epsilon-delta definition of limit. This issue was analyzed in detail in this article currently cited by 43 other studies.

In fact, the derivative can indeed be defined without epsilon-delta, by a direct method which formalizes what one naively thinks of as the "deletion of higher-order terms". This is explained in detail in Keisler's textbook Elementary Calculus.

0

The question is, given a graph $y = f(x)$, what line, $\ell(x) = M(x-x_0) + f(x_0)$, through the point $(x_0, f(x_0))$ "best fits" it. In terms of magnitudes, we want to find $M$ such that

$|f(x) - \ell(x)| = |f(x) - f(x_0) - M(x-x_0)|$ is appropriately "small" for all points $x$ that are close to $x_0$.

If we assume that $y=f(x)$ can be expressed as $f(x) = a_0 + (x-x_0)a_1 + (x-x_0)^2 a_2 + (x-x_0)^3 a_3 + \cdots$ then, if we can find a constant, $K$, such that

$|f(x) - f(x_0) - M(x-x_0)| \le K(x-x_0)^2$

it would be appropriate to define $f'(x_0) = M$. If we assume that $x \ne x_0$, then we can rewrite this as

$\left|\dfrac{f(x) - f(x_0)}{x-x_0} - M\right| \le K|x-x_0|$

for $x$ in some set of points that surrounds but doesn't include $x_0$.

Since we are only concerned about a neighborhood of $x_0$, we can assume that such a $K$ exists as long as $(x-x_0)$ divides $\dfrac{f(x) - f(x_0)}{x-x_0} - M$, since the resulting quotient is then bounded near $x_0$.

Examples

Let's take for example, $f(x)=x^2$.

$\dfrac{f(x) - f(x_0)}{x-x_0} = \dfrac{x^2 - x_0^2}{x-x_0}=x+x_0 = 2x_0 + (x-x_0)$

Hence

$\dfrac{f(x) - f(x_0)}{x-x_0} - 2x_0 = x-x_0$

and we conclude $f'(x_0) = 2x_0$.


Next, let's take $f(x)=x^3$. Then

\begin{align} \dfrac{f(x) - f(x_0)}{x-x_0} &= \dfrac{x^3 - x_0^3}{x-x_0} \\ &= x^2 + x x_0 + x_0^2 \\ &= 3x_0^2 + (x^2 + x x_0 - 2x_0^2) \\ &= 3x_0^2 + (x-x_0)(x + 2x_0) \\ \end{align}

and we conclude $f'(x_0) = 3x_0^2$. The obvious question is, "How did I get $3x_0^2$?", and, of course, the answer is: evaluate the quotient $x^2 + x x_0 + x_0^2$ at $x=x_0$.


One more example would be $f(x) = \dfrac 1x$.

\begin{align} \dfrac{f(x) - f(x_0)}{x-x_0} &= \dfrac{\dfrac 1x - \dfrac{1}{x_0}}{x-x_0} \\ &= -\dfrac{x -x_0}{x x_0(x-x_0)} \\ &= -\dfrac{1}{x x_0} \\ &= -\dfrac{1}{x_0^2}+\dfrac{1}{x_0^2}-\dfrac{1}{x x_0} \\ &= -\dfrac{1}{x_0^2}+\dfrac{1}{x x_0^2}(x-x_0) \\ \\ \end{align}

and we conclude that $f'(x_0) = -\dfrac{1}{x_0^2}$.
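As a quick sanity check of the defining inequality (illustration only; the constants $K$ below are convenient valid choices near $x_0=1$, not optimal ones):

```python
cases = [
    # (f, x0 -> f'(x0), a K that works for x near x0 = 1)
    (lambda x: x ** 2, lambda x0: 2 * x0,       2.0),
    (lambda x: x ** 3, lambda x0: 3 * x0 ** 2,  4.0),
    (lambda x: 1 / x,  lambda x0: -1 / x0 ** 2, 3.0),
]
x0 = 1.0
for f, deriv, K in cases:
    for h in (0.1, 0.01, 0.001, -0.01):
        x = x0 + h
        lhs = abs((f(x) - f(x0)) / (x - x0) - deriv(x0))
        assert lhs <= K * abs(x - x0), (h, lhs)
print("the bound |difference quotient - M| <= K|x - x0| holds near", x0)
```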


I'm not sure that this is a more reasonable alternative, but it is a viable alternative. And it has been worked out considerably by others. I'm pretty sure that there was an AMA article about it, but I don't have the resources to look it up.