37

A function $f : \mathbb{R} \to \mathbb{R}$ is convex (or "concave up") provided that for all $x,y \in \mathbb{R}$ and $t \in [0,1]$, $f(tx + (1-t)y) \le tf(x) + (1-t)f(y).$ Equivalently, a line segment between two points on the graph lies above the graph, the region above the graph is convex, etc. I want to know why the word "convex" goes with the inequality in this direction, and how I can remember it. Every reason I have heard makes just as much sense applied to the opposite inequality ("concave down").

  • 1
    I think sometimes as mathematicians we forget that some words we use do have normal "everyday" meanings. In the everyday sense, convex means that a surface bulges out TOWARDS you as you look at it. And since the canonical orientation of the way we draw graphs has us think we are standing BELOW the graph, a convex function looks... well... convex. 2017-03-10

11 Answers

32

Not sure why convex is defined that way, but one way to remember it is that for a differentiable convex function the derivative is monotonically non-decreasing.

Or maybe just remember that $e^x$ is conv$e^x$. (I just thought of this one!)
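To see the slope criterion in action, here is a minimal Python sketch. It is my own illustration, not part of the answer above: the helper names and the sample grid are assumptions, and secant slopes between neighbouring sample points stand in for the derivative so that non-differentiable functions such as $|x|$ can be tested too.

```python
import math

def secant_slopes(f, xs):
    """Slopes of the secant lines between consecutive sample points."""
    return [(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:])]

def slopes_nondecreasing(f, xs, tol=1e-9):
    """True if the secant slopes never decrease along the sorted grid xs."""
    slopes = secant_slopes(f, xs)
    return all(s2 >= s1 - tol for s1, s2 in zip(slopes, slopes[1:]))

grid = [i / 10 for i in range(-30, 31)]          # sample points in [-3, 3]
positive_grid = [i / 10 for i in range(1, 61)]   # log needs x > 0

exp_ok = slopes_nondecreasing(math.exp, grid)           # e^x is convex
abs_ok = slopes_nondecreasing(abs, grid)                # |x| is convex, not differentiable at 0
log_ok = slopes_nondecreasing(math.log, positive_grid)  # log is concave

print(exp_ok, abs_ok, log_ok)
```

A passing check on a finite grid is only evidence, not a proof; a failing check, as for `math.log`, does show that the function is not convex on that interval.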

13

Let's say that you accept the definition of a convex set in higher dimensions, like a solid ball in $\mathbb{R}^3$. The question I want to shed some light on is why convex functions of one variable are defined as opening up instead of down, since this seems like an arbitrary choice. It seems arbitrary because, depending on how you look at the graph, you could naively view the function as bending outwards (like a convex set) or inwards (concave). However, there is a nice connection between these two things using metric spaces that I think can provide some meaning to the way it is defined.

Most of the metrics that you are familiar with have open balls that are convex, such as the standard metric. But some are actually non-convex. A good example of this is $d(x,y) = \sum \sqrt{|x_i - y_i|}$. (Note that $\sqrt{x}$ is not a convex function.)
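To make the non-convexity concrete, here is a small Python sketch. It is my own illustration: the function name `d`, the centre, and the two sample points are assumptions chosen for the example. It picks two points inside the open unit ball around the origin whose midpoint falls outside that ball.

```python
import math

def d(x, y):
    """The metric d(x, y) = sum_i sqrt(|x_i - y_i|) from the text."""
    return sum(math.sqrt(abs(a - b)) for a, b in zip(x, y))

center = (0.0, 0.0)
p = (0.81, 0.0)
q = (0.0, 0.81)
# Midpoint of the segment from p to q (the case t = 1/2).
mid = tuple((a + b) / 2 for a, b in zip(p, q))

print(d(center, p), d(center, q))  # both below 1: p and q lie in the open unit ball
print(d(center, mid))              # above 1: the midpoint does not
```

Both distances from the centre to `p` and `q` are about 0.9, while the distance to the midpoint is about 1.27, so the unit ball of this metric is not a convex set.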

Here is an interesting condition:

Given a metric $d$ on $E$, if for all $x,y,z\in E$ and $0\leq t\leq 1$,

$d\left(x,\ ty + (1-t)z\right) \leq t\,d(x,y) + (1-t)\,d(x,z)$

then the open balls formed by $d$ are convex. [1] Indeed, if $d(x,y) < r$ and $d(x,z) < r$, the right-hand side is less than $r$, so $ty + (1-t)z$ lies in the same ball. In other words, if for each fixed $x$ the map $y \mapsto d(x,y)$ from $\mathbb{R}^n$ to $\mathbb{R}$ is a convex function, then the open balls are convex sets.

Usually $d(x,y) = \sum f(x_i,y_i)$ for some $f:\mathbb{R}^2\rightarrow\mathbb{R}$. If, with $x_i$ fixed, $f(x_i,\cdot):\mathbb{R}\rightarrow \mathbb{R}$ fits the definition of a convex function, then $d(x,\cdot)$ will also be convex (a sum of convex functions is convex), and the condition will be satisfied, giving us convex balls.

So convex functions (if they can form a metric) will give you convex open balls. This is a nice connection that makes the definition make more sense. Other conditions that guarantee convex open balls are discussed in the paper I reference.

[1] Norfolk, T. (1991). When does a metric generate convex balls? www.math.uakron.edu/~norfolk/convex.ps

  • 0
    @user116: No, it is not circular. This answer connects convexity of functions to convexity of shapes. 2012-08-25
12

One of my professors told me the following memorable line: "A concave function looks like the roof of a cave." It helps me remember which functions are concave and which are convex.

  • 1
    Since this question has gone CW I unaccepted the answer. I like it as a mnemonic, but I still haven't seen a really satisfactory answer for "why". Answers posted so far seem just as applicable if you reverse them. 2010-09-01
8

The primary concept is convexity, not concavity. It applies to geometric figures, originally lenses, and this usage was adapted to functions. There is no comparable concept of concavity for, say, 2-dimensional regions, except as the absence of the property of convexity. There is also no property for figures in general corresponding to the anti-convexity inequality, because most non-convex figures will be locally convex. It is a matter of historical convention that a function is called "convex" if the region above the graph of the function is convex, and it would have caused no mathematical problem to use the opposite convention based on the region below, but concavity is a more limited concept that is defined in terms of convexity (or only defined for functions) and not the other way around.

The terms "concave up" and "concave down" appear mainly in non-specialist US college textbooks on calculus. They are nonstandard terminology and, I think, bad practice that should be discouraged (with luck and sufficient ruthlessness maybe they can be squelched in a generation...). As far as I know the etymology went as follows:

  1. Like "convex", the word "concave" has a prior use in optics. Concave (inward-curved) lenses are the opposite of convex lenses, so there is a pre-existing word for "not convex" or "convex in the opposite direction".

  2. Convex has an absolutely entrenched mathematical use to denote convex figures as well as functions (and sequences) with increasing derivative.

  3. Functions whose negative is convex occur frequently and "concave [function]" came into use as a convenient description of this situation. The linguistic logic was clear enough to make this immediately understandable. It's not clear whether it was more or less favored compared to statements involving the negative, such as saying that $-f(u)$ is convex, or $f$ is anti-convex, or that it is the negative of a convex function. I don't have data at hand from web searches or anything like that, but I think concavity is less common as a description of negatively convex sequences. For functions the ability to draw a graph makes the resemblance to lenses clearer so that both words seem sensible. (added: concavity as a counterpart to convexity for functions and sequences also gained momentum as its own term once log-convexity and log-concavity became standard usage. Because the relationship between log-convex and log-concave functions is not simply a change of sign but a multiplicative inverse, using only the words based on convexity might lead to confusion or circumlocution.)

  4. Authors of US college calculus textbooks, writing for an audience not familiar with or necessarily interested in convex figures and optics, and aware of potential for confusion (e.g., the graph of a concave function still bounds a convex-shaped region, or the subsequent use of convex to describe functions of several variables and the regions on which those are defined) cooked up a terminology based on "concavity" as a stand-alone concept, limited to the one-variable context where $f(x)$ is graphed with the $y$-axis direction being upward. It's not clear how consistent this concave-up and concave-down terminology is between books and whether it agrees with the earlier, non-confusing use of concave to denote negative convexity.

  • 1
    @T..: It looks like you misread the Wikipedia article. The relevant portion is: "a real-valued function f(x) defined on an interval is called convex (or convex downward or concave upward" 2011-09-29
7

With the caveat that it's usually more helpful to devise your own mnemonics than to follow someone else's, here are a couple of mine:

  • Poorly drawn pictures (the second is the same as Srikant Vadali's answer):

    [Images: convex function, concave function]

  • convex: smiley face

  • Another way of remembering them, if you recall the meanings of convex and concave outside mathematics (as in lenses, etc.), is that you look from below: if the graph of the function viewed from below looks convex (i.e., bulging towards you) the function is convex, if it looks concave the function is concave.

  • Yet another way is to keep in mind the definition: "a function is convex if its epigraph is a convex set". The epigraph is the set of points lying above the graph, and a convex set is one in which every line segment between two points in the set lies within the set. [Actually, for me, this definition is more useful for remembering what epigraph means :-)]
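The epigraph definition can be spot-checked numerically. The following Python sketch is only an illustration under my own assumptions (the helper names, the sampling ranges, and the fixed random seed are not from the answer): it draws random pairs of points in the epigraph and looks for a convex combination that drops below the graph.

```python
import random

def in_epigraph(f, point, tol=1e-9):
    """A point (x, y) is in the epigraph of f when y >= f(x)."""
    x, y = point
    return y >= f(x) - tol

def find_epigraph_violation(f, trials=10000, seed=0):
    """Return two epigraph points and a weight t whose combination leaves
    the epigraph, or None if no such triple is found in the random search."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(-2, 2), rng.uniform(-2, 2)
        # Lift each sample by a nonnegative amount so it lies on or above the graph.
        p1 = (x1, f(x1) + rng.uniform(0, 1))
        p2 = (x2, f(x2) + rng.uniform(0, 1))
        t = rng.random()
        combo = (t * p1[0] + (1 - t) * p2[0], t * p1[1] + (1 - t) * p2[1])
        if not in_epigraph(f, combo):
            return p1, p2, t
    return None

print(find_epigraph_violation(lambda x: x * x))   # convex: no violation expected
print(find_epigraph_violation(lambda x: -x * x))  # concave: a violating triple is returned
```

For $x^2$ the search comes back empty, while for $-x^2$ it returns a segment between two epigraph points that dips below the graph, which is exactly the failure of convexity of the region above the curve.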

  • 0
    For some reason, the epigraph being convex is the one that made me remember this forever. Maybe there is some hidden psychological reason? 2015-12-19
3

It is always good to go back to the source. The modern treatment of convex functions can be traced back to a paper by J.L.W.V. Jensen, “Om konvekse Funktioner og Uligheder mellem Middelværdier,” Nyt Tidsskrift for Mathematik 01/1905; 16 B. It can be found in Google Books. The definition simply says:

$\phi(x) + \phi(y) \geqslant 2 \phi(\frac{x+y}{2})$

A convex function does not require differentiability, nor continuity. It can be defined in any metric space with geodesics, where a "middle point" is well defined. The essential idea of a convex function is therefore its property of bulging out towards the "outside." (Average of the function is larger than the function of the average.) If we have to use a more graphical name, nowadays we would probably call it a centrifugal function instead, since it bulges out from the center of any interval.
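The midpoint inequality needs nothing but function values, which makes it easy to test directly. Here is a minimal Python sketch; the function name `is_midpoint_convex` and the sample points are my own assumptions for illustration.

```python
import itertools
import math

def is_midpoint_convex(phi, points, tol=1e-9):
    """Check phi(x) + phi(y) >= 2 * phi((x + y) / 2) for every pair of sample points."""
    return all(
        phi(x) + phi(y) >= 2 * phi((x + y) / 2) - tol
        for x, y in itertools.combinations(points, 2)
    )

points = [i / 4 for i in range(-12, 13)]  # sample points in [-3, 3]

exp_result = is_midpoint_convex(math.exp, points)   # convex
abs_result = is_midpoint_convex(abs, points)        # convex, no derivative at 0
sin_result = is_midpoint_convex(math.sin, points)   # not convex on [-3, 3]

print(exp_result, abs_result, sin_result)
```

The check uses no derivatives, in line with the remark that the definition requires neither differentiability nor continuity. As with any sampled test, a pass on finitely many points is suggestive rather than conclusive.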

Now, the shape of a convex function is clearly counter-intuitive. If we view the bottom of its curve as the base, as the road sign image from Wikipedia shows: road bump, then clearly a convex surface would correspond to what we would call a "concave" function, unless we assume the opposite and take the top of the curve as the base, which makes it all the less intuitive. The same awkwardness is observed in an ideographic language such as Chinese, where the character for convexity is 凸 (Pinyin tū, a bump) and concavity is 凹 (Pinyin āo, a dip).

So how can we intuitively remember and visualize the shape of a convex function? How can we reconcile the fact that this function is bulging out, with its actual shape? It's actually rather simple. Think of a laundry machine with a vertical cylinder. When the cylinder spins as in the drying cycle, the water surface will rise towards the edge of the cylinder due to the centrifugal force, and the water will be pushed out towards the edge. So the right interpretation of bulging out is not in the vertical direction, but in the horizontal direction! We simply interpret the function's value as the pressure or centrifugal force.

For those interested in seeing the shape of the water surface in a spinning tank, YouTube has several video clips. Here is one of them: Centrifugal Force on Rotating Water Container. After seeing this video, and understanding the horizontal bulging out, you will probably never forget the shape of a convex function.

2

A line is said to "support" the graph of a function (or indeed, any subset of the Cartesian plane) if it "holds up" the graph: that is, the graph lies entirely above or on the line. (After all, gravity pulls downward!) We might think of the union of all support lines as the "ground" on which the graph lies; everything else--its set-theoretic complement--is the "sky".

A function of the real numbers is convex if and only if its graph is the boundary between the ground and the sky. This is a special case of the more general idea of convexity that applies to arbitrary planar regions, the same as the familiar distinction between a convex and non-convex polygon, for example. For arbitrary regions there is no definite "up" and "down" anymore, though, so we say that a line supports a region when the region lies entirely within one of the two closed half-planes bounded by that line. (Thus, the interior and boundary of a convex polygon form its "sky" and everything outside is the "ground.")
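The "ground" picture can be sketched numerically. In the Python snippet below I take $f(x) = x^2$ as an example, use its tangent lines as support lines, and build the ground as the upper envelope of finitely many of them. The choice of $f$, the helper names, and the grid of anchor points are all assumptions made for this illustration.

```python
def f(x):
    return x * x

def tangent_line(a):
    """Support line of f(x) = x**2 touching the graph at x = a: y = 2*a*x - a**2."""
    return lambda x: 2 * a * x - a * a

anchors = [i / 10 for i in range(-30, 31)]        # points where support lines touch
lines = [tangent_line(a) for a in anchors]

def ground_height(x):
    """Top of the 'ground' at x: the highest of the sampled support lines."""
    return max(line(x) for line in lines)

# Every support line stays on or below the graph ...
below = all(line(x) <= f(x) + 1e-9 for line in lines for x in anchors)
# ... and the ground reaches the graph at each anchor point.
touches = all(abs(ground_height(a) - f(a)) < 1e-9 for a in anchors)

print(below, touches)
```

With only finitely many lines the envelope is a polygonal approximation from below; between anchor points spaced 0.1 apart it misses $x^2$ by at most $0.05^2 = 0.0025$, and the gap closes as more support lines are added.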

In short, calling a "concave upward" function "convex" unites two closely related familiar concepts and is justified by the universal earthbound human experience that gravity usually pulls downward.

  • 0
    @Wangyan So is the question. 2016-03-23
1

Instead of thinking about the graph of $f: \mathbb{R} \to \mathbb{R}$ as a 2-D object in the plane, think about $f$ mapping one number line onto another.

[Image: drawing of a convex function from ℝ to ℝ]

I'll return to that picture, just notice that there is plenty of "room" so to speak and that the arrows mapping point to image will never "fall back" on each other, overlap, or "crowd in" so long as the function is convex.

Imagine a closed loop in the plane whose interior is non-convex. It's like a deflated balloon. You need to "blow it up" until it's at least modestly ($\leq$) full of air for the interior to be convex. Similarly if you had a non-convex polygon and "blew air" inside of it, you would get a convex shape. So it's like convex shapes have to be sufficiently "inflated".

Similarly, the arrows in an $\mathbb{R} \to \mathbb{R}$-type picture like the above would be "flopping" inward onto each other, deflated if you will. In the convex mapping the arrows don't overlap at all -- they've got "pressure" or "energy" pushing them outward enough so that they don't overlap. So the image is properly inflated, if you will.

So @NateEldridge, the epigraph being a convex set is a red herring. Think about just the right-most point of a graph as it's being generated by a s-l-o-w graphing calculator. The image has to "outrun" the domain it comes from by $\geq$ each $dt$. And there you have your $f(\mathrm{interior\ of\ domain}) \leq \mathrm{image}_1 + \mathrm{image}_2$.

This is meant as an elaboration on @whuber's answer.

  • 0
    Maybe a better way to say the above would have been to use the words "nonnegative curvature" $\leftrightarrow$ convex set, and "negative curvature" $\leftrightarrow$ non-convex. 2011-03-18
0

This is what I tell my students. We know what a convex set is, and we need a name for functions satisfying the condition above. By (verbal) analogy we call them convex, too. But in that case the curve is the bottom (down) part of a convex region, so we can say that convex means convex down. But then concave up should equal convex down, i.e., the curve is the top (up) part of a concave region. Concave down then equals convex up, meaning that the curve is the lower part of a concave epigraph or the upper part of a convex hypograph. Hope this helps!

0

Consider any plane simple differentiable loop. We say that this is convex if one can draw a straight segment connecting any two points, without leaving the "inside" of the loop. Convexity is just the name of this property, a way -if you like- to spend less time conveying the meaning.

Now, by Dini's theorem the support (or graph, the actual line in the plane) of your loop is locally the graph of a function. Of course this can be either x = x(y) or y = y(x), so one might have to be careful in rotating and reflecting the drawing in the former case.

For simplicity's sake we will restrict to the latter, y = y(x). Any convex loop you can draw will be, in these neighborhoods, 'smiling'. This is because if it started frowning, we could easily draw a line that 'breaks through' our loop.

The 'reason' why the upper side of the graph is chosen to have this property is basically a visual one: take the same convex loop as before and notice that the upper side of our loop always corresponds to the 'inside' of the loop, whether in neighborhoods of the form y = y(x) or x = x(y).

Remark: this is obviously very simplistic and the definitions I give are not really canonical, but I thought it was a pretty argument from a very informal point of view.

0

One can think up reasons for "convex" to refer to the region above the graph, but all seem ad hoc, and "tweakable" so they refer to the region below it. We need to find the source, and her/his reasons. The most frequently used property of convex functions that I know of is Jensen's inequality. This was 1906. Presumably the source is several years before that, but I haven't found it.