I've always used the method of Lagrange multipliers with blind confidence that it gives the correct results when optimizing functions subject to constraints. But I would like to know if anyone can provide or recommend a derivation of the method, at physics-undergraduate level, that highlights its limitations, if any.
Derivation of the method of Lagrange multipliers?
-
@John, you may or may not find the answers to this similar question helpful: http://math.stackexchange.com/q/674/400 – 2011-02-27
2 Answers
Lagrange multipliers are used to obtain the maximum of a function $f(\mathbf{x})$ on a surface $\{ \mathbf{x}\in\mathbb{R}^n\mid g(\mathbf{x}) = 0\}$ (I use "surface", but whether it is a 2-dimensional, 1-dimensional, or whatever-dimensional object will depend on the $g$ and the $\mathbb{R}^n$ we are dealing with).
The gradient of $f$, $\nabla f$, points in the direction of greatest increase for $f$. If we want to find the largest value of $f$ on the surface, then at that point the direction of greatest increase must be orthogonal to the surface; otherwise, moving along the surface would "capture" some of that increase and $f$ would not attain its maximum on the surface at that point (this is akin to the fact that in one-variable calculus the derivative must be $0$ at an interior maximum; otherwise moving slightly in one direction would increase the value of the function).
In order for $\nabla f$ to be perpendicular to the surface, it must be parallel to the gradient of $g$; so $\nabla f$ must be a scalar multiple of $\nabla g$. So this amounts to finding a solution to the system \begin{align*} \nabla f(\mathbf{x}) &= \lambda \nabla g(\mathbf{x})\\ g(\mathbf{x}) &= 0 \end{align*} for both $\mathbf{x}$ and $\lambda$.
Added. Such a point is not guaranteed to be a maximum or a minimum; it could also be a saddle point, or nothing at all, much as in the one-variable case, where points with $f'(x)=0$ are not guaranteed to be extrema of the function. Another obvious limitation is that if $g$ is not differentiable (does not have a well-defined gradient), then you cannot even set up the system.
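For a concrete sanity check, here is a minimal sketch (assuming Python with the sympy library; the objective $f(x,y) = x+y$ and the unit-circle constraint are just illustrative choices, not from the question) that solves the system above symbolically:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

f = x + y                    # objective to maximize
g = x**2 + y**2 - 1          # constraint surface g(x, y) = 0 (unit circle)

# Lagrange conditions: grad f = lambda * grad g, together with g = 0
equations = [sp.diff(f, x) - lam * sp.diff(g, x),
             sp.diff(f, y) - lam * sp.diff(g, y),
             g]

for sol in sp.solve(equations, [x, y, lam], dict=True):
    print(sol, "f =", f.subs(sol))
# Two candidate points: (1/sqrt(2), 1/sqrt(2)) gives the maximum f = sqrt(2),
# (-1/sqrt(2), -1/sqrt(2)) gives the minimum f = -sqrt(2); as noted above,
# the system alone does not tell you which is which.
```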
-
Can you please elaborate on what you mean by "The gradient of $f$, $\nabla f$, points in the direction of greatest increase for $f$."? Do you mean that $\nabla f$ points in the direction of the greatest increase for $f$ at the point where $f$ is maximal? – 2015-12-02
An algebraic way of looking at this is as follows:
From an algebraic viewpoint, we know how to find the extrema of an unconstrained function of many variables. Say we want to find the extrema of $f(x_1,x_2,\ldots,x_n)$: we set the gradient to zero and look at the definiteness of the Hessian.
We would like to extend this idea to the case where the extremum must also satisfy some constraints. Say the problem is: $\begin{align} \text{Minimize } &f(x_1,x_2,\ldots,x_n)\\ \text{subject to: } &g_k(x_1,x_2,\ldots,x_n) = 0\\ \text{where } &k \in \{1,2,\ldots,m\} \end{align}$
If we find the extrema of $f$ just by setting the gradient of $f$ to zero, these extrema need not satisfy the constraints.
Hence, we would like to incorporate the constraints into the previous idea. One way to do this is as follows. Define a new function: $F(\vec{x},\vec{\lambda}) = f(\vec{x}) - \lambda_1 g_1(\vec{x}) - \lambda_2 g_2(\vec{x}) - \cdots - \lambda_m g_m(\vec{x})$, where $\vec{x} = \left[ x_1,x_2,\ldots,x_n \right]$ and $\vec{\lambda} = \left[\lambda_1,\lambda_2,\ldots,\lambda_m \right]$.
Note that when the constraints are enforced, we have $F(\vec{x},\vec{\lambda}) = f(\vec{x})$, since $g_j(\vec{x}) = 0$ for every $j$ when the constraints are enforced.
Let us find the extrema of $F(\vec{x},\vec{\lambda})$. This is done by setting $\frac{\partial F}{\partial x_i} = 0$ and $\frac{\partial F}{\partial \lambda_j} = 0$, where $i \in \{1,2,\ldots,n\}$ and $j \in \{1,2,\ldots,m\}$.
Setting $\frac{\partial F}{\partial x_i} = 0$ for all $i$ gives us $\vec{\nabla}f = \sum_{j=1}^{m} \lambda_j \vec{\nabla} g_j(\vec{x})$, i.e. $\vec{\nabla} f$ is a linear combination of the constraint gradients.
Setting $\frac{\partial F}{\partial \lambda_j} = 0$ gives us $g_j(\vec{x}) = 0$ for each $j \in \{1,2,\ldots,m\}$.
Hence, when we find the extrema of $F$, the constraints are automatically enforced. This means that an extremum of $F$ corresponds to an extremum of $f$ with the constraints enforced.
To decide whether the point we obtain by solving the system is a minimum, a maximum, or a saddle point, we need a second-order test: we look at the definiteness of the Hessian of $F$ with respect to $\vec{x}$, restricted to directions tangent to the constraint surfaces (the bordered-Hessian test).
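To make the recipe concrete, here is a minimal sketch (assuming Python with sympy; the particular $f$, $g_1$, $g_2$ are just illustrative choices, and the second-order classification step is not carried out) that builds $F$, sets all of its partial derivatives to zero, and solves the resulting system:

```python
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2', real=True)

f  = x**2 + y**2 + z**2      # objective to minimize
g1 = x + y + z - 1           # constraint g1 = 0
g2 = x - y                   # constraint g2 = 0

# Augmented function F(x, lambda) = f - lambda1*g1 - lambda2*g2
F = f - l1 * g1 - l2 * g2

variables = [x, y, z, l1, l2]
# Stationarity of F in every variable gives grad f = sum_j lambda_j grad g_j
# (from the x, y, z derivatives) and g1 = g2 = 0 (from the lambda derivatives).
stationarity = [sp.diff(F, v) for v in variables]

print(sp.solve(stationarity, variables, dict=True))
# Expected single candidate: x = y = z = 1/3, lambda1 = 2/3, lambda2 = 0,
# which here is the constrained minimum (f = 1/3); in general a
# second-order test is still needed to classify the candidate.
```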
-
+1. The approach Sivaram describes here also leads to a notion of duality for nonlinear optimization problems and ultimately to the important Karush-Kuhn-Tucker conditions. – 2011-02-27