The most intuitive way I've found to understand it is to think of the Lagrangian formulation as a computationally more tractable way of applying the implicit function theorem. It requires more mental effort than just pulling a Lagrangian out of thin air and chugging through proofs, but it gets to the same place, and ultimately I think it provides better motivation.
The basic idea is this: you eliminate the constraint by composing the cost functional with the solution operator of the constraint equation. Then you can take derivatives and find extrema of the composed system using the implicit function theorem. The system of equations you have to solve from a naive application of the implicit function theorem is wasteful, and a smart simplification of that system yields the Lagrange multipliers in a natural way.
Suppose you have
- a cost functional $F:X \rightarrow \mathbb{R}$,
- a parameterized constraint function $G:Q \times X \rightarrow X$, ($q \in Q$ is the parameter)
- a solution operator $S:Q \rightarrow X$ such that $G(q,S(q))=b$ for any given $q\in Q$,
and you want to find the sensitivity of $F$ with respect to the parameter $q$, going through the solution operator: $\frac{d}{dq} F(S(q)).$
In your case, the spaces and operators are
- $X=\mathbb{R}^2$, $Q=\mathbb{R}^1$,
- $F(x_1,x_2)=f(x_1,x_2)$,
- $G(q,(x_1,x_2))=(g(x_1,x_2), q-x_1)^T$
- $b=(c,0)^T$.
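For concreteness, the partial derivatives that will appear below are then $F^{'}_x = (f_{x_1}, f_{x_2})$, $G^{'}_x = \begin{pmatrix} g_{x_1} & g_{x_2} \\ -1 & 0 \end{pmatrix}$, and $G^{'}_q = (0,1)^T$.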
The implicit function theorem tells us how to compute this total derivative in terms of partial derivatives of things we know: $\frac{d}{dq} F(S(q)) \cdot \delta q = -F^{'}_x(S(q)) \circ [G^{'}_x (q,S(q))]^{-1} \circ G^{'}_q (q,S(q)) \cdot \delta q.$
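(This is just implicit differentiation: taking the derivative of the identity $G(q,S(q))=b$ with respect to $q$ gives $G^{'}_q + G^{'}_x \circ S^{'}(q) = 0$, so $S^{'}(q) = -[G^{'}_x]^{-1} \circ G^{'}_q$, and composing with $F^{'}_x$ via the chain rule gives the formula above.)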
This is theoretically all you need to find the critical points for a smooth optimization problem, but there is a major issue: to use it you need to compute the matrix inverse $[G^{'}_x (q,S(q))]^{-1}$, which may be very difficult!
On the other hand, computing the full inverse is wasteful, since $[G^{'}_x ]^{-1}:X \rightarrow X$ (an n-by-n matrix), whereas the operator to its left is $F^{'}_x : X \rightarrow \mathbb{R}$ (a 1-by-n vector). We can safely ignore any part of the output of $[G^{'}_x ]^{-1}$ that lies in the kernel of $F^{'}_x$, since vectors in that kernel get sent to zero anyway.
Thus it is natural to look for a Riesz representer $\lambda$ for the combined operator $F^{'}_x \circ [G^{'}_x]^{-1}: X \rightarrow \mathbb{R}$, i.e., a vector $\lambda$ such that $F^{'}_x \circ [G^{'}_x]^{-1} \cdot x = (\lambda,x) ~~~ \forall x \in X.$ Turning the equation around, we want the $\lambda$ that solves $[G^{'}_x]^* \lambda = F^{'}_x$ (the transpose, in matrix terms), which is the familiar Lagrange multiplier (adjoint) equation. We have to solve an n-by-n system once, for a single right hand side, rather than fully inverting $G^{'}_x$ for every possible right hand side. Evaluating the implicit derivative then becomes much simpler, via the formula $\frac{d}{dq} F(S(q)) \cdot \delta q = -(\lambda, G^{'}_q (q,S(q)) \cdot \delta q).$
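To make this concrete, here is a minimal NumPy sketch of the whole procedure on the example above, with $f$, $g$, and $c$ chosen arbitrarily for illustration (these specific choices are not from the original problem): it forms the solution operator, solves the n-by-n adjoint system once for $\lambda$, and checks the resulting sensitivity against a finite difference through the solution operator.

```
import numpy as np

# Illustrative choices (not from the original problem): f(x) = x1^2 + x2^2,
# g(x) = x1*x2, constraint value c = 1, so S(q) = (q, c/q).
c = 1.0

def F(x):                 # cost functional F : X -> R
    return x[0]**2 + x[1]**2

def S(q):                 # solution operator S : Q -> X, satisfies G(q, S(q)) = b
    return np.array([q, c / q])

def F_x(x):               # 1-by-n derivative F'_x
    return np.array([2 * x[0], 2 * x[1]])

def G_x(q, x):            # n-by-n Jacobian of G(q,x) = (g(x), q - x1) w.r.t. x
    return np.array([[x[1], x[0]],
                     [-1.0, 0.0]])

def G_q(q, x):            # Jacobian of G w.r.t. the parameter q
    return np.array([0.0, 1.0])

q = 2.0
x = S(q)

# Adjoint (Lagrange multiplier) equation: one n-by-n solve, G_x^T lam = F_x^T.
lam = np.linalg.solve(G_x(q, x).T, F_x(x))

# Sensitivity via the multiplier: dF/dq = -(lam, G'_q).
dFdq_adjoint = -lam @ G_q(q, x)

# Check against a finite difference through the solution operator.
h = 1e-6
dFdq_fd = (F(S(q + h)) - F(S(q - h))) / (2 * h)

print(dFdq_adjoint, dFdq_fd)   # the two numbers should agree to ~1e-9
```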
Thus the "Lagrangian" can be seen as a convenient shorthand for the object whose gradient gives you the correct equations for computing implicit derivatives efficiently.
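Concretely, with the sign convention used above, define $L(q,x,\lambda) = F(x) - (\lambda, G(q,x) - b)$: setting $\partial L/\partial \lambda = 0$ recovers the constraint $G(q,x)=b$, setting $\partial L/\partial x = 0$ recovers the multiplier equation $[G^{'}_x]^* \lambda = F^{'}_x$, and $\partial L/\partial q = -(\lambda, G^{'}_q)$ is exactly the implicit derivative computed above.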
(As an aside, there is also a completely different motivation for the Lagrangian coming from game theory - hopefully someone else will post more about it.)