Introduction
In this part we will cover the concept of Lagrangian duality, which is a powerful tool in optimization theory. We will introduce, discuss and prove several important results, including weak and strong duality.
We will then connect this to global optimality conditions and show how duality can be used to derive optimality conditions for (un)constrained optimization problems. Finally, we will also briefly introduce Linear Programming (LP) and prove Farkas’ lemma using duality.
The Lagrangian: Setup
Recall our setup for the Lagrangian,
Definition: Lagrangian Function
For the optimization problem $$ \begin{align*} (P) \quad & \begin{cases} \inf \ & f(\mathbf{x}) \newline \text{subject to} \ & g_i(\mathbf{x}) \leq 0, \ i = 1, \ldots, m \newline & \mathbf{x} \in \mathbf{X} \end{cases} \end{align*} $$ the Lagrangian function is defined as, $$ \mathcal{L}(\mathbf{x}, \boldsymbol{\mu}) = f(\mathbf{x}) + \sum_{i=1}^m \mu_i g_i(\mathbf{x}), $$ for $\boldsymbol{\mu} \geq \mathbf{0}$.
Further, for any $\boldsymbol{\mu} \geq \mathbf{0}$, $$ \begin{align*} \inf \ & \mathcal{L}(\mathbf{x}, \boldsymbol{\mu}) \newline \text{subject to} \ & \mathbf{x} \in \mathbf{X} \end{align*} $$ is a relaxation of the (primal) problem $(P)$.
Note
Not all constraints in $(P)$ are relaxed; we still require $\mathbf{x} \in \mathbf{X}$.
Further, in practice there is no general recipe for which constraints to relax and which to keep; it depends on the problem at hand.
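To make the setup concrete, here is a minimal sketch in Python, using a hypothetical toy instance of $(P)$: minimize $f(x) = x^2$ subject to $g(x) = 1 - x \leq 0$ with $\mathbf{X} = \mathbb{R}$. The problem data and function names are illustrative only, not from the text.

```python
def f(x):
    # Objective of the toy problem: f(x) = x^2.
    return x**2

def g(x):
    # Single inequality constraint: g(x) = 1 - x <= 0, i.e., x >= 1.
    return 1.0 - x

def lagrangian(x, mu):
    # L(x, mu) = f(x) + mu * g(x), defined for mu >= 0.
    return f(x) + mu * g(x)

# For any feasible x (g(x) <= 0) and any mu >= 0, the Lagrangian
# underestimates the objective: L(x, mu) <= f(x).
x_feasible, mu = 1.5, 0.7
assert lagrangian(x_feasible, mu) <= f(x_feasible)
```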
With our setup we can now define the Lagrangian dual function.
The Lagrangian: Dual Function
Definition: Lagrangian Dual Function
The (Lagrangian) dual function is defined as, $$ \begin{align*} q(\boldsymbol{\mu}) \coloneqq \ & \inf \ \mathcal{L}(\mathbf{x}, \boldsymbol{\mu}) \newline & \text{subject to} \ \mathbf{x} \in \mathbf{X} \end{align*} $$
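For the toy problem above ($f(x) = x^2$, $g(x) = 1 - x$, $\mathbf{X} = \mathbb{R}$), the dual function can be evaluated numerically; a minimal sketch, assuming SciPy is available:

```python
from scipy.optimize import minimize_scalar

def q(mu):
    # q(mu) = inf over x of x^2 + mu * (1 - x); the infimum is
    # attained here since the Lagrangian is strongly convex in x.
    return minimize_scalar(lambda x: x**2 + mu * (1.0 - x)).fun

# Closed form for comparison: the minimizer is x = mu / 2, which
# gives q(mu) = mu - mu^2 / 4.
print(q(2.0))  # ~1.0 = 2 - 2^2 / 4
```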
Theorem: Weak Duality
For any $\boldsymbol{\mu} \geq \mathbf{0}$ and any feasible $\mathbf{x}$ to $(P)$, we have, $$ q(\boldsymbol{\mu}) \leq f(\mathbf{x}). $$
Corollary: Weak Duality
Let $f^{\star}$ be the optimal value of $(P)$. Then, $$ q(\boldsymbol{\mu}) \leq f^{\star}, \quad \forall \boldsymbol{\mu} \geq \mathbf{0}. $$
Proof: Weak Duality
$$ \begin{align*} q(\boldsymbol{\mu}) \coloneqq \ & \inf \ f(\mathbf{z}) + \sum_{i=1}^m \mu_i g_i(\mathbf{z}) \newline & \text{subject to} \ \mathbf{z} \in \mathbf{X} \end{align*} $$ For any $\mathbf{x}$ feasible to $(P)$, taking $\mathbf{z} = \mathbf{x}$ gives, $$ q(\boldsymbol{\mu}) \leq f(\mathbf{x}) + \underbrace{\sum_{i=1}^m \underbrace{\mu_i}_{\geq 0} \underbrace{g_i(\mathbf{x})}_{\leq 0}}_{\leq 0} \leq f(\mathbf{x}). \ _\blacksquare $$
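A quick numerical spot-check of weak duality on the toy problem, using the closed form $q(\mu) = \mu - \mu^2/4$ derived above; purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    mu = rng.uniform(0.0, 10.0)   # any mu >= 0
    x = rng.uniform(1.0, 10.0)    # any feasible x, since g(x) = 1 - x <= 0
    # Weak duality: q(mu) <= f(x).
    assert mu - mu**2 / 4.0 <= x**2
```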
The Lagrangian: Dual Problem
Definition: Lagrangian Dual Problem
By weak duality, the dual function $q(\boldsymbol{\mu})$ gives a lower bound on the optimal value of $(P)$ for any $\boldsymbol{\mu} \geq \mathbf{0}$. Thus, we can look for the “best” such lower bound by maximizing $q(\boldsymbol{\mu})$ over $\boldsymbol{\mu} \geq \mathbf{0}$,
The (Lagrangian) dual problem is defined as, $$ \begin{align*} (D) \quad & \begin{cases} \sup \ & q(\boldsymbol{\mu}) \newline \text{subject to} \ & \boldsymbol{\mu} \geq \mathbf{0} \end{cases} \end{align*} $$
Note
From weak duality, if $q^{\star}$ is the optimal value of $(D)$, then $q^{\star} \leq f^{\star}$, i.e., the optimal value of the dual problem is a lower bound on the optimal value of the primal problem. This is also what we saw in the last part with the relaxation theorem.
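For the toy problem from before, the dual problem is one-dimensional and can be solved numerically; a minimal sketch, assuming SciPy and using the closed form $q(\mu) = \mu - \mu^2/4$ derived earlier:

```python
from scipy.optimize import minimize_scalar

# Maximize q(mu) = mu - mu^2 / 4 over mu >= 0, written as a bounded
# minimization of -q (the optimum lies well inside the bracket).
res = minimize_scalar(lambda mu: -(mu - mu**2 / 4.0),
                      bounds=(0.0, 10.0), method='bounded')
print(res.x, -res.fun)  # mu* ~ 2, q* ~ 1, which equals f* for this problem
```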
Theorem: Concavity of the Dual Function
The dual function $q$ is concave and its effective domain, defined as, $$ \{\boldsymbol{\mu} \in \mathbb{R}^m \mid q(\boldsymbol{\mu}) > -\infty\}, $$ is a convex set. Thus $(D)$ is a convex optimization problem, even if $(P)$ is not.
Note
In the dual problem, we are maximizing a concave function on a convex set, which is equivalent to minimizing a convex function on a convex set, thus $(D)$ is a convex optimization problem.
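A numerical spot-check of the concavity of $q$ for the toy problem (again with $q(\mu) = \mu - \mu^2/4$); illustrative only:

```python
import numpy as np

def q(mu):
    return mu - mu**2 / 4.0

# Check q(lam*a + (1-lam)*b) >= lam*q(a) + (1-lam)*q(b) at random points.
rng = np.random.default_rng(0)
for _ in range(1000):
    a, b = rng.uniform(0.0, 10.0, size=2)
    lam = rng.uniform()
    assert q(lam * a + (1 - lam) * b) >= lam * q(a) + (1 - lam) * q(b) - 1e-12
```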
Definition: Lagrange Multiplier
$\boldsymbol{\mu}^{\star}$ is called a Lagrange multiplier if, $$ \boldsymbol{\mu}^{\star} \geq \mathbf{0} \text{ and } f^{\star} = \begin{align*} & \inf \ \mathcal{L}(\mathbf{x}, \boldsymbol{\mu}^{\star}) \newline & \text{subject to} \ \mathbf{x} \in \mathbf{X} \end{align*} $$
Note
We can see that, if $\boldsymbol{\mu}^{\star}$ is a Lagrange multiplier, then $$ q(\boldsymbol{\mu}^{\star}) = f^{\star} $$ This means that,
- $\boldsymbol{\mu}^{\star}$ solves the dual problem $(D)$, and
- a Lagrange multiplier exists only if $q^{\star} = f^{\star}$.
Danger
In other literature, any $\boldsymbol{\mu} \geq \mathbf{0}$ (not just the optimal ones) is called a Lagrange multiplier, and what we defined above is called an optimal Lagrange multiplier.
Global Optimality Conditions
We will now connect duality to global optimality conditions.
Theorem: Global Optimality Conditions
Consider a pair of primal-dual vectors $(\mathbf{x}^{\star}, \boldsymbol{\mu}^{\star})$. Then $\mathbf{x}^{\star}$ is optimal to the primal problem $(P)$ and $\boldsymbol{\mu}^{\star}$ is optimal to the dual problem $(D)$, if and only if,
- $$\mathbf{x}^{\star} \in \begin{align*} & \arg \inf \ \mathcal{L}(\mathbf{x}, \boldsymbol{\mu}^{\star}) \newline & \text{subject to} \ \mathbf{x} \in \mathbf{X} \end{align*} \ , $$
- $$\mathbf{x}^{\star} \in \mathbf{X}, \ g_i(\mathbf{x}^{\star}) \leq 0, \ i = 1, \ldots, m,$$
- $$\boldsymbol{\mu}^{\star} \geq \mathbf{0},$$
- $$\mu_i^{\star} g_i(\mathbf{x}^{\star}) = 0, \ i = 1, \ldots, m.$$
Proof: Global Optimality Conditions
Firstly, let $(\mathbf{x}^{\star}, \boldsymbol{\mu}^{\star})$ satisfy the four conditions. The first and third conditions mean that $\mathbf{x}^{\star}$ is an optimal solution to a relaxation of $(P)$ with relaxed objective $f_R = \mathcal{L}(\cdot, \boldsymbol{\mu}^{\star})$.
The second condition means $\mathbf{x}^{\star}$ is feasible to $(P)$, and the fourth condition ensures that $f(\mathbf{x}^{\star}) = f_R(\mathbf{x}^{\star}) = \mathcal{L}(\mathbf{x}^{\star}, \boldsymbol{\mu}^{\star})$.
By the relaxation theorem, $\mathbf{x}^{\star}$ is optimal to $(P)$. Therefore, by the definition of a Lagrange multiplier, $\boldsymbol{\mu}^{\star}$ is optimal to $(D)$.
Conversely, now let $\mathbf{x}^{\star}$ be optimal to $(P)$ and $\boldsymbol{\mu}^{\star}$ be a Lagrange multiplier (i.e., optimal to $(D)$).
Then, the second and third conditions hold by definition.
Further, by optimality of $\mathbf{x}^{\star}$ to $(P)$ and the definition of a Lagrange multiplier, we have, $$ f(\mathbf{x}^{\star}) = f^{\star} = \begin{align*} & \inf \ \mathcal{L}(\mathbf{x}, \boldsymbol{\mu}^{\star}) \newline & \text{subject to} \ \mathbf{x} \in \mathbf{X} \end{align*} $$ Since $\mathbf{x}^{\star}$ is feasible to $(P)$ and $\boldsymbol{\mu}^{\star} \geq \mathbf{0}$, we have, $$ \begin{align*} \mathcal{L}(\mathbf{x}^{\star}, \boldsymbol{\mu}^{\star}) & = f(\mathbf{x}^{\star}) + \underbrace{\sum_{i=1}^m \underbrace{\mu_i^{\star}}_{\geq 0} \underbrace{g_i(\mathbf{x}^{\star})}_{\leq 0}}_{\leq 0} \newline & \leq \underbrace{f(\mathbf{x}^{\star})}_{= f^{\star}} \newline & = \begin{align*} & \inf \ \mathcal{L}(\mathbf{x}, \boldsymbol{\mu}^{\star}) \newline & \text{subject to} \ \mathbf{x} \in \mathbf{X} \end{align*} \newline & \implies \mathbf{x}^{\star} \in \begin{align*} & \arg \inf \ \mathcal{L}(\mathbf{x}, \boldsymbol{\mu}^{\star}) \newline & \text{subject to} \ \mathbf{x} \in \mathbf{X} \end{align*} \newline \end{align*} $$ which gives the first condition. Finally, since equality must hold throughout, $$ \begin{align*} f(\mathbf{x}^{\star}) & = \mathcal{L}(\mathbf{x}^{\star}, \boldsymbol{\mu}^{\star}) \newline & = f(\mathbf{x}^{\star}) + \sum_{i=1}^m \mu_i^{\star} g_i(\mathbf{x}^{\star}) \newline & \implies \sum_{i=1}^m \mu_i^{\star} g_i(\mathbf{x}^{\star}) = 0 \newline & \implies \mu_i^{\star} g_i(\mathbf{x}^{\star}) = 0, \ i = 1, \ldots, m, \end{align*} $$ where the last implication follows since each term $\mu_i^{\star} g_i(\mathbf{x}^{\star})$ is nonpositive. $_\blacksquare$
Note
If $\mathbf{X} = \mathbb{R}^n$ and the functions are differentiable, point one of the theorem implies the stationarity condition, $$ \begin{align*} \nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}^{\star}, \boldsymbol{\mu}^{\star}) & = \mathbf{0}, \newline \text{i.e.,} \quad \nabla f(\mathbf{x}^{\star}) + \sum_{i=1}^m \mu_i^{\star} \nabla g_i(\mathbf{x}^{\star}) & = \mathbf{0} \end{align*} $$
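Under these assumptions ($\mathbf{X} = \mathbb{R}^n$, differentiable functions), one can sketch a numerical check of the four conditions, using the stationarity form of the first one (which is then only a necessary condition). All helper names below are illustrative:

```python
import numpy as np

def num_grad(fun, x, h=1e-6):
    # Central-difference approximation of the gradient.
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return grad

def check_global_optimality(f, gs, x_star, mu_star, tol=1e-5):
    L = lambda x: f(x) + sum(m * g(x) for m, g in zip(mu_star, gs))
    stationary = np.allclose(num_grad(L, x_star), 0.0, atol=tol)  # condition 1 (necessary form)
    feasible = all(g(x_star) <= tol for g in gs)                  # condition 2
    dual_feasible = all(m >= 0.0 for m in mu_star)                # condition 3
    comp_slack = all(abs(m * g(x_star)) <= tol
                     for m, g in zip(mu_star, gs))                # condition 4
    return stationary and feasible and dual_feasible and comp_slack

# Toy problem from before: x* = 1, mu* = 2.
print(check_global_optimality(lambda x: x[0]**2,
                              [lambda x: 1.0 - x[0]],
                              np.array([1.0]), [2.0]))  # True
```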
Example: Global Optimality Conditions and KKT Points
Consider the problem, $$ \begin{align*} \text{minimize} \ & x_1^2 + \frac{1}{100} x_2^2 \newline \text{subject to} \ & -x_1^2 - x_2^2 + 1 \leq 0 \end{align*} $$ Find points fulfilling the global optimality conditions and KKT points.
Solution
How do we find points fulfilling the global optimality conditions? Firstly, we note that this is only possible if $q^{\star} = f^{\star}$, in which case $\boldsymbol{\mu}^{\star}$ solves the dual problem. One strategy is to,
- Define the Lagrangian,
- Derive the dual function,
- Solve the dual problem to get $\boldsymbol{\mu}^{\star}$,
- Use it to find $\mathbf{x}^{\star}$ fulfilling the global optimality conditions.
We form the Lagrangian, $$ \mathcal{L}(\mathbf{x}, \mu) = x_1^2 + \frac{1}{100} x_2^2 + \mu(-x_1^2 - x_2^2 + 1) = (1 - \mu)x_1^2 + \left(\frac{1}{100} - \mu\right)x_2^2 + \mu, $$ Next, we derive the dual function, $$ \begin{align*} q(\mu) & = \inf \ \mathcal{L}(\mathbf{x}, \mu) \newline & = \inf \ (1 - \mu)x_1^2 + \left(\frac{1}{100} - \mu\right)x_2^2 + \mu \newline & = \begin{cases} -\infty, & \text{if } \mu > \frac{1}{100} \newline \mu, & \text{if } \mu \leq \frac{1}{100} \end{cases} \end{align*} $$ Then, we solve the dual problem, $$ \begin{align*} \text{maximize} \ & q(\mu) \newline \text{subject to} \ & \mu \geq 0 \end{align*} $$ which is equivalent to, $$ \begin{align*} \text{maximize} \ & \mu \newline \text{subject to} \ & \mu \geq 0 \newline & \mu \leq \frac{1}{100} \end{align*} $$ which is trivial to solve, giving $\mu^{\star} = \frac{1}{100}$ and $q^{\star} = \frac{1}{100}$. Finally, we use $\mu^{\star}$ to find $\mathbf{x}^{\star}$ fulfilling the global optimality conditions. We have, $$ \begin{align*} \mathbf{x}^{\star} & \in \underset{\mathbf{x} \in \mathbb{R}^2}{\arg \inf} \ \mathcal{L}(\mathbf{x}, \mu^{\star}) \newline g(\mathbf{x}^{\star}) & \leq 0 \newline \mu^{\star} & \geq 0 \newline \mu^{\star} g(\mathbf{x}^{\star}) & = 0 \end{align*} $$ By the first condition, we have, $$ \begin{align*} \mathbf{x}^{\star} & \in \underset{\mathbf{x} \in \mathbb{R}^2}{\arg \inf} \ \left(1 - \frac{1}{100}\right)x_1^2 + \left(\frac{1}{100} - \frac{1}{100}\right)x_2^2 + \frac{1}{100} \newline & = \underset{\mathbf{x} \in \mathbb{R}^2}{\arg \inf} \ \frac{99}{100} x_1^2 + \frac{1}{100} \newline \end{align*} $$ So $\mathbf{x}^{\star} \in \{\mathbf{x} \in \mathbb{R}^2 \mid x_1 = 0\}$. However, since $\mu^{\star} > 0$, the fourth condition forces, $$ -(x_1^{\star})^2 - (x_2^{\star})^2 + 1 = 0 \implies x_2^{\star} = \pm 1 $$ Thus, $\mathbf{x}^{\star} = \begin{bmatrix} 0 & \pm 1 \end{bmatrix}^T$ together with $\mu^{\star} = \frac{1}{100}$ fulfill the global optimality conditions.
However, the KKT points $\begin{bmatrix} \pm 1 & 0 \end{bmatrix}^T$ do not fulfill the global optimality conditions. At these points we find $\mu = 1$, and thus, $$ \begin{align*} \underset{\mathbf{x} \in \mathbb{R}^2}{\arg \inf} \ & (1 - 1)x_1^2 + \left(\frac{1}{100} - 1\right)x_2^2 + 1 \newline & = \underset{\mathbf{x} \in \mathbb{R}^2}{\arg \inf} \ -\frac{99}{100} x_2^2 + 1 \newline & = \emptyset, \end{align*} $$ since the Lagrangian is unbounded below in $x_2$.
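The example can be verified numerically; a minimal sketch with the same data (helper names illustrative):

```python
import numpy as np

g = lambda x: -x[0]**2 - x[1]**2 + 1.0

# Dual optimum: q(mu) = mu on [0, 1/100], so mu* = 1/100 and q* = 1/100.
mu_star = 1.0 / 100.0

# The points (0, +-1) satisfy the global optimality conditions:
for x_star in (np.array([0.0, 1.0]), np.array([0.0, -1.0])):
    # Stationarity of L(., mu*): grad = (2(1 - mu*) x1, 2(1/100 - mu*) x2).
    grad_L = np.array([2.0 * (1.0 - mu_star) * x_star[0],
                       2.0 * (1.0 / 100.0 - mu_star) * x_star[1]])
    assert np.allclose(grad_L, 0.0)
    assert g(x_star) <= 1e-12                 # primal feasibility
    assert abs(mu_star * g(x_star)) <= 1e-12  # complementarity

# At the KKT points (+-1, 0), mu = 1 and L(., 1) is unbounded below in x2:
L_kkt = lambda x2: (1.0 / 100.0 - 1.0) * x2**2 + 1.0
print(L_kkt(100.0))  # -9899.0, so the arg inf over R^2 is empty
```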
Theorem: Strong Duality
Consider the primal problem $(P)$ and let $f^{\star}$ and $q^{\star}$ be the optimal values of the primal problem and the dual problem, respectively.
Further, let $f^{\star} > -\infty$ and let $f, g_i, \ i = 1, \ldots, m$ be convex functions and $\mathbf{X}$ be a convex set.
Finally, assume that $\exists \ \mathbf{x}_0 \in \mathbf{X}$ such that $g_i(\mathbf{x}_0) < 0, \ i = 1, \ldots, m$ (Slater’s condition). Then, $f^{\star} = q^{\star}$ and there exists a Lagrange multiplier $\boldsymbol{\mu}^{\star}$.
Moreover, if $\mathbf{x}^{\star}$ is optimal to $(P)$, then $(\mathbf{x}^{\star}, \boldsymbol{\mu}^{\star})$ satisfies the global optimality conditions.
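A numerical illustration of strong duality on the toy problem, which is convex and satisfies Slater’s condition (e.g., $x_0 = 2$ gives $g(x_0) = -1 < 0$); a sketch assuming SciPy:

```python
from scipy.optimize import minimize, minimize_scalar

# Primal: minimize x^2 subject to 1 - x <= 0. SLSQP uses the convention
# that 'ineq' constraints mean fun(x) >= 0, so we pass -(1 - x) = x - 1.
primal = minimize(lambda x: x[0]**2, x0=[2.0], method='SLSQP',
                  constraints=[{'type': 'ineq', 'fun': lambda x: x[0] - 1.0}])
f_star = primal.fun

# Dual: maximize q(mu) = inf_x x^2 + mu(1 - x) over mu >= 0.
def q(mu):
    return minimize_scalar(lambda x: x**2 + mu * (1.0 - x)).fun

dual = minimize_scalar(lambda mu: -q(mu), bounds=(0.0, 10.0), method='bounded')
q_star = -dual.fun

print(f_star, q_star)  # both ~1.0: no duality gap, as the theorem predicts
```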
Linear Programming and Farkas’ Lemma
We will now briefly introduce Linear Programming (LP) and prove Farkas’ lemma using duality.
Definition: Linear Programming
A linear program (LP) in standard form is defined as, $$ \begin{align*} (P) \quad & \begin{cases} \inf \ & \mathbf{c}^T \mathbf{x} \newline \text{subject to} \ & A \mathbf{x} = \mathbf{b} \newline & \mathbf{x} \geq \mathbf{0} \end{cases} \end{align*} $$ Thus, the Lagrangian for an LP is, $$ \mathcal{L}(\mathbf{x}, \mathbf{y}) = \mathbf{c}^T \mathbf{x} + \mathbf{y}^T (\mathbf{b} - A \mathbf{x}), $$ where the multiplier $\mathbf{y} \in \mathbb{R}^m$ is free (unrestricted in sign), since the relaxed constraints are equalities.
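As an illustration, the primal and dual of a small standard-form LP can be solved separately with SciPy’s `linprog`, and the optimal values coincide. The data below is made up for the sketch:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
b = np.array([4.0, 1.0])

# Primal: min c^T x  subject to  A x = b, x >= 0.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method='highs')

# Dual:  max b^T y  subject to  A^T y <= c, y free,
# written as min -b^T y for linprog.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2, method='highs')

print(primal.fun, -dual.fun)  # equal optimal values (here both 1.0)
```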
With this, we can now prove Farkas’ lemma in a very elegant way using duality.
Theorem: Farkas’ Lemma
Let $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$. Then, exactly one of the following systems is solvable, $$ (I) \ \begin{cases} A \mathbf{x} = \mathbf{b} \newline \mathbf{x} \geq \mathbf{0} \end{cases} \quad \quad (II) \ \begin{cases} A^T \mathbf{y} \leq \mathbf{0} \newline \mathbf{b}^T \mathbf{y} > 0 \end{cases} $$
Proof: Farkas’ Lemma
If the first system has a solution $\mathbf{x}$ and $\mathbf{y}$ satisfies $A^T \mathbf{y} \leq \mathbf{0}$, then, $$ \mathbf{b}^T \mathbf{y} = (A \mathbf{x})^T \mathbf{y} = \mathbf{x}^T A^T \mathbf{y} \leq 0, $$ since $\mathbf{x} \geq \mathbf{0}$. Hence $\mathbf{b}^T \mathbf{y} > 0$ cannot hold, which means that the second system is infeasible.
Conversely, assume that the second system is infeasible. Consider the LP, $$ \begin{align*} \min \ & \mathbf{0}^T \mathbf{x} \newline \text{subject to} \ & A \mathbf{x} = \mathbf{b} \newline \ & \mathbf{x} \geq \mathbf{0} \end{align*} $$ and its dual program, $$ \begin{align*} \max \ & \mathbf{b}^T \mathbf{y} \newline \text{subject to} \ & A^T \mathbf{y} \leq \mathbf{0} \newline \ & \mathbf{y} \text{ free} \end{align*} $$ Since the second system is infeasible, every dual-feasible $\mathbf{y}$ satisfies $\mathbf{b}^T \mathbf{y} \leq 0$, so $\mathbf{y} = \mathbf{0}$ is an optimal solution to the dual problem with optimal value $0$. Hence the Strong Duality theorem implies that there exists an optimal solution $\mathbf{x}^{\star}$ to the primal problem, also with optimal value $0$. In particular, $\mathbf{x}^{\star}$ is feasible to the first system.
What we have proved above is the equivalence, $$ (I) \iff \neg (II) $$ Logically, this is equivalent to the statement that, $$ \neg (I) \iff (II) $$ We have hence established that precisely one of the systems is solvable. $_\blacksquare$
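Finally, the lemma can be read as a computation: exactly one branch of the sketch below succeeds. The box $-1 \leq y_i \leq 1$ merely normalizes the certificate, since any $\mathbf{y}$ solving the second system can be rescaled to lie in the box; the data is illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def farkas(A, b):
    m, n = A.shape
    # System (I): is {A x = b, x >= 0} feasible? (zero objective)
    feas = linprog(np.zeros(n), A_eq=A, b_eq=b,
                   bounds=[(0, None)] * n, method='highs')
    if feas.status == 0:
        return ('I', feas.x)
    # Otherwise, find a certificate for (II):
    # maximize b^T y subject to A^T y <= 0 and -1 <= y_i <= 1.
    cert = linprog(-b, A_ub=A.T, b_ub=np.zeros(n),
                   bounds=[(-1, 1)] * m, method='highs')
    return ('II', cert.x)

A = np.eye(2)
print(farkas(A, np.array([1.0, 2.0])))   # system (I): x = (1, 2)
print(farkas(A, np.array([-1.0, 0.0])))  # system (II): y with b^T y > 0
```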