Introduction
In this part we will cover constrained optimization problems and penalty methods for solving them. We will also see some examples of algorithms that use penalty methods to solve constrained optimization problems.
Penalty Functions
Definition: (Naïve) Penalty Function
Consider the optimization problem, $$ \begin{align*} (P^E) \quad & \begin{cases} \min \ & f(\mathbf{x}) \newline \text{subject to} \ & \mathbf{x} \in S \newline \end{cases} \end{align*} $$ where $S \subset \mathbb{R}^n$ is non-empty, closed, and $f : \mathbb{R}^n \mapsto \mathbb{R}$ is differentiable.
We introduce a penalty function to the problem to convert it to an unconstrained optimization problem, $$ \begin{align*} \min \ f(\mathbf{x}) + \chi_S(\mathbf{x}), \end{align*} $$ where, $$ \chi_S(\mathbf{x}) = \begin{cases} 0, & \mathbf{x} \in S \newline +\infty, & \mathbf{x} \notin S \newline \end{cases} $$ is the indicator function of the set $S$.
Note
Since feasibility is the top priority, we should focus on minimizing the objective function only once feasibility has been achieved.
However, this formulation is computationally unattractive, since the indicator function is non-differentiable, discontinuous, and not finite-valued (although it is convex if $S$ is convex).
Thus, a more practical approach is to use some kind of numerical “warning” before becoming infeasible or near-infeasible, by approximating the indicator function with a numerically better-behaved function.
Exterior Penalty Functions
Definition: (SUMT) Sequential Unconstrained Minimization Techniques and Exterior Penalty Functions
Suppose, $$ \begin{align*} S & = \{\mathbf{x} \in \mathbb{R}^n \mid g_i(\mathbf{x}) \leq 0, \ i = 1, \ldots, m, \newline & \quad \quad h_j(\mathbf{x}) = 0, \ j = 1, \ldots, l \} \newline \end{align*} $$ Choose a $C^0$ penalty function $\psi : \mathbb{R} \mapsto \mathbb{R}_+$ such that, $$ \psi(s) = 0 \iff s = 0 $$ Typical choices are $\psi(s) = s^2$ or $\psi(s) = |s|$.
Thus, we can define an approximate indicator function as, $$ \chi_S(\mathbf{x}) \approx \nu \widehat{\chi}_S(\mathbf{x}) \coloneqq \nu \left( \sum_{i=1}^m \psi(\max\{0, g_i(\mathbf{x})\}) + \sum_{j=1}^l \psi(h_j(\mathbf{x})) \right) $$ $\nu \widehat{\chi}_S(\mathbf{x})$ approximates $\chi_S(\mathbf{x})$ from below, i.e., $\nu \widehat{\chi}_S(\mathbf{x}) \leq \chi_S(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{R}^n$.
Further, $\nu > 0$ is a penalty parameter that controls the weight of the penalty term.
Finally, the unconstrained problem, $$ (P^E_\nu) \quad \underset{\mathbf{x} \in \mathbb{R}^n}{\min} \ f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x}) $$ is called the penalty transformed problem.
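As a concrete illustration, here is a minimal sketch of solving $(P^E_\nu)$ with the quadratic penalty $\psi(s) = s^2$ for an assumed toy instance; the objective, constraint, starting point, and the use of SciPy's BFGS are illustrative choices, not part of the notes.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy instance:  min (x1 - 2)^2 + (x2 - 1)^2  s.t.  x1 + x2 - 1 <= 0
def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

def g(x):                              # inequality constraint, g(x) <= 0
    return x[0] + x[1] - 1.0

def penalized(x, nu):                  # f(x) + nu * psi(max{0, g(x)}),  psi(s) = s^2
    return f(x) + nu * max(0.0, g(x)) ** 2

nu = 100.0
res = minimize(lambda x: penalized(x, nu), x0=np.zeros(2), method="BFGS")
print(res.x)                           # near the constrained optimum (1, 0), but slightly infeasible
```

Note that the minimizer of the penalized problem lies slightly outside $S$, which is characteristic of exterior penalty methods.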
Note
Firstly, we note that $(P^E_\nu)$ is a relaxation of $(P^E)$. Consider the increasing sequence $\{\nu_k\}$ with $\lim_{k \to \infty} \nu_k = +\infty$.
The corresponding sequence of approximate problems, $$ \begin{align*} \underset{\mathbf{x} \in \mathbb{R}^n}{\min} \ f(\mathbf{x}) + \nu_k \widehat{\chi}_S(\mathbf{x}) \end{align*} $$ with optimal solutions $\mathbf{x}^{\star}_{\nu_k}$, is called a Sequential Unconstrained Minimization Technique (SUMT).
If $\{\mathbf{x}^{\star}_{\nu_k}\}$ has a limit point $\hat{\mathbf{x}}$, then $\hat{\mathbf{x}}$ is an optimal solution to the original constrained problem.
Let $\mathbf{x}^{\star}$ be an optimal solution to, $$ \underset{\mathbf{x} \in S}{\min} \ f(\mathbf{x}) $$ For any $\nu > 0$, let $\mathbf{x}^{\star}_{\nu}$ be an optimal solution to, $$ \underset{\mathbf{x} \in \mathbb{R}^n}{\min} \ f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x}) $$ Then a lower bound on $f(\mathbf{x}^{\star})$ is given by, $$ f(\mathbf{x}^{\star}_{\nu}) + \nu \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu}) \leq f(\mathbf{x}^{\star}) + \nu \underbrace{\widehat{\chi}_S(\mathbf{x}^{\star})}_{=0} = f(\mathbf{x}^{\star}), \ \forall \nu > 0 $$ Further, if the original constrained problem is convex (e.g., $f$ and $g_i$ are convex and $h_j$ are affine) and $\psi$ is convex and non-decreasing, then the penalty transformed problem $(P^E_\nu)$ is also convex.
Theorem: Optimality of Limit Points of SUMT
Assume that a globally optimal solution $\mathbf{x}^{\star}$ exists for the original problem, $$ \underset{\mathbf{x} \in S}{\min} \ f(\mathbf{x}) $$ For any $\nu > 0$, let $\mathbf{x}^{\star}_{\nu}$ be an optimal solution to, $$ \underset{\mathbf{x} \in \mathbb{R}^n}{\min} \ f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x}) $$ Then every limit point $\bar{\mathbf{x}}$ of $\{\mathbf{x}^{\star}_{\nu_k}\}$, as $\nu_k \to +\infty$, is an optimal solution to the original problem.
Proof
To prove this theorem, we will need the following lemma.
Lemma
With the same assumptions as in the above theorem, for all $\nu_1, \nu_2 > 0$, such that $\nu_1 \leq \nu_2$, we have that, $$ f(\mathbf{x}^{\star}_{\nu_1}) \leq f(\mathbf{x}^{\star}_{\nu_2}) $$
For all $\nu > 0$, $(P^E_\nu)$ is a relaxation of $(P^E)$. Thus, from the relaxation theorem, $$ f(\mathbf{x}^{\star}_{\nu_k}) + \nu_k \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu_k}) \leq f(\mathbf{x}^{\star}) $$ Therefore, $$ 0 \leq \nu_k \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu_k}) \leq f(\mathbf{x}^{\star}) - f(\mathbf{x}^{\star}_{\nu_k}) $$ By the lemma, $f(\mathbf{x}^{\star}_{\nu_k}) \geq f(\mathbf{x}^{\star}_{\nu_1})$ for all $k$, so the right-hand side is bounded above by the constant $f(\mathbf{x}^{\star}) - f(\mathbf{x}^{\star}_{\nu_1})$. Hence $\nu_k \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu_k})$ is squeezed between $0$ and a finite value while $\nu_k \to +\infty$, which forces $\lim_{k \to \infty} \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu_k}) = 0$.
By continuity of $\widehat{\chi}_S$, every limit point of $\{\mathbf{x}^{\star}_{\nu_k}\}$ is in $S$.
Further, let $\bar{\mathbf{x}}$ be a limit point of $\{\mathbf{x}^{\star}_{\nu_k}\}$, i.e., there exists a subsequence such that, $$ \lim_{j \to \infty} \mathbf{x}^{\star}_{\nu_{k_j}} = \bar{\mathbf{x}} $$ Then, by continuity of $f$, $$ \begin{align*} f(\bar{\mathbf{x}}) & = \lim_{j \to \infty} f(\mathbf{x}^{\star}_{\nu_{k_j}}) \newline & \leq \lim_{j \to \infty} \left( f(\mathbf{x}^{\star}_{\nu_{k_j}}) + \underbrace{\nu_{k_j} \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu_{k_j}})}_{\geq 0} \right) \newline & \leq f(\mathbf{x}^{\star}) \end{align*} $$ This shows that $f(\bar{\mathbf{x}}) \leq f(\mathbf{x}^{\star})$, and since $\bar{\mathbf{x}} \in S$ we also have $f(\bar{\mathbf{x}}) \geq f(\mathbf{x}^{\star})$, so $f(\bar{\mathbf{x}}) = f(\mathbf{x}^{\star})$. Thus, $\bar{\mathbf{x}}$ is globally optimal in the original problem $_\blacksquare$
Theorem: Convergence to KKT Points of SUMT
Let $f, g_i \ (i = 1, \ldots, m), h_j \ (j = 1, \ldots, l)$ be $C^1$ functions. Further, assume that the penalty function $\psi$ is $C^1$ and that $\psi^{\prime}(s) \geq 0 \ \forall s \geq 0$. Consider a sequence $\nu_k \to +\infty$, $$ \begin{align*} \left. \begin{array}{l} \mathbf{x}_k \text{ stationary in unconstrained problem with } \nu_k \newline \mathbf{x}_k \to \hat{\mathbf{x}} \text{ as } k \to +\infty \newline \text{LICQ holds at } \hat{\mathbf{x}} \newline \hat{\mathbf{x}} \text{ feasible in original problem} \newline \end{array} \right\} \Longrightarrow \hat{\mathbf{x}} \text{ stationary (KKT) in the original problem} \end{align*} $$
Note
From the proof we also obtain estimates of the Lagrange multipliers; comparing with the optimality conditions of the constrained problem gives, $$ \begin{align*} \mu_i^{\star} & \approx \nu_k \psi^{\prime}(\max\{0, g_i(\mathbf{x}_k)\}) \newline \lambda_j^{\star} & \approx \nu_k \psi^{\prime}(h_j(\mathbf{x}_k)) \end{align*} $$
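To see where these estimates come from (a short derivation sketch, not spelled out in the notes), compare the stationarity condition of the penalized problem at $\mathbf{x}_k$, $$ \nabla f(\mathbf{x}_k) + \sum_{i=1}^m \underbrace{\nu_k \psi^{\prime}(\max\{0, g_i(\mathbf{x}_k)\})}_{\approx \mu_i^{\star}} \nabla g_i(\mathbf{x}_k) + \sum_{j=1}^l \underbrace{\nu_k \psi^{\prime}(h_j(\mathbf{x}_k))}_{\approx \lambda_j^{\star}} \nabla h_j(\mathbf{x}_k) = \mathbf{0}, $$ with the KKT stationarity condition $\nabla f(\hat{\mathbf{x}}) + \sum_i \mu_i^{\star} \nabla g_i(\hat{\mathbf{x}}) + \sum_j \lambda_j^{\star} \nabla h_j(\hat{\mathbf{x}}) = \mathbf{0}$; the coefficients in front of the constraint gradients play the role of the multipliers.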
Note: Computational Considerations for SUMT
- $\nu$ large $\implies$ the penalized problem $\min \ f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x})$ becomes ill-conditioned and difficult to minimize.
- If we increase $\nu$ slowly, a good guess is that $\mathbf{x}^{\star}_{\nu_k} \approx \mathbf{x}^{\star}_{\nu_{k-1}}$.
- However, this guess can be improved, e.g., by extrapolating along the trajectory of the previous solutions; a warm-started SUMT loop is sketched below.
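Below is a minimal warm-started SUMT sketch for the same assumed toy instance as above; the sequence of $\nu_k$ values and the use of SciPy's BFGS are again illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

# Same assumed toy instance:  min (x1 - 2)^2 + (x2 - 1)^2  s.t.  x1 + x2 - 1 <= 0
def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

def g(x):
    return x[0] + x[1] - 1.0

def penalized(x, nu):                  # quadratic penalty, psi(s) = s^2
    return f(x) + nu * max(0.0, g(x)) ** 2

x_k = np.zeros(2)                      # starting point for the first subproblem
for nu_k in [1.0, 10.0, 100.0, 1000.0, 10000.0]:
    res = minimize(lambda x: penalized(x, nu_k), x0=x_k, method="BFGS")
    x_k = res.x                        # warm start: x*_{nu_k} initializes the next subproblem
    mu_est = 2.0 * nu_k * max(0.0, g(x_k))   # multiplier estimate nu_k * psi'(max{0, g(x_k)})
    print(f"nu = {nu_k:8.1f}  x = {x_k}  violation = {max(0.0, g(x_k)):.2e}  mu ~ {mu_est:.3f}")
# The violations shrink and the estimates approach the true multiplier (mu* = 2 for this instance).
```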
Interior Penalty Methods
Definition: Interior Penalty Functions
Consider inequality constrained optimization, $$ \underset{\mathbf{x} \in S}{\min} \ f(\mathbf{x}), \quad S = \{\mathbf{x} \in \mathbb{R}^n \mid g_i(\mathbf{x}) \leq 0, \ i = 1, \ldots, m \} $$ Assume a strictly feasible point exists, i.e., there is some $\bar{\mathbf{x}} \in \mathbb{R}^n$ such that $g_i(\bar{\mathbf{x}}) < 0, \ i = 1, \ldots, m$.
We say that interior penalty (barrier) methods approximate $S$ from the inside.
Further, if a globally optimal solution to this problem is on the boundary of the feasible region, the method generates a sequence of interior points that converge to it.
Furthermore, we can approximate $\chi_S(\mathbf{x})$ from above by, $$ \chi_S(\mathbf{x}) \leq \nu \widehat{\chi}_S(\mathbf{x}) \coloneqq \begin{cases} \nu \sum_{i=1}^m \phi(g_i(\mathbf{x})), & \text{if } g_i(\mathbf{x}) < 0, \ i = 1, \ldots, m \newline +\infty, & \text{otherwise} \newline \end{cases} $$ where $\phi : \mathbb{R}_- \mapsto \mathbb{R}_+$ is continuous and $\lim_{s_k < 0, s_k \to 0} \phi(s_k) = +\infty$. Typical choices are $\phi(s) = -s^{-1}$, $\phi(s) = -\log(\min\{1, -s\})$, or the differentiable logarithmic barrier function $\phi(s) = -\log(-s)$.
Again, since the $g_i$ are convex and $\phi$ is convex and non-decreasing, $\widehat{\chi}_S$ is also convex.
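As an illustration, here is a minimal log-barrier sketch for the same assumed toy instance as in the exterior penalty sketches above; a derivative-free method is used here so that the $+\infty$ values outside the interior cause no trouble (again, all specifics are illustrative assumptions).

```python
import numpy as np
from scipy.optimize import minimize

# Same assumed toy instance:  min (x1 - 2)^2 + (x2 - 1)^2  s.t.  x1 + x2 - 1 <= 0,
# now with the logarithmic barrier phi(s) = -log(-s) and nu driven towards 0.
def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

def g(x):
    return x[0] + x[1] - 1.0

def barrier(x, nu):
    if g(x) >= 0.0:                    # outside the interior of S: infinite barrier
        return np.inf
    return f(x) + nu * (-np.log(-g(x)))

x_k = np.zeros(2)                      # strictly feasible starting point, g(x_k) = -1 < 0
for nu_k in [1.0, 0.1, 0.01, 0.001]:
    res = minimize(lambda x: barrier(x, nu_k), x0=x_k, method="Nelder-Mead")
    x_k = res.x                        # warm start; iterates remain strictly feasible
    print(f"nu = {nu_k:6.3f}  x = {x_k}  g(x) = {g(x_k):.5f}")
# The iterates approach the boundary optimum (1, 0) from the interior as nu -> 0.
```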
Theorem: Optimality of Limit Points of Interior Penalty Methods
Consider the barrier problem, $$ \min \ f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x}) $$ and let $\{\nu_k\}$ be a decreasing sequence with $\nu_k \to 0$. Then every limit point of the corresponding sequence of globally optimal solutions $\{\mathbf{x}^{\star}_{\nu_k}\}$ is a globally optimal solution to the original problem.
Further, the convergence of stationary points also holds for a sequence $\nu_k \to 0$, $$ \begin{align*} \left. \begin{array}{l} \mathbf{x}_k \text{ stationary in unconstrained problem with } \nu_k \newline \mathbf{x}_k \to \hat{\mathbf{x}} \text{ as } k \to +\infty \newline \text{LICQ holds at } \hat{\mathbf{x}} \newline \end{array} \right\} \Longrightarrow \hat{\mathbf{x}} \text{ stationary (KKT) in the original problem} \end{align*} $$
Polynomial Methods for LPs
Definition: Polynomial Interior Point Methods for Linear Programming
Consider the LP, $$ \begin{align*} \min \ & -\mathbf{b}^T \mathbf{y} \newline \text{subject to} \ & A^T \mathbf{y} + \mathbf{s} = \mathbf{c} \newline & \mathbf{s} \geq 0 \newline \end{align*} $$ and the corresponding KKT conditions, $$ \begin{align*} A^T \mathbf{y} + \mathbf{s} = \mathbf{c} \newline A \mathbf{x} = \mathbf{b} \newline \mathbf{x}^T \mathbf{s} = 0 \newline \mathbf{x}, \mathbf{s} \geq 0 \newline \end{align*} $$ We can apply the barrier method to the LP, taking care of the $\mathbf{s} \geq 0$ constraint. Thus, the subproblem becomes, $$ \begin{align*} \min \ & -\mathbf{b}^T \mathbf{y} - \nu \sum_{j=1}^n \log(s_j) \newline \text{subject to} \ & A^T \mathbf{y} + \mathbf{s} = \mathbf{c} \newline \end{align*} $$ The KKT conditions for this problem (where $\mathbf{x}$ acts as the vector of Lagrange multipliers for the equality constraints) are, $$ \begin{align*} A^T \mathbf{y} + \mathbf{s} = \mathbf{c} \newline A \mathbf{x} = \mathbf{b} \newline x_j s_j = \nu, \ j = 1, \ldots, n \newline \end{align*} $$
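The perturbed condition $x_j s_j = \nu$ defines the central path. Below is a minimal primal-dual path-following sketch that applies one Newton step to this perturbed KKT system per iteration while shrinking the duality measure; the problem data, centering parameter, and step rule are assumptions chosen for illustration, not taken from the notes.

```python
import numpy as np

# LP pair:  (primal)  min c^T x  s.t.  A x = b, x >= 0
#           (dual)    min -b^T y s.t.  A^T y + s = c, s >= 0
# Assumed tiny instance: min x1 + 2 x2  s.t.  x1 + x2 = 1, x >= 0  (optimum x* = (1, 0), y* = 1).
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
m, n = A.shape

x = np.ones(n)                         # strictly positive (not necessarily feasible) start
y = np.zeros(m)
s = np.ones(n)
sigma = 0.1                            # centering parameter: target nu = sigma * mu

for it in range(50):
    mu = x @ s / n                     # current duality measure
    r_d = A.T @ y + s - c              # dual feasibility residual
    r_p = A @ x - b                    # primal feasibility residual
    r_c = x * s - sigma * mu           # centrality residual (x_j s_j = nu)
    if max(np.linalg.norm(r_d), np.linalg.norm(r_p), mu) < 1e-9:
        break
    # Newton system for (dx, dy, ds) on the perturbed KKT conditions
    J = np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    d = np.linalg.solve(J, -np.concatenate([r_d, r_p, r_c]))
    dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
    # Step length keeping x, s strictly positive (fraction-to-boundary rule)
    alpha = 1.0
    for v, dv in ((x, dx), (s, ds)):
        neg = dv < 0
        if neg.any():
            alpha = min(alpha, 0.99 * np.min(-v[neg] / dv[neg]))
    x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds

print("x* ~", x, " y* ~", y)           # expect roughly x* ~ (1, 0), y* ~ 1
```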
Example
Consider the constrained optimization problem, $$ \begin{align*} \min \ & -x_1^2 + x_2^2 \newline \text{subject to} \ & x_1^3 - x_2 = 0 \end{align*} $$ with the penalty function $\psi(s) = s^2$.
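A natural first step (sketched here, following the exterior penalty construction above) is to form the penalty transformed problem, $$ \underset{\mathbf{x} \in \mathbb{R}^2}{\min} \ -x_1^2 + x_2^2 + \nu (x_1^3 - x_2)^2 $$ whose stationarity conditions read, $$ \begin{align*} -2x_1 + 6 \nu x_1^2 (x_1^3 - x_2) & = 0 \newline 2x_2 - 2 \nu (x_1^3 - x_2) & = 0 \newline \end{align*} $$ with the multiplier estimate $\lambda^{\star} \approx \nu \psi^{\prime}(h(\mathbf{x})) = 2 \nu (x_1^3 - x_2)$; note that the second stationarity equation gives $2 \nu (x_1^3 - x_2) = 2 x_2$, so the estimate equals $2 x_2$ at the subproblem solution.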