Part 12 - Constrained Optimization & Penalty Methods

Introduction

In this part we will cover constrained optimization problems and penalty methods to solve them. Further, we will also see some examples of algorithms that use penalty methods to solve constrained optimization problems.

Penalty Functions

Definition 1 ((Naïve) Penalty Function)

Consider the optimization problem,

(PE){min f(x)subject to xS\begin{align*} (P^E) \quad & \begin{cases} \min \ & f(\mathbf{x}) \newline \text{subject to} \ & \mathbf{x} \in S \newline \end{cases} \end{align*}

where SRnS \subset \mathbb{R}^n is non-empty, closed, and f:RnRf : \mathbb{R}^n \mapsto \mathbb{R} is differentiable.

We introduce a penalty function to the problem to convert it to an unconstrained optimization problem,

min f(x)+χS(x),\begin{align*} \min \ f(\mathbf{x}) + \chi_S(\mathbf{x}), \end{align*}

where,

χS(x)={0,xS+,xS\chi_S(\mathbf{x}) = \begin{cases} 0, & \mathbf{x} \in S \newline +\infty, & \mathbf{x} \notin S \newline \end{cases}

is the indicator function of the set SS.

Note

Since feasibility is top priority, only when achieving feasibility should we focus on minimizing the objective function.

However, this is computationally bad, since the indicator function is non-differentiable, discontinuous, and not finite (although it is convex if SS is convex).

Thus, a more logical approach is to use somekind of numerical “warning” before becoming infeasible or near-infeasible, by approximating the indicator function with a numerically better behaving function.

Exterior Penalty Functions

Definition 2 ((SUMT) Sequential Unconstrained Minimization Techniques and Exterior Penalty Functions)

Suppose,

S={xRngi(x)0, i=1,,mhj(x)=0, j=1,,l}\begin{align*} S & = \{\mathbf{x} \in \mathbb{R}^n \mid g_i(\mathbf{x}) \leq 0, \ i = 1, \ldots, m \newline & \quad \quad h_j(\mathbf{x}) = 0, \ j = 1, \ldots, l \} \newline \end{align*}

Choose C0C^0 penalty function ψ:RnR+\psi : \mathbb{R}^n \mapsto \mathbb{R}_+ such that,

ψ(s)=0    s=0\psi(s) = 0 \iff s = 0

typical choices are ψ(s)=s2\psi(s) = s^2 or ψ(s)=s\psi(s) = |s|.

Thus, we can define an approximate indicator function as,

χS(x)νχ^S(x)ν(i=1mψ(max{0,gi(x)})+j=1lψ(hj(x)))\chi_S(\mathbf{x}) \approx \nu \widehat{\chi}_S(\mathbf{x}) \coloneqq \nu \left( \sum_{i=1}^m \psi(\max\{0, g_i(\mathbf{x})\}) + \sum_{j=1}^l \psi(h_j(\mathbf{x})) \right)

νχ^S(x)\nu \widehat{\chi}_S(\mathbf{x}) approximates χS(x)\chi_S(\mathbf{x}) from below, i.e., νχ^S(x)χS(x)\nu \widehat{\chi}_S(\mathbf{x}) \leq \chi_S(\mathbf{x}) for all xRn\mathbf{x} \in \mathbb{R}^n.

Further, ν>0\nu > 0 is a penalty parameter that controls the weight of the penalty term.

Finally, the unconstrained problem,

(PνE)min f(x)+νχ^S(x)\begin{align*} (P^E_\nu) \quad & \min \ & f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x}) \newline \end{align*}

is called the penalty transformed problem.

Note

Firstly, we note that (PνE)(P^E_\nu) is a relaxation of (PE)(P^E). Consider the increasing sequence {νk}\{\nu_k\} with limkνk=+\lim_{k \to \infty} \nu_k = +\infty.

The corresponding sequence of approximate problems,

minxRn f(x)+νkχ^S(x)\begin{align*} \underset{\mathbf{x} \in \mathbb{R}^n}{\min} \ f(\mathbf{x}) + \nu_k \widehat{\chi}_S(\mathbf{x}) \end{align*}

with optimal solutions xνk\mathbf{x}^{\star}_{\nu_k}, is called a Sequential Unconstrained Minimization Technique (SUMT).

If {xνk}\{\mathbf{x}^{\star}_{\nu_k}\} has a limit point x^\hat{\mathbf{x}}, then x^\hat{\mathbf{x}} is an optimal solution to the original constrained problem.

Let x\mathbf{x}^{\star} be an optimal solution to,

minxS f(x)\underset{\mathbf{x} \in S}{\min} \ f(\mathbf{x})

For any ν>0\nu > 0, let xν\mathbf{x}^{\star}_{\nu} be an optimal solution to,

minxRn f(x)+νχ^S(x)\underset{\mathbf{x} \in \mathbb{R}^n}{\min} \ f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x})

Thus, the lower bound on f(x)f(\mathbf{x}^{\star}) is given by,

f(xν)+νχ^S(xν)f(x)+νχ^S(x)=0=f(x), ν>0f(\mathbf{x}^{\star}_{\nu}) + \nu \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu}) \leq f(\mathbf{x}^{\star}) + \nu \underbrace{\widehat{\chi}_S(\mathbf{x}^{\star})}_{=0} = f(\mathbf{x}^{\star}), \ \forall \nu > 0

Further, since our original unconstrained problem and the penalty function are convex (and non-decreasing), it means that our constrained problem is also convex.

Theorem 1 (Optimality of Limit Points of SUMT)

Assume global optimal solution exists in original problem.

minxS f(x)\underset{\mathbf{x} \in S}{\min} \ f(\mathbf{x})

For any ν>0\nu > 0, let xν\mathbf{x}^{\star}_{\nu} be an optimal solution to,

minxRn f(x)+νχ^S(x)\underset{\mathbf{x} \in \mathbb{R}^n}{\min} \ f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x})

Then, xˉ\bar{\mathbf{x}}, the limit point of {xνk}\{\mathbf{x}^{\star}_{\nu_k}\}, as ν\nu \to \infty, is an optimal solution to the original problem.

Proof

To prove this theorem, we will need the following lemma.

Lemma 1

With the same assumptions as in the above theorem, for all ν1,ν2>0\nu_1, \nu_2 > 0, such that ν1ν2\nu_1 \leq \nu_2, we have that,

f(xν1)f(xν2)f(\mathbf{x}^{\star}_{\nu_1}) \leq f(\mathbf{x}^{\star}_{\nu_2})

For all ν>0\nu > 0, (PνE)(P^E_\nu) is a relaxation of (PE)(P^E). Thus, from the relaxation theorem,

f(xνk)+νkχ^S(xνk)f(x)f(\mathbf{x}^{\star}_{\nu_k}) + \nu_k \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu_k}) \leq f(\mathbf{x}^{\star})

Therefore,

0νkχ^S(xνk)f(x)f(xνk)0 \leq \nu_k \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu_k}) \leq f(\mathbf{x}^{\star}) - f(\mathbf{x}^{\star}_{\nu_k})

By the lemma and the lower bound,

0f(x)f(xν1)0 \leq f(\mathbf{x}^{\star}) - f(\mathbf{x}^{\star}_{\nu_1})

Thus, if limkχ^S(xνk)=0\lim_{k \to \infty} \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu_k}) = 0, since the term is squeezed between two finite values.

By continuity of χ^S\widehat{\chi}_S, every limit point of {xνk}\{\mathbf{x}^{\star}_{\nu_k}\} is in SS.

Further, let xˉ\bar{\mathbf{x}} be a limit point of {xνk}\{\mathbf{x}^{\star}_{\nu_k}\}, i.e., there exists a subsequence such that,

limkxνkj=xˉ\lim_{k \to \infty} \mathbf{x}^{\star}_{\nu_{k_j}} = \bar{\mathbf{x}}

Then, by continuity of ff,

f(xˉ)=limjf(xνkj)limjf(xνkj)+νkjχ^S(xνkj)0f(x)=f(x)\begin{align*} f(\bar{\mathbf{x}}) & = \lim_{j \to \infty} f(\mathbf{x}^{\star}_{\nu_{k_j}}) \newline & \leq \lim_{j \to \infty} f(\mathbf{x}^{\star}_{\nu_{k_j}}) + \underbrace{\nu_{k_j} \widehat{\chi}_S(\mathbf{x}^{\star}_{\nu_{k_j}})}_{\geq 0} \newline & \leq f(\mathbf{x}^{\star}) = f(\mathbf{x}^{\star}) \end{align*}

This shows that f(xˉ)=f(x)f(\bar{\mathbf{x}}) = f(\mathbf{x}^{\star}) and since xˉS\bar{\mathbf{x}} \in S, we must have in fact that f(xˉ)=f(x)f(\bar{\mathbf{x}}) = f(\mathbf{x}^{\star}). Thus, xˉ\bar{\mathbf{x}} is global optimal in the original problem _\blacksquare

Theorem 2 (Convergence to KKT Points of SUMT)

Let f,gi(i=1,,m),hj(j=1,,l)f, g_i (i = 1, \ldots, m), h_j (j = 1, \ldots, l) be C1C^1 functions. Further, assume that the penalty function ψ\psi is C1C^1 and that ψ(s)0 s0\psi^{\prime}(s) \geq 0 \ \forall s \geq 0. Consider a sequence νk+\nu_k \to +\infty,

xk stationary in unconstrained problem with νkxkx^ as k+LICQ holds at x^x^ feasible in original problem}x^ stationary (KKT) in (1)\begin{align*} \left. \begin{array}{l} \mathbf{x}_k \text{ stationary in unconstrained problem with } \nu_k \newline \mathbf{x}_k \to \hat{\mathbf{x}} \text{ as } k \to +\infty \newline \text{LICQ holds at } \hat{\mathbf{x}} \newline \hat{\mathbf{x}} \text{ feasible in original problem} \newline \end{array} \right\} \Longrightarrow \hat{\mathbf{x}} \text{ stationary (KKT) in (1)} \end{align*}
Note

From the proof, we obtain estimates of the Lagrange multipliers, the optimality conditions of the constrained problem gives that,

μiνkψ(max{0,gi(xk)})λjνkψ(hj(xk))\begin{align*} \mu_i^{\star} & \approx \nu_k \psi^{\prime}(\max\{0, g_i(\mathbf{x}_k)\}) \newline \lambda_j^{\star} & \approx \nu_k \psi^{\prime}(h_j(\mathbf{x}_k)) \end{align*}
Note (Computational Considerations for SUMT)
  1. ν large     f(x)+νχ^S(x)\nu \text{ large } \implies f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x}) is ill-conditioned and difficult to minimize.
  2. If we increase ν\nu slowly, a good guess is that xνkxνk1\mathbf{x}^{\star}_{\nu_k} \approx \mathbf{x}^{\star}_{\nu_{k-1}}.
  3. However, this guess can be improved.

Interior Penalty Methods

Definition 3 (Interior Penalty Functions)

Consider inequality constrained optimization,

minxS f(x),S={xRngi(x)0, i=1,,m}\underset{\mathbf{x} \in S}{\min} \ f(\mathbf{x}), \quad S = \{\mathbf{x} \in \mathbb{R}^n \mid g_i(\mathbf{x}) \leq 0, \ i = 1, \ldots, m \}

Assume strictly feasible points exists xˉRn\bar{\mathbf{x}} \in \mathbb{R}^n such that gi(xˉ)<0, i=1,,mg_i(\bar{\mathbf{x}}) < 0, \ i = 1, \ldots, m.

We say that interior penalty (barrier) methods approximates SS from the inside.

Further, if a globally optimal solution to this problem is on the boundary of the feasible region, the method generates a sequence of interior points that converge to it.

Furthermore, we can approximate χS(x)\chi_S(\mathbf{x}) from above by,

χS(x)νχ^S(x){νi=1mϕ(gi(x)),if gi(x)<0, i=1,,m+,otherwise\chi_S(\mathbf{x}) \leq \nu \widehat{\chi}_S(\mathbf{x}) \coloneqq \begin{cases} \nu \sum_{i=1}^m \phi(g_i(\mathbf{x})), & \text{if } g_i(\mathbf{x}) < 0, \ i = 1, \ldots, m \newline +\infty, & \text{otherwise} \newline \end{cases}

where ϕ:RR+\phi : \mathbb{R}_- \mapsto \mathbb{R}_+ is continuous and limsk<0,sk0ϕ(sk)=+\lim_{s_k < 0, s_k \to 0} \phi(s_k) = +\infty. Typical choices are ϕ(s)=s1\phi(s) = -s^{-1} or ϕ(s)=log([min{1,s}])\phi(s) = -\log([\min\{1, -s\}]) and the differentiable logarithmic barrier function ϕ(s)=log(s)\phi(s) = -\log(-s).

Again, since gig_i are convex, phiphi is convex and non-increasing, it means that χ^S\widehat{\chi}_S is also convex.

Theorem 3 (Optimality of Limit Points of Interior Penalty Methods)

Consider the penalty problem,

min f(x)+νχ^S(x)\min \ f(\mathbf{x}) + \nu \widehat{\chi}_S(\mathbf{x})

global optimal solutions to this problem is also global optimal solutions to the original problem.

Further, the convergence of stationary point also holds,

xk stationary in unconstrained problem with νkxkx^ as k+LICQ holds at x^}x^ stationary (KKT) in (1)\begin{align*} \left. \begin{array}{l} \mathbf{x}_k \text{ stationary in unconstrained problem with } \nu_k \newline \mathbf{x}_k \to \hat{\mathbf{x}} \text{ as } k \to +\infty \newline \text{LICQ holds at } \hat{\mathbf{x}} \newline \end{array} \right\} \Longrightarrow \hat{\mathbf{x}} \text{ stationary (KKT) in (1)} \end{align*}

Polynomial Methods for LPs

Definition 4 (Polynomial Interior Point Methods for Linear Programming)

Consider the LP,

min bTysubject to ATy+s=cs0\begin{align*} \min \ & -\mathbf{b}^T \mathbf{y} \newline \text{subject to} \ & A^T \mathbf{y} + \mathbf{s} = \mathbf{c} \newline & \mathbf{s} \geq 0 \newline \end{align*}

and the corresponding KKT conditions,

ATy+s=cAx=bxTs=0x,s,0\begin{align*} A^T \mathbf{y} + \mathbf{s} = \mathbf{c} \newline A \mathbf{x} = \mathbf{b} \newline \mathbf{x}^T \mathbf{s} = 0 \newline \mathbf{x}, \mathbf{s}, \geq 0 \newline \end{align*}

We can apply the barrier method to the LP, taking care of the s0\mathbf{s} \geq 0 constraint. Thus, the subproblem becomes,

min bTyνj=1nlog(sj)subject to ATy+s=c\begin{align*} \min \ & -\mathbf{b}^T \mathbf{y} - \nu \sum_{j=1}^n \log(s_j) \newline \text{subject to} \ & A^T \mathbf{y} + \mathbf{s} = \mathbf{c} \newline \end{align*}

The KKT conditions for this problem are,

ATy+s=cAx=bxjsj=ν, j=1,,n\begin{align*} A^T \mathbf{y} + \mathbf{s} = \mathbf{c} \newline A \mathbf{x} = \mathbf{b} \newline x_j s_j = \nu, \ j = 1, \ldots, n \newline \end{align*}
Example 1

Consider the constrained optimization problem,

min x12+x22subject to x13x2=0\begin{align*} \min \ & -x_1^2 + x_2^2 \newline \text{subject to} \ & x_1^3 - x_2 = 0 \end{align*}

with the penalty function ψ(s)=s2\psi(s) = s^2.