Introduction
Recall our constrained optimization problem of the form,
$$ (P) \quad \begin{cases} \min & f(\mathbf{x}) \newline \text{subject to} & \mathbf{x} \in S \end{cases} $$
where $f : \mathbb{R}^n \to \mathbb{R}$ and $S \subseteq \mathbb{R}^n$.
We have previously derived optimality conditions of the form,
$$ \mathbf{x}^{\star} \text{ is a local minimum of } (P) \implies \text{There is no feasible descent direction at } \mathbf{x}^{\star}. $$
In this part, we will generalize to statements of the form,
$$ \mathbf{x}^{\star} \text{ is a local minimum of } (P) \implies \underbrace{A(\mathbf{x}^{\star})}_{\text{Set of descent directions}} \cap \underbrace{B(\mathbf{x}^{\star})}_{\text{Set of feasible directions}} = \emptyset, $$
where we will characterize $A(\mathbf{x}^{\star})$ and $B(\mathbf{x}^{\star})$ in different ways.
Tangent and Gradient Cones
To neatly characterize the set of feasible directions, we will introduce the concepts of the tangent cone and the gradient cone.
First, however, let us define the cone of descent directions and the cone of feasible directions.
Definition: Cone of Descent Directions
The cone of descent directions at $\mathbf{x} \in S$ is, $$ \overset{\circ}{F}(\mathbf{x}) \coloneqq \{ \mathbf{p} \in \mathbb{R}^n \mid \nabla f(\mathbf{x})^T \mathbf{p} < 0 \} $$
Definition: Cone of Feasible Directions
Let $S \subseteq \mathbb{R}^n$ be a non-empty and closed set. The cone of feasible directions at $\mathbf{x} \in S$ is, $$ R_S(\mathbf{x}) \coloneqq \{ \mathbf{p} \in \mathbb{R}^n \mid \exists \delta > 0, \ \mathbf{x} + \alpha \mathbf{p} \in S, \ \forall \alpha \in [0, \delta] \} $$
With these, we can rewrite our previous optimality condition as follows.
Theorem: Optimality Condition using Cones
$$ \mathbf{x}^{\star} \text{ is a local minimum of } (P) \implies \overset{\circ}{F}(\mathbf{x}^{\star}) \cap R_S(\mathbf{x}^{\star}) = \emptyset $$
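As a concrete (if informal) illustration, consider the toy problem of minimizing $f(\mathbf{x}) = x_1^2 + x_2^2$ over $S = \{\mathbf{x} \in \mathbb{R}^2 \mid x_1 + x_2 \geq 1\}$, whose minimum is attained at $\mathbf{x}^{\star} = \begin{bmatrix} 1/2 & 1/2 \end{bmatrix}^T$. The following minimal sketch (the toy problem and the code are illustrative additions, assuming NumPy is available) checks a few directions at $\mathbf{x}^{\star}$ and finds none that is simultaneously a descent direction and a feasible direction:

```python
import numpy as np

# Toy problem (illustrative): f(x) = x_1^2 + x_2^2, S = {x : x_1 + x_2 >= 1},
# with constrained minimizer x* = (1/2, 1/2).
grad_f = lambda x: 2 * x
x_star = np.array([0.5, 0.5])

def is_descent(p):
    # p lies in the cone of descent directions F°(x*) iff grad f(x*)^T p < 0
    return grad_f(x_star) @ p < 0

def is_feasible_dir(p, deltas=(1e-6, 1e-3, 1e-1)):
    # Crude test for p in R_S(x*): x* + alpha p must stay in S for small alpha > 0
    # (a small tolerance absorbs floating-point rounding).
    return all((x_star + a * np.asarray(p)).sum() >= 1.0 - 1e-12 for a in deltas)

for p in ([1.0, 1.0], [1.0, -1.0], [-1.0, -1.0], [-1.0, 0.5]):
    print(p, "descent:", is_descent(np.array(p)), "feasible:", is_feasible_dir(p))
# No sampled direction is reported as both a descent and a feasible direction,
# consistent with the theorem at the minimizer x*.
```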
Example: Cone of Feasible Directions
Let $S = \{\mathbf{x} \in \mathbb{R}^2 \mid x_2 = x_1^2 \}$, i.e., the graph of a parabola.
We can rewrite $S$ as $S = \{\mathbf{x} \in \mathbb{R}^2 \mid -x_1^2 + x_2 = 0 \}$.
Since $S$ is a curve containing no line segments, for any $\mathbf{x} \in S$ and any nonzero direction $\mathbf{p}$ the point $\mathbf{x} + \alpha \mathbf{p}$ leaves $S$ for arbitrarily small $\alpha > 0$. Thus, $R_S(\mathbf{x}) = \emptyset$ for all $\mathbf{x} \in S$.
Now, we can introduce the tangent cone.
Tangent Cone
Definition: Tangent Cone
Let $S \subseteq \mathbb{R}^n$. The tangent cone at $\mathbf{x} \in S$ is, $$ \begin{align*} T_S(\mathbf{x}) & \coloneqq \{ \mathbf{p} \in \mathbb{R}^n \mid \exists \{\mathbf{x}_k\}_{k=1}^{\infty} \subseteq S, \ \exists \{\lambda_k\}_{k=1}^{\infty} \subset (0, \infty), \newline & \quad \quad \text{such that} \ \lim_{k \to \infty} \mathbf{x}_k = \mathbf{x}, \ \text{and} \newline & \quad \quad \lim_{k \to \infty} \lambda_k (\mathbf{x}_k - \mathbf{x}) = \mathbf{p} \} \end{align*} $$
Example: Tangent Cone
Let $S = \{\mathbf{x} \in \mathbb{R}^2 \mid x_2 = x_1^2 \}$ and consider the point $\mathbf{x} = \mathbf{0}$.
For example, taking $\mathbf{x}_k = \begin{bmatrix} t/k & t^2/k^2 \end{bmatrix}^T \in S$ and $\lambda_k = k$ gives $\lambda_k(\mathbf{x}_k - \mathbf{0}) = \begin{bmatrix} t & t^2/k \end{bmatrix}^T \to \begin{bmatrix} t & 0 \end{bmatrix}^T$ for any $t \in \mathbb{R}$. Conversely, for any sequence in $S$ converging to $\mathbf{0}$, the second component $x_{k,2} = x_{k,1}^2$ vanishes faster than the first, so the second component of any limit direction must be zero. Thus, the tangent cone at $\mathbf{0}$ is, $$ T_S(\mathbf{0}) = \{ \mathbf{p} \in \mathbb{R}^2 \mid p_2 = 0 \} $$
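As a quick numerical illustration of the definition (a minimal sketch assuming NumPy, not part of the lecture), we can follow the sequence $\mathbf{x}_k = \begin{bmatrix} 1/k & 1/k^2 \end{bmatrix}^T \subseteq S$ with scaling $\lambda_k = k$ and watch $\lambda_k(\mathbf{x}_k - \mathbf{0})$ approach $\begin{bmatrix} 1 & 0 \end{bmatrix}^T \in T_S(\mathbf{0})$:

```python
import numpy as np

x_bar = np.array([0.0, 0.0])                # the point at which we compute the tangent cone

for k in [1, 10, 100, 1000]:
    x_k = np.array([1.0 / k, 1.0 / k**2])   # points of S (x_2 = x_1^2) converging to x_bar
    lam_k = float(k)                        # positive scaling factors
    print(lam_k * (x_k - x_bar))            # tends to the direction (1, 0)
```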
Theorem: Geometric Optimality Conditions
Let $S \subseteq \mathbb{R}^n$ and $f \in C^1$ on $S$. Then, $$ \mathbf{x}^{\star} \text{ is a local minimum of } (P) \implies \overset{\circ}{F}(\mathbf{x}^{\star}) \cap T_S(\mathbf{x}^{\star}) = \emptyset $$
To obtain more practical conditions, we need a more concrete description of the feasible region.
Consider optimization problems of the form,
$$ (P) \quad \begin{cases} \min & f(\mathbf{x}) \newline \text{subject to} & g_i(\mathbf{x}) \leq 0, \ i = 1, \ldots, m \newline \end{cases} $$
where $f, g_i \in C^1$ for $i = 1, \ldots, m$. Thus, $S = \{\mathbf{x} \in \mathbb{R}^n \mid g_i(\mathbf{x}) \leq 0, \ i = 1, \ldots, m \}$.
Definition: Active Constraints
The set of active constraints at $\mathbf{x} \in S$ is, $$ \mathcal{I}(\mathbf{x}) \coloneqq \{ i \in \{1, \ldots, m\} \mid g_i(\mathbf{x}) = 0 \} $$
Example: Active Constraints
Let $g_i : \mathbb{R}^2 \to \mathbb{R}$ for $i = 1, 2, 3$ where, $$ \begin{align*} g_1(\mathbf{x}) & = -x_1 \newline g_2(\mathbf{x}) & = -x_2 \newline g_3(\mathbf{x}) & = x_1 + x_2 - 1 \end{align*} $$ For $\bar{\mathbf{x}} = \begin{bmatrix} 1 \newline 0 \end{bmatrix}$, we obtain, $$ \begin{align*} g_1(\bar{\mathbf{x}}) & = -1 < 0 \newline g_2(\bar{\mathbf{x}}) & = -0 = 0 \newline g_3(\bar{\mathbf{x}}) & = 1 + 0 - 1 = 0 \end{align*} $$ Thus, $\mathcal{I}(\bar{\mathbf{x}}) = \{2, 3\}$.
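A minimal sketch of how the active set could be computed numerically for this example (assuming NumPy; the helper `active_set` and the tolerance `tol` are our own illustrative choices):

```python
import numpy as np

def active_set(x, constraints, tol=1e-9):
    """Return the (1-based) indices i with g_i(x) = 0, up to a small tolerance."""
    return [i + 1 for i, g_i in enumerate(constraints) if abs(g_i(x)) <= tol]

g = [lambda x: -x[0],             # g_1
     lambda x: -x[1],             # g_2
     lambda x: x[0] + x[1] - 1]   # g_3

x_bar = np.array([1.0, 0.0])
print(active_set(x_bar, g))       # [2, 3]
```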
Definition: Inner Gradient Cone
The inner gradient cone at $\mathbf{x} \in S$ is, $$ \overset{\circ}{G}(\mathbf{x}) \coloneqq \{ \mathbf{p} \in \mathbb{R}^n \mid \nabla g_i(\mathbf{x})^T \mathbf{p} < 0, \ \forall i \in \mathcal{I}(\mathbf{x}) \} $$
Definition: Gradient Cone
The gradient cone at $\mathbf{x} \in S$ is, $$ G(\mathbf{x}) \coloneqq \{ \mathbf{p} \in \mathbb{R}^n \mid \nabla g_i(\mathbf{x})^T \mathbf{p} \leq 0, \ \forall i \in \mathcal{I}(\mathbf{x}) \} $$
Note
The only difference between the inner gradient cone and the gradient cone is that the inequalities are strict in the inner gradient cone.
Theorem: Gradient Cone Relations
Let $S \subseteq \mathbb{R}^n$ be defined as above and let $\mathbf{x} \in S$. Then, $$ \overset{\circ}{G}(\mathbf{x}) \subseteq R_S(\mathbf{x}) \subseteq T_S(\mathbf{x}) \subseteq G(\mathbf{x}) $$
Lemma: Optimality Condition using the Inner Gradient Cone
$$ \mathbf{x}^{\star} \text{ is a local minimum of } (P) \implies \overset{\circ}{F}(\mathbf{x}^{\star}) \cap \overset{\circ}{G}(\mathbf{x}^{\star}) = \emptyset $$
Abadie’s Constraint Qualifications and Karush-Kuhn-Tucker Conditions
With all these definitions, we can now introduce Abadie’s constraint qualifications (and later the Karush-Kuhn-Tucker conditions).
Definition: Abadie’s Constraint Qualifications
Abadie’s constraint qualifications are said to hold at $\mathbf{x} \in S$ if, $$ T_S(\mathbf{x}) = G(\mathbf{x}) $$ In other words, Abadie’s constraint qualifications hold at $\mathbf{x} \in S$ if the tangent cone and the gradient cone are equal.
Example: Gradient Cone
Let $S = \{\mathbf{x} \in \mathbb{R}^2 \mid \underbrace{(x_1 - 1)^2 + x_2^2}_{g_1(\mathbf{x})} \leq 1, \ \underbrace{(x_1 + 1)^2 + x_2^2}_{g_2(\mathbf{x})} \leq 1 \}$ and consider the point $\mathbf{x} = \begin{bmatrix} 0 & 0 \end{bmatrix}^T$. Compute $R_S(\mathbf{x})$, $T_S(\mathbf{x})$, $\overset{\circ}{G}(\mathbf{x})$ and $G(\mathbf{x})$, and determine whether Abadie’s constraint qualifications hold at $\mathbf{x}$.
First, we rewrite the constraints in standard form, $$ \begin{align*} g_1(\mathbf{x}) & = (x_1 - 1)^2 + x_2^2 - 1 \leq 0 \newline g_2(\mathbf{x}) & = (x_1 + 1)^2 + x_2^2 - 1 \leq 0 \end{align*} $$ Since we are working in $\mathbb{R}^2$, the problem is easy to visualize: the feasible set is the intersection of the two closed unit disks centered at $(1, 0)$ and $(-1, 0)$, and these disks touch only at the origin, so $S = \{\mathbf{0}\}$.
Thus, there are no feasible directions, i.e., $R_S(\mathbf{x}) = \emptyset$.
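We can back this up with a brute-force check (an illustrative sketch, not a proof): sampling a grid around the origin and keeping the points that satisfy both constraints leaves only the origin.

```python
# Brute-force sanity check: sample a grid around the origin and keep the points
# that satisfy both constraints; only the origin should remain.
g1 = lambda x: (x[0] - 1) ** 2 + x[1] ** 2 - 1
g2 = lambda x: (x[0] + 1) ** 2 + x[1] ** 2 - 1

grid = [i / 100 - 0.5 for i in range(101)]          # -0.5, -0.49, ..., 0.5
feasible = [(a, b) for a in grid for b in grid
            if g1((a, b)) <= 0.0 and g2((a, b)) <= 0.0]
print(feasible)                                     # [(0.0, 0.0)]
```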
Since $S$ contains only this single point, the only sequences in $S$ converging to $\mathbf{x}$ are the constant sequences $\mathbf{x}_k = \mathbf{x}$ for all $k$, so $\lambda_k (\mathbf{x}_k - \mathbf{x}) = \mathbf{0}$ for every $k$ and hence $T_S(\mathbf{x}) = \{ \mathbf{0} \}$, i.e., only the zero vector.
Now, we compute the gradients of the constraints at $\mathbf{x} = \mathbf{0}$, $$ \begin{align*} \nabla g_1(\mathbf{x}) & = \begin{bmatrix} 2(x_1 - 1) \newline 2x_2 \end{bmatrix} = \begin{bmatrix} -2 \newline 0 \end{bmatrix} \newline \nabla g_2(\mathbf{x}) & = \begin{bmatrix} 2(x_1 + 1) \newline 2x_2 \end{bmatrix} = \begin{bmatrix} 2 \newline 0 \end{bmatrix} \end{align*} $$
Since both constraints are active at $\mathbf{x}$, we have $\mathcal{I}(\mathbf{x}) = \{1, 2\}$. Thus, we can compute the inner gradient cone and the gradient cone, $$ \begin{align*} \overset{\circ}{G}(\mathbf{x}) & = \{ \mathbf{p} \in \mathbb{R}^2 \mid \begin{bmatrix}-2 \newline 0 \end{bmatrix}^T \mathbf{p} < 0, \ \begin{bmatrix}2 \newline 0 \end{bmatrix}^T \mathbf{p} < 0 \} = \{ \mathbf{p} \in \mathbb{R}^2 \mid p_1 > 0, \ p_1 < 0 \} = \emptyset \newline G(\mathbf{x}) & = \{ \mathbf{p} \in \mathbb{R}^2 \mid \begin{bmatrix}-2 \newline 0 \end{bmatrix}^T \mathbf{p} \leq 0, \ \begin{bmatrix}2 \newline 0 \end{bmatrix}^T \mathbf{p} \leq 0 \} = \{ \mathbf{p} \in \mathbb{R}^2 \mid p_1 \geq 0, \ p_1 \leq 0 \} = \{ \mathbf{p} \in \mathbb{R}^2 \mid p_1 = 0 \} \end{align*} $$
Finally, Abadie’s constraint qualifications do not hold at $\mathbf{x}$ since, $$ T_S(\mathbf{x}) = \{ \mathbf{0} \} \neq \{ \mathbf{p} \in \mathbb{R}^2 \mid p_1 = 0 \} = G(\mathbf{x}) $$
Theorem: Karush-Kuhn-Tucker (KKT) Conditions
Assume that Abadie’s constraint qualifications hold at $\mathbf{x}^{\star} \in S$. Then, $$ \begin{align*} \mathbf{x}^{\star} \text{ is a local minimum of } (P) & \implies \text{The following system is solvable for } \mu, \newline & \quad \quad \begin{cases} \nabla f(\mathbf{x}^{\star}) + \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^{\star}) = 0 \newline \mu_i g_i(\mathbf{x}^{\star}) = 0, \ i = 1, \ldots, m \newline \mu_i \geq 0, \ i = 1, \ldots, m \end{cases} \end{align*} $$
To prove the KKT conditions, we will need Farkas’ lemma (which is a theorem, but for historical reasons is called a lemma).
Theorem: Farkas’ Lemma
Let $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$. Then, exactly one of the following systems is solvable, $$ \begin{cases} A \mathbf{x} = \mathbf{b} \newline \mathbf{x} \geq 0 \end{cases} \quad \quad \begin{cases} A^T \mathbf{y} \leq 0 \newline \mathbf{b}^T \mathbf{y} > 0 \end{cases} $$
We will not prove this here; we will prove it later, in the setting of linear programming (LP).
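Although we postpone the proof, Farkas’ lemma is easy to explore numerically. The sketch below (assuming NumPy and SciPy; the box bound $|y_i| \leq 1$ is our own choice to keep the second LP bounded) tries both systems with a linear-programming solver:

```python
import numpy as np
from scipy.optimize import linprog

def farkas_check(A, b):
    """Report which of the two Farkas systems is (numerically) solvable."""
    m, n = A.shape
    # System 1: does there exist x >= 0 with A x = b?  (feasibility LP, zero objective)
    r1 = linprog(c=np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n)
    if r1.status == 0:
        return "system 1 solvable, x = %s" % r1.x
    # System 2: does there exist y with A^T y <= 0 and b^T y > 0?
    # Maximize b^T y subject to A^T y <= 0 and |y_i| <= 1 (box keeps the LP bounded).
    r2 = linprog(c=-b, A_ub=A.T, b_ub=np.zeros(n), bounds=[(-1, 1)] * m)
    return "system 2 solvable, y = %s" % r2.x

A = np.array([[1.0, 0.0], [0.0, 1.0]])
print(farkas_check(A, np.array([1.0, 1.0])))    # b is a non-negative combination of columns
print(farkas_check(A, np.array([-1.0, 1.0])))   # it is not, so a separating y exists
```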
Proof: Karush-Kuhn-Tucker (KKT) Conditions
We know that, $$ \begin{align*} \mathbf{x}^{\star} \text{ is a local minimum of } (P) & \implies \overset{\circ}{F}(\mathbf{x}^{\star}) \cap T_S(\mathbf{x}^{\star}) = \emptyset \newline & \iff \overset{\circ}{F}(\mathbf{x}^{\star}) \cap G(\mathbf{x}^{\star}) = \emptyset \ \text{(by Abadie’s)} \newline & \iff \text{the system} \ \begin{cases} \nabla f(\mathbf{x}^{\star})^T \mathbf{p} < 0 \newline \nabla g_i(\mathbf{x}^{\star})^T \mathbf{p} \leq 0, \ i \in \mathcal{I}(\mathbf{x}^{\star}) \end{cases} \ \text{has no solution } \mathbf{p}. \end{align*} $$ Now, let $A = \begin{bmatrix} \nabla g_i(\mathbf{x}^{\star}) \end{bmatrix}_{i \in \mathcal{I}(\mathbf{x}^{\star})}$ be the matrix whose columns are the active constraint gradients, and let $\mathbf{b} = -\nabla f(\mathbf{x}^{\star})$. The unsolvable system above then reads $A^T \mathbf{p} \leq 0$, $\mathbf{b}^T \mathbf{p} > 0$ (note that $\mathbf{b}^T \mathbf{p} > 0 \iff \nabla f(\mathbf{x}^{\star})^T \mathbf{p} < 0$), which is exactly the second system in Farkas’ lemma. Hence, by Farkas’ lemma, the first system is solvable, $$ \begin{cases} A \mathbf{x} = \mathbf{b} \newline \mathbf{x} \geq 0 \end{cases} $$ Indexing the solution as $\mathbf{x} = \begin{bmatrix} x_i \end{bmatrix}_{i \in \mathcal{I}(\mathbf{x}^{\star})}$ and letting $\mu_i = x_i$ for $i \in \mathcal{I}(\mathbf{x}^{\star})$ and $\mu_i = 0$ for $i \notin \mathcal{I}(\mathbf{x}^{\star})$, we obtain $$ \begin{cases} \nabla f(\mathbf{x}^{\star}) + \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^{\star}) = 0 \newline \mu_i g_i(\mathbf{x}^{\star}) = 0, \ i = 1, \ldots, m \newline \mu_i \geq 0, \ i = 1, \ldots, m \end{cases} $$ which are precisely the KKT conditions. $_\blacksquare$
Note
- If $g_i(\mathbf{x}^{\star}) = 0$ (i.e., the constraint is active), then $\mu_i$ can be any non-negative value, i.e., $\mu_i \geq 0$.
- If $g_i(\mathbf{x}^{\star}) < 0$ (i.e., the constraint is inactive), then $\mu_i = 0$.
- $-\nabla f(\mathbf{x}^{\star})$ should be a non-negative linear combination of the gradients of the active constraints $\nabla g_i(\mathbf{x}^{\star})$ for $i \in \mathcal{I}(\mathbf{x}^{\star})$.
- We can interpret this as a generalization of the normal-cone optimality condition: there we assumed convexity, whereas here we do not.
Definition: KKT Point
A KKT point is a point $\mathbf{x} \in S$ at which the KKT system is solvable for some $\mu$.
Example: KKT Point
Consider the following optimization problem, $$ \begin{cases} \min & -x_1 + x_2 \newline \text{subject to} & x_1^2 + x_2^2 - 1 \leq 0 \newline & -x_2 \leq 0 \end{cases} $$ Which points are KKT points?
(a) $\mathbf{x}_1 = \begin{bmatrix} 0 & 0 \end{bmatrix}^T$ (b) $\mathbf{x}_2 = \begin{bmatrix} -1 & 0 \end{bmatrix}^T$ (c) $\mathbf{x}_3 = \begin{bmatrix} 1 & 0 \end{bmatrix}^T$
We first calculate the gradients of the objective and the constraints, $$ \begin{align*} \nabla f(\mathbf{x}) & = \begin{bmatrix} -1 \newline 1 \end{bmatrix} \newline \nabla g_1(\mathbf{x}) & = \begin{bmatrix} 2x_1 \newline 2x_2 \end{bmatrix} \newline \nabla g_2(\mathbf{x}) & = \begin{bmatrix} 0 \newline -1 \end{bmatrix} \end{align*} $$
As we stated previously, the (negative) gradient of the cost should be a non-negative linear combination of the gradients of the active constraints.
At $\mathbf{x}_2$, both constraints are active with gradients $\nabla g_1(\mathbf{x}_2) = \begin{bmatrix} -2 & 0 \end{bmatrix}^T$ and $\nabla g_2(\mathbf{x}_2) = \begin{bmatrix} 0 & -1 \end{bmatrix}^T$; the negative gradient $-\nabla f(\mathbf{x}_2) = \begin{bmatrix} 1 & -1 \end{bmatrix}^T$ can be expressed as a linear combination of these ($\mu_1 = -\frac{1}{2}$, $\mu_2 = 1$), but not as a non-negative one, thus $\mathbf{x}_2$ is not a KKT point.
At $\mathbf{x}_1$, only $g_2$ is active, and $-\nabla f(\mathbf{x}_1) = \begin{bmatrix} 1 & -1 \end{bmatrix}^T$ cannot be expressed as a non-negative multiple of $\nabla g_2(\mathbf{x}_1) = \begin{bmatrix} 0 & -1 \end{bmatrix}^T$ (its first component is nonzero), thus $\mathbf{x}_1$ is not a KKT point.
However, for $\mathbf{x}_3$, we can express the negative gradient as a non-negative linear combination of the gradients of the active constraints, so $\mathbf{x}_3$ is a KKT point. Indeed, $$ \begin{align*} & \nabla f(\mathbf{x}_3) + \mu_1 \nabla g_1(\mathbf{x}_3) + \mu_2 \nabla g_2(\mathbf{x}_3) = 0 \newline \iff & \begin{bmatrix} -1 \newline 1 \end{bmatrix} + \mu_1 \begin{bmatrix} 2 \newline 0 \end{bmatrix} + \mu_2 \begin{bmatrix} 0 \newline -1 \end{bmatrix} = 0 \newline \iff & \begin{cases} -1 + 2\mu_1 = 0 \newline 1 - \mu_2 = 0 \end{cases} \iff \boxed{\begin{cases} \mu_1 = \frac{1}{2} \newline \mu_2 = 1 \end{cases}} \end{align*} $$ Thus, the only KKT point is $\mathbf{x}_3 = \begin{bmatrix} 1 & 0 \end{bmatrix}^T$, with multipliers $\mu_1 = \frac{1}{2}$ and $\mu_2 = 1$.
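As a sanity check of this example (a minimal sketch, assuming NumPy and SciPy; the helper `kkt_multipliers` is illustrative, not part of the lecture), we can solve the stationarity equation over non-negative multipliers for each candidate point and inspect the residual:

```python
import numpy as np
from scipy.optimize import nnls

# Problem data from the example above.
grad_f = lambda x: np.array([-1.0, 1.0])
g      = [lambda x: x[0] ** 2 + x[1] ** 2 - 1, lambda x: -x[1]]
grad_g = [lambda x: np.array([2 * x[0], 2 * x[1]]), lambda x: np.array([0.0, -1.0])]

def kkt_multipliers(x, tol=1e-9):
    """Try to solve -grad f(x) = sum_i mu_i grad g_i(x) with mu >= 0 over the active set."""
    active = [i for i in range(len(g)) if abs(g[i](x)) <= tol]
    if not active:                              # no active constraints: need grad f(x) = 0
        return {} if np.linalg.norm(grad_f(x)) <= tol else None
    A = np.column_stack([grad_g[i](x) for i in active])
    mu, residual = nnls(A, -grad_f(x))          # non-negative least squares
    return dict(zip(active, mu)) if residual <= tol else None

for x in ([0.0, 0.0], [-1.0, 0.0], [1.0, 0.0]):
    print(x, kkt_multipliers(np.array(x)))
# Only x_3 = (1, 0) yields non-negative multipliers (mu_1 = 0.5, mu_2 = 1.0,
# reported with 0-based constraint indices); the other two points return None.
```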