AA/EE/ME 578 Review
Convex Sets
Examples
Subspaces
$S \subseteq \mathbb{R}^{n}$ is a subspace if
$$ x, y \in S, \quad \lambda, \mu \in \mathbb{R} \implies \lambda x+\mu y \in S $$Geometrically: $x, y \in S \to$ plane through $0, x, y \subseteq S$.
Affine Sets
$S \subseteq \mathbb{R}^n$ is affine if
$$ x, y \in S, \quad \lambda, \mu \in \mathbb{R}, \quad \lambda+\mu=1 \implies \lambda x+\mu y \in S $$Geometrically: $x, y \in S \to$ line through $x, y \subseteq S$.
Convex Sets
$S \subseteq \mathbb{R}^n$ is a convex set if
$$ x, y \in S, \quad \lambda, \mu \ge 0, \quad \lambda+\mu=1 \implies \lambda x+\mu y \in S $$Geometrically: $x, y \in S \to$ segment $[x, y] \subseteq S$.
Convex Cone
$S \subseteq \mathbb{R}^n$ is a cone if
$$ x \in S, \quad \lambda \ge 0 \implies \lambda x \in S $$$S \subseteq \mathbb{R}^n$ is a convex cone if
$$ x, y \in S, \quad \lambda, \mu \ge 0 \implies \lambda x+\mu y \in S $$Geometrically: $x, y \in S \to$ ‘pie slice’ between $x, y \subseteq S$.
Combinations and Hulls
$y=\theta_{1} x_{1}+\dots+\theta_{k} x_{k}$ is a
- linear combination of $x_1, \dots, x_k$
- affine combination if $\sum \theta_i = 1$
- convex combination if $\sum \theta_i = 1,\ \theta_i \ge 0$
- conic combination if $\theta_i \ge 0$.
(linear, …) hull of $S$ is the set of all (linear, …) combinations from $S$.
$$ \DeclareMathOperator{\conv}{conv} \conv(S)=\bigcap\{G \mid S \subseteq G, \ G \text { convex }\} $$convex hull $\conv(S)$ is the set of all convex combinations of points in $S$.
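As a quick illustration of the second characterization, the sketch below (not part of the notes; it uses `scipy.optimize.linprog` on made-up points) tests membership in $\conv(S)$ by searching for convex-combination weights with a feasibility LP.

```python
# Sketch (not from the notes): test whether a point z lies in conv{x_1, ..., x_k}
# by searching for convex-combination weights theta with an LP feasibility problem.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, z):
    """points: (k, n) array of x_i; z: (n,) query point."""
    k, n = points.shape
    # Find theta >= 0 with sum(theta) = 1 and points.T @ theta = z.
    A_eq = np.vstack([points.T, np.ones((1, k))])      # (n+1, k)
    b_eq = np.concatenate([z, [1.0]])
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * k, method="highs")
    return res.status == 0                              # 0 means a feasible theta was found

square = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
print(in_convex_hull(square, np.array([0.5, 0.5])))     # True
print(in_convex_hull(square, np.array([1.5, 0.5])))     # False
```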
Hyperplanes and Halfspaces
- Hyperplane: $\{x \mid a^{T} x=b,\ a \ne 0\}$
- Halfspace: $\{x \mid a^{T} x \le b,\ a \ne 0\}$
$a$ is the normal vector. Hyperplanes are affine and convex; halfspaces are convex.
Euclidean Balls and Ellipsoids
Euclidean ball with center $x_c$ and radius $r$:
$$ \require{mathtools} \DeclarePairedDelimiters\norm{\lVert}{\rVert} \DeclarePairedDelimiters\abs{\lvert}{\rvert} B(x_{c}, r)=\{x \mid \norm{x-x_{c}}_{2} \le r\}=\{x_{c}+r u \mid \norm{u}_{2} \le 1\} $$Ellipsoid:
$$ \{x \mid (x-x_{c})^{T} P^{-1}(x-x_{c}) \le 1\} $$with $P \in \mathbb{S}_{++}^{n}$. Other representation: $\{x_{c}+A u \mid \norm{u}_{2} \le 1\}$ with $A$ square and nonsingular.
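The two descriptions coincide when $A = P^{1/2}$. A minimal numerical check (synthetic $P$, not from the notes):

```python
# Sketch (P is made-up data): sample a point from {x_c + A u : ||u||_2 <= 1}
# with A = P^{1/2} and verify the quadratic-form description
# (x - x_c)^T P^{-1} (x - x_c) <= 1.
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
n = 3
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)           # P in S_++^n (synthetic)
x_c = rng.standard_normal(n)
A = np.real(sqrtm(P))             # symmetric square root, so A A^T = P

u = rng.standard_normal(n)
u /= max(1.0, np.linalg.norm(u))  # some u with ||u||_2 <= 1
x = x_c + A @ u
val = (x - x_c) @ np.linalg.solve(P, x - x_c)
print(val <= 1 + 1e-9)            # True: x lies in the quadratic-form ellipsoid
```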
Norm Balls and Norm Cones
Norm: a function $\norm{\cdot}$ that satisfies
- $\norm{x} \ge 0 ; \ \norm{x}=0$ if and only if $x=0$
- $\norm{t x}=\abs{t}\norm{x}$ for $t \in \mathbb{R}$
- $\norm{x+y} \le\norm{x}+\norm{y}$
Norm ball with center $x_c$ and radius $r$: $\{x \mid \norm{x-x_{c}} \le r\}$
Norm cone: $\{(x, t) \mid \norm{x} \le t\}$
Polyhedron
Solution set of finitely many linear inequalities and equalities
$$ A x \preceq b, \qquad C x=d $$($A \in \mathbb{R}^{m \times n},\ C \in \mathbb{R}^{p \times n},\ \preceq$ is component-wise inequality)
Polyhedron is intersection of finite number of halfspaces and hyperplanes.
Positive Semidefinite Cone
- $\mathbb{S}_{+}^{n}=\{X \in \mathbb{S}^{n} \mid X \succeq 0\}$: PSD matrices. A convex cone.
- $\mathbb{S}_{++}^{n}=\{X \in \mathbb{S}^{n} \mid X\succ 0\}$: PD matrices.
Operations That Preserve Convexity
To show that $C$ is a convex set:
Definition
$$ x_{1}, x_{2} \in C, \quad 0 \le \theta \le 1 \implies \theta x_{1}+(1-\theta) x_{2} \in C $$Show that $C$ is obtained from simple convex sets by operations that preserve convexity
- intersection
- affine function
- perspective function
- linear-fractional functions
Intersection
The intersection of (any number of, even infinite) convex sets is convex.
Affine Function
Suppose $f : \mathbb{R}^{n} \to \mathbb{R}^{m}$ is affine: $f(x)=A x+b$ with $A \in \mathbb{R}^{m \times n}, \ b \in \mathbb{R}^{m}$
The image of a convex set under $f$ is convex
$$ S \subseteq \mathbb{R}^{n} \text { convex} \implies f(S)=\{f(x) \mid x \in S\} \text{ convex} $$The inverse image $f^{-1}(C)$ of a convex set under $f$ is convex
$$ C \subseteq \mathbb{R}^{m} \text{ convex} \implies f^{-1}(C)=\{x \in \mathbb{R}^{n} \mid f(x) \in C\} \text{ convex } $$
Perspective and Linear-Fractional Function
Perspective function $P : \mathbb{R}^{n+1} \to \mathbb{R}^{n}$
$$ \DeclareMathOperator*{\dom}{dom} P(x, t)=x / t, \quad \dom P=\{(x, t) \mid t>0\} $$images and inverse images of convex sets under perspective are convex.
Linear-fractional function $f : \mathbb{R}^{n} \to \mathbb{R}^{m}$
$$ f(x)=\frac{Ax+b}{c^{T} x+d}, \quad \dom f=\{x \mid c^{T} x+d>0\} $$images and inverse images of convex sets under linear-fractional functions are convex.
Generalized Inequalities
A convex cone $K \subseteq \mathbb{R}^{n}$ is a proper cone if
- $K$ is closed (contains its boundary)
- $K$ is solid (has nonempty interior)
- $K$ is pointed (contains no line)
Generalized inequality defined by a proper cone $K$:
$$ x \preceq_{K} y \iff y-x \in K, \qquad x\prec_{K} y \iff y-x \in \operatorname{int} K $$Hyperplane Theorems
Separating Hyperplane
If $C$ and $D$ are disjoint convex sets, then there exists $a \ne 0, \ b$ s.t.
$$ a^{T} x \le b,\ \forall x \in C; \quad a^{T} x \ge b,\ \forall x \in D $$the hyperplane $\{x \mid a^{T} x=b\}$ separates $C$ and $D$. Strict separation requires additional assumptions (e.g. $C$ is closed, $D$ is a singleton).
Supporting Hyperplane
Supporting hyperplane to set $C$ at boundary point $x_0$:
$$ \{x \mid a^{T} x=a^{T} x_{0}\} $$where $a \ne 0$ and $a^T x \le a^T x_0$ for all $x \in C$.
Supporting hyperplane theorem: If $C$ is convex, then there exists a supporting hyperplane at every boundary point of $C$.
Dual Cones
Dual cone of a cone $K$: $K^*=\{y \mid y^{T} x \ge 0, \ \forall x \in K\}$
Examples:
- $K=\mathbb{R}_{+}^{n} : K^{*}=\mathbb{R}_{+}^{n}$
- $K=\mathbb{S}_{+}^{n} : K^{*}=\mathbb{S}_{+}^{n}$
- $K=\{(x, t) \mid\norm{x}_{2} \le t\} : K^{*}=\{(x, t) \mid\norm{x}_{2} \le t\}$
- $K=\{(x, t) \mid\norm{x}_{1} \le t\} : K^{*}=\{(x, t) \mid\norm{x}_{\infty} \le t\}$
Convex Functions
$f: \mathbb{R}^{n} \to \mathbb{R}$ is convex if $\dom f$ is a convex set and
$$ f(\theta x+(1-\theta) y) \le \theta f(x)+(1-\theta) f(y) $$for all $x, y \in \dom f, \ 0\le \theta \le 1$
Examples
Convex
- affine: $a^T x + b$
- exponential: $e^{ax}$, for any $a \in \mathbb{R}$
- powers: $x^\alpha$ on $\mathbb{R}_{++}$, for $\alpha \ge 1$ or $\alpha \le 0$
- powers of absolute value: $\abs{x}^p$ on $\mathbb{R}$, for $p \ge 1$
- negative entropy: $x\log x$ on $\mathbb{R}_{++}$
- norms: $\norm{x}_{p}=\left(\sum_{i=1}^{n}\left|x_{i}\right|^{p}\right)^{1 / p}$ for $p \ge 1$
- affine on matrices: $f(X)=\operatorname{tr}\left(A^{T} X\right)+b$
- spectral norm: $f(X) = \norm{X}_2 = \sigma_{\max}(X)$
- quadratic: $f(x)=(1 / 2) x^{T} P x+q^{T} x+r$ with $P \in \mathbb{S}_{+}^{n}$
- least-squares: $f(x) = \norm{Ax - b}_2^2$
- quadratic-over-linear: $f(x, y) = x^2/y$ with $y > 0$
- log-sum-exp: $f(x)=\log \sum_{k=1}^{n} e^{x_{k}}$
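As an aside on evaluating log-sum-exp in practice, a standard max-shift implementation avoids overflow; this detail is not part of the notes, and `scipy.special.logsumexp` is used only as a reference.

```python
# Sketch: a numerically stable evaluation of log-sum-exp via the usual max shift.
import numpy as np
from scipy.special import logsumexp   # reference implementation

def log_sum_exp(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))   # exp never sees huge arguments

x = np.array([1000.0, 1001.0, 999.0])
print(log_sum_exp(x))                              # ~1001.41, no overflow
print(np.isclose(log_sum_exp(x), logsumexp(x)))    # True
```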
Concave
- $f$ is concave if $-f$ is convex
- affine
- powers: $x^\alpha$ on $\mathbb{R}_{++}$, for $0\le \alpha \le 1$
- logarithm: $\log x$ on $\mathbb{R}_{++}$
- $\log\det X$ on $\mathbb{S}_{++}^n$
- geometric mean: $f(x)=\left(\prod_{k=1}^{n} x_{k}\right)^{1 / n}$ on $\mathbb{R}_{++}^n$
Properties
Restriction of a Convex Function to a Line
$f : \mathbb{R}^{n} \to \mathbb{R}$ is convex iff the function $g : \mathbb{R} \to \mathbb{R}$
$$ g(t)=f(x+t v), \quad \dom g=\{t \mid x+t v \in \dom f\} $$is convex (in $t$) for any $x \in \dom f, \ v\in \mathbb{R}^n$
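A small sketch of this idea (random sampling on made-up data, so it only illustrates rather than proves anything): restrict log-sum-exp to random lines and spot-check the midpoint inequality.

```python
# Sketch: spot-check convexity of f by restricting to random lines and testing
# the midpoint inequality g((t1+t2)/2) <= (g(t1)+g(t2))/2.
import numpy as np

def f(x):                         # log-sum-exp, convex on R^n
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    x, v = rng.standard_normal(4), rng.standard_normal(4)
    g = lambda t: f(x + t * v)    # restriction of f to the line x + t v
    t1, t2 = rng.standard_normal(2)
    ok &= g(0.5 * (t1 + t2)) <= 0.5 * (g(t1) + g(t2)) + 1e-12
print(ok)                         # True
```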
First-Order Convexity Condition
Differentiable $f$ with convex domain is convex iff
$$ f(y) \ge f(x)+\nabla f(x)^{T}(y-x),\quad \forall x, y \in \dom f $$Second-Order Convexity Condition
Twice differentiable $f$ with convex domain is convex iff
$$ \nabla^{2} f(x) \succeq 0, \quad \forall x \in \dom f $$Epigraph and Sublevel Set
$\alpha$-sublevel set of $f : \mathbb{R}^{n} \to \mathbb{R}$:
$$ C_{\alpha}=\{x \in \dom f \mid f(x) \le \alpha\} $$sublevel sets of convex functions are convex (converse is false)
Epigraph of $f : \mathbb{R}^{n} \to \mathbb{R}$:
$$ \operatorname{epi}f=\{(x, t) \in \mathbb{R}^{n+1} \mid x \in \dom f, \ f(x) \le t\} $$$f$ is convex iff $\operatorname{epi}f$ is a convex set.
Jensen’s Inequality
If $f$ is convex, then for $0 \le \theta \le 1$
$$ f(\theta x+(1-\theta) y) \le \theta f(x)+(1-\theta) f(y) $$can be extended to
$$ f(\mathbb{E}\ z) \le \mathbb{E}\ f(z) $$for any random variable $z$
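A Monte Carlo illustration of the extended inequality (the Gaussian $z$ and the choice $f = \exp$ are arbitrary, not from the notes):

```python
# Sketch: Monte Carlo illustration of Jensen's inequality f(E z) <= E f(z).
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=1.0, scale=2.0, size=100_000)
f = np.exp                               # convex
print(f(z.mean()) <= f(z).mean())        # True: exp(E z) <= E exp(z)
```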
Operations That Preserve Convexity
To show $f$ is convex function:
- Verify definition (often simplified by restricting to a line)
- For twice differentiable functions, show $\nabla^2 f(x) \succeq 0$
- Show that $f$ is obtained from simple convex functions by operations that preserve convexity
Nonnegative multiple: $\alpha f$ is convex if $f$ is convex, $\alpha \ge 0$
Sum: $f_1 + f_2$ is convex if $f_1, f_2$ convex
Composition with affine function: $f(Ax + b)$ is convex if $f$ is convex
Pointwise maximum: if $f_1, \dots, f_m$ are convex, then $f(x)=\max \{f_{1}(x), \dots, f_{m}(x)\}$ is convex
Pointwise supremum: if $f(x, y)$ is convex in $x$ for each $y\in C$, then $g(x)=\sup _{y \in C} f(x, y)$ is convex.
Composition of $g : \mathbb{R}^{n} \to \mathbb{R}$ and $h : \mathbb{R} \to \mathbb{R}$ (with $\tilde{h}$ the extended-value extension of $h$): $f(x) = h(g(x))$ is convex if
- $g$ convex, $h$ convex, $\tilde{h}$ nondecreasing
- $g$ concave, $h$ convex, $\tilde{h}$ nonincreasing
Composition of $g : \mathbb{R}^{n} \to \mathbb{R}^k$ and $h : \mathbb{R}^k \to \mathbb{R}$: $f(x) = h(g(x)) = h(g_1(x), \dots, g_k(x))$ is convex if
- $g_i$ convex, $h$ convex, $\tilde{h}$ nondecreasing in each argument
- $g_i$ concave, $h$ convex, $\tilde{h}$ nonincreasing in each argument
Minimization: if $f(x, y)$ is convex in $(x, y)$ and $C$ is convex set, then $g(x)=\inf_{y \in C} f(x, y)$ is convex
The perspective of a function $f : \mathbb{R}^{n} \to \mathbb{R}$ is the function $g : \mathbb{R}^{n} \times \mathbb{R} \to \mathbb{R}$
$$ g(x, t)=t f(x / t), \quad \dom g=\{(x, t) \mid x / t \in \dom f, t>0\} $$$g$ is convex if $f$ is convex
Conjugate Function
The conjugate of a function $f$: $f^{*}(y)=\sup_{x \in \dom f}\left(y^{T} x-f(x)\right)$ is always convex.
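For example, $f(x) = x^2/2$ has conjugate $f^*(y) = y^2/2$; a brute-force sketch (not from the notes) that approximates the supremum on a grid:

```python
# Sketch: brute-force the conjugate f*(y) = sup_x (y x - f(x)) on a grid and
# compare with the known conjugate of f(x) = x^2/2, namely f*(y) = y^2/2.
import numpy as np

xs = np.linspace(-10, 10, 20001)
f = 0.5 * xs**2
for y in (-2.0, 0.5, 3.0):
    f_star = np.max(y * xs - f)                          # sup over the grid
    print(np.isclose(f_star, 0.5 * y**2, atol=1e-3))     # True
```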
Quasiconvex Function
$f : \mathbb{R}^{n} \to \mathbb{R}$ is quasiconvex if $\dom f$ is convex and the sublevel sets
$$ S_{\alpha}=\{x \in \dom f \mid f(x) \le \alpha\} $$are convex for all $\alpha$
$f$ is quasiconcave if $-f$ is quasiconvex.
- Modified Jensen inequality: $0 \le \theta \le 1 \implies f(\theta x+(1-\theta) y) \le \max \{f(x), f(y)\}$
- First-order condition: $f(y) \le f(x) \implies \nabla f(x)^{T}(y-x) \le 0$
- Sums of quasiconvex functions are not necessarily quasiconvex
Log-Concave and Log-Convex Function
A positive function $f$ is log-concave if $\log f$ is concave:
$$ f(\theta x+(1-\theta) y) \ge f(x)^{\theta} f(y)^{1-\theta}, \quad \forall 0 \le \theta \le 1 $$Many common probability densities are log-concave, e.g., normal distribution.
- Second-order condition: $f(x) \nabla^{2} f(x) \preceq \nabla f(x) \nabla f(x)^{T}$
- Product of log-concave functions is log-concave
- Sum of log-concave functions is not always log-concave
- Integration: if $f : \mathbb{R}^{n} \times \mathbb{R}^{m} \to \mathbb{R}$ is log-concave, then $g(x)=\int f(x, y) dy$ is log-concave
Convex Optimization Problems
Optimization Problem
$$ \begin{align*} &\text{minimize} & &{f_{0}(x)} \\ &\text {subject to } & &{f_{i}(x) \le 0, \quad i=1, \ldots, m} \\ & & &{h_{i}(x)=0, \quad i=1, \ldots, p} \end{align*} $$- $x \in \mathbb{R}^n$ is the optimization variable
- $f_0: \mathbb{R}^n \to \mathbb{R}$ is the objective or cost function
- $f_i: \mathbb{R}^n \to \mathbb{R},\ i = 1, \dots, m$ are the inequality constraint functions
- $h_i: \mathbb{R}^n \to \mathbb{R},\ i = 1, \dots, p$ are the equality constraint functions
Optimal value $p^{\star}=\inf \{f_{0}(x) \mid f_{i}(x) \le 0,\ i=1, \dots, m,\ h_{i}(x)=0, \ i=1, \dots, p\}$
- $p^\star = \infty$ if problem is infeasible
- $p^\star = -\infty$ if problem is unbounded below
Feasibility Problem
$$ \begin{align*} &\text{minimize} & &0 \\ &\text{subject to} & &f_{i}(x) \le 0, \quad i=1, \dots, m \\ & & &h_{i}(x)=0, \quad i=1, \dots, p \end{align*} $$- $p^\star = 0$ if constraints are feasible; any feasible $x$ is optimal
- $p^\star = \infty$ if constraints are infeasible
Convex Optimization
$$ \begin{align*} &\text{minimize} & &f_{0}(x) \\ &\text {subject to} & &f_{i}(x) \le 0, \quad i=1, \dots, m \\ & & &Ax = b \end{align*} $$- $f_0, \dots, f_m$ are convex; equality constraints are affine
- Feasible set of a convex optimization problem is convex
- Any locally optimal point of a convex problem is globally optimal
Optimality Criterion
$x$ is optimal iff it is feasible and $\nabla f_{0}(x)^{T}(y-x) \ge 0$ for all feasible $y$. If nonzero, $\nabla f_0(x)$ defines a supporting hyperplane to feasible set $X$ at $x$.
unconstrained problem:
$$ x \in \dom f_{0}, \quad \nabla f_{0}(x)=0 $$equality constrained problem:
$$ \text{minimize } f_{0}(x) \quad \text{subject to } Ax=b $$$x$ is optimal iff there exists a $\nu$ s.t.
$$ x \in \dom f_{0}, \quad A x=b, \quad \nabla f_{0}(x)+A^{T} \nu=0 $$minimization over nonnegative orthant:
$$ \text{minimize } f_{0}(x) \quad \text{subject to } x \succeq 0 $$$x$ is optimal iff
$$ x \in \dom f_{0}, \quad x \succeq 0, \quad \begin{dcases} \nabla f_{0}(x)_{i} \ge 0 & x_{i}=0 \\ \nabla f_{0}(x)_{i}=0 & x_{i}>0 \end{dcases} $$
Equivalent Convex Problems
eliminating equality constraints
$$ \begin{align*} &\text{minimize}_z & &f_{0}(F z+x_{0}) \\ &\text{subject to} & &f_{i}(F z+x_{0}) \le 0, \quad i=1, \dots, m \end{align*} $$where $F$ and $x_0$ s.t. $A x=b \iff x=F z+x_{0}$ for some $z$
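A sketch of how $F$ and $x_0$ could be computed (made-up $A, b$; `scipy.linalg.null_space` and least squares are one possible choice, not prescribed by the notes):

```python
# Sketch: build the parametrization x = F z + x_0 of {x : A x = b} with a
# particular solution from least squares and F spanning the null space of A.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))              # made-up data, full row rank
b = rng.standard_normal(2)

x0 = np.linalg.lstsq(A, b, rcond=None)[0]    # one solution of A x = b
F = null_space(A)                            # columns span {x : A x = 0}

z = rng.standard_normal(F.shape[1])
x = F @ z + x0
print(np.allclose(A @ x, b))                 # True for every z
```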
introducing slack variables for linear inequalities
$$ \begin{align*} &\text{minimize} & &f_{0}(x) \\ &\text{subject to} & &a_{i}^{T} x \le b_{i}, \quad i=1, \dots, m \end{align*} $$is equivalent to
$$ \begin{align*} &\text{minimize}_{x, s} & &f_{0}(x) \\ &\text{subject to} & &a_{i}^{T} x+s_{i}=b_{i}, \quad i=1, \dots, m \\ & & &s_{i} \ge 0, \quad i=1, \dots m \end{align*} $$epigraph form: standard form convex problem is equivalent to
$$ \begin{align*} &\text{minimize}_{x, t} & &t \\ &\text{subject to} & &f_{0}(x)-t \le 0 \\ & & &f_{i}(x) \le 0, \quad i=1, \dots, m \\ & & &Ax=b \end{align*} $$
Quasiconvex Optimization
$$ \begin{align*} &\text{minimize} & &f_0(x) \\ &\text{subject to} & &f_{i}(x) \le 0, \quad i=1, \dots, m \\ & & &Ax=b \end{align*} $$with $f_0: \mathbb{R}^n \to \mathbb{R}$ quasiconvex, $f_1, \dots, f_m$ convex. Can have locally optimal points that are not globally optimal
If $f_0$ is quasiconvex, there exists a family of functions $\varphi_t$ s.t.
- $\varphi_t(x)$ is convex in $x$ for fixed $t$
- $t$-sublevel set of $f_0$ is $0$-sublevel set of $\varphi_t$
For fixed $t$, the quasiconvex optimization problem can be converted to a convex feasibility problem in $x$: if it is feasible, then $p^\star \le t$; if it is infeasible, then $p^\star \ge t$. Bisection on $t$ then locates the optimal value, as sketched below.
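A minimal sketch of that bisection loop. The feasibility oracle `feasible(t)` is hypothetical (in practice it would solve the convex feasibility problem, e.g. with cvxpy); here it is stubbed out with a toy problem.

```python
# Sketch of the bisection loop described above.  `feasible(t)` stands for a
# hypothetical solver of the convex feasibility problem
#   find x  s.t.  phi_t(x) <= 0, f_i(x) <= 0, A x = b,
# returning True iff that problem is feasible.
def bisect_quasiconvex(feasible, lo, hi, tol=1e-6):
    """Requires p* in (lo, hi]: feasible(hi) is True, feasible(lo) is False."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid          # p* <= mid
        else:
            lo = mid          # p* >  mid
    return hi                 # within tol of the optimal value

# Toy usage: minimize f0(x) = |x - 3| (quasiconvex) over x >= 5; p* = 2.
print(bisect_quasiconvex(lambda t: t >= 2, lo=0.0, hi=10.0))   # ~2.0
```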
Linear Optimization
Linear Programming
$$ \begin{align*} &\text{minimize} & &c^{T} x+d \\ &\text{subject to} & &G x \preceq h \\ & & &A x=b \end{align*} $$- feasible set is a polyhedron.
Linear-Fractional Programming
$$ \begin{align*} &\text{minimize} & &f_{0}(x) \\ &\text{subject to} & &G x \preceq h \\ & & &A x=b \end{align*} $$where
$$ f_{0}(x)=\frac{c^{T} x+d}{e^{T} x+f}, \quad \dom f_{0}=\{x \mid e^{T} x+f>0\} $$a quasiconvex optimization problem
equivalent to the LP
$$ \begin{align*} &\text{minimize} & &c^{T} y+d z \\ &\text{subject to} & &G y \preceq h z \\ & & &A y=b z \\ & & &e^{T} y+f z=1 \\ & & &z \ge 0 \end{align*} $$
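A tiny worked instance of this transformation (data made up, `scipy.optimize.linprog` chosen for convenience): minimize $(x+1)/(x+2)$ over $0 \le x \le 3$, whose optimum is $x^\star = 0$ with value $1/2$; the original variable is recovered as $x = y/z$.

```python
# Sketch: solve a tiny linear-fractional program through the LP transformation
# above, then recover x = y / z.  Problem (made up):
#   minimize (x + 1)/(x + 2)  subject to 0 <= x <= 3.
import numpy as np
from scipy.optimize import linprog

# LP variables are (y, z).  Objective c^T y + d z with c = 1, d = 1.
c_lp = np.array([1.0, 1.0])
# G y <= h z for G = [-1; 1], h = (0, 3):   -y <= 0  and  y - 3 z <= 0
A_ub = np.array([[-1.0, 0.0], [1.0, -3.0]])
b_ub = np.zeros(2)
# e^T y + f z = 1 with e = 1, f = 2
A_eq = np.array([[1.0, 2.0]])
b_eq = np.array([1.0])
res = linprog(c_lp, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None), (0, None)], method="highs")
y, z = res.x
print(res.fun, y / z)     # 0.5 (optimal value), 0.0 (recovered x)
```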
Quadratic Optimization
Quadratic Programming
$$ \begin{align*} &\text{minimize} & &(1 / 2) x^{T} P x+q^{T} x+r \\ &\text{subject to} & &G x \preceq h \\ & & &A x=b \end{align*} $$- $P \in \mathbb{S}_+^n$, so objective is convex quadratic
- minimize a convex quadratic function over a polyhedron
Quadratically Constrained Quadratic Programming
$$ \begin{align*} &\text{minimize} & &(1 / 2) x^{T} P_{0} x+q_{0}^{T} x+r_{0} \\ &\text{subject to} & &(1 / 2) x^{T} P_{i} x+q_{i}^{T} x+r_{i} \le 0, \quad i=1, \dots, m \\ & & &A x=b \end{align*} $$- $P_i \in \mathbb{S}_+^n$, objective and constraints are convex quadratic
- if $P_1, \dots, P_m \in \mathbb{S}_{++}^n$, feasible region is intersection of $m$ ellipsoids and an affine set
Second-Order Cone Programming
$$ \begin{align*} &\text{minimize} & &f^{T} x \\ &\text{subject to} & &\norm*{A_{i} x+b_{i}}_{2} \le c_{i}^{T} x+d_{i}, \quad i=1, \dots, m \\ & & &F x=g \end{align*} $$- $A_{i} \in \mathbb{R}^{n_{i} \times n}, F \in \mathbb{R}^{p \times n}$
- inequalities are called second-order cone constraints
- for $n_i = 0$, reduces to an LP; if $c_i = 0$, reduces to a QCQP
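A small random SOCP written with cvxpy's `SOC` constraint; all data are made up, and an extra norm bound on $x$ is added only to keep the toy problem bounded.

```python
# Sketch: a small random SOCP in cvxpy; cp.SOC(t, u) encodes ||u||_2 <= t.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, ni = 4, 3, 5
x = cp.Variable(n)
f = rng.standard_normal(n)
constraints = []
for _ in range(m):
    A, b = rng.standard_normal((ni, n)), rng.standard_normal(ni)
    c, d = rng.standard_normal(n), 10.0        # large d keeps the toy problem feasible
    constraints.append(cp.SOC(c @ x + d, A @ x + b))
constraints.append(cp.norm(x, 2) <= 1)         # keeps the objective bounded below
prob = cp.Problem(cp.Minimize(f @ x), constraints)
prob.solve()
print(prob.status, prob.value)
```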
Geometric Programming
Monomial function
$$ f(x)=c x_{1}^{a_{1}} x_{2}^{a_{2}} \cdots x_{n}^{a_{n}}, \quad \dom f=\mathbb{R}_{++}^{n} $$with $c>0,\ a_i \in \mathbb{R}$
Posynomial function: sum of monomials
$$ f(x)=\sum_{k=1}^{K} c_{k} x_{1}^{a_{1 k}} x_{2}^{a_{2 k}} \cdots x_{n}^{a_{n k}}, \quad \dom f=\mathbb{R}_{++}^{n} $$Geometric program
$$ \begin{align*} &\text{minimize} & &f_{0}(x) \\ &\text{subject to} & &f_{i}(x) \le 1, \quad i=1, \dots, m \\ & & &h_{i}(x)=1, \quad i=1, \dots, p \end{align*} $$with $f_i$ posynomial, $h_i$ monomial
Geometric program in convex form:
$$ \begin{align*} &\text{minimize} & &\log \left(\sum_{k=1}^{K} \exp \left(a_{0 k}^{T} y+b_{0 k}\right)\right) \\ &\text{subject to} & &\log \left(\sum_{k=1}^{K} \exp \left(a_{i k}^{T} y+b_{i k}\right)\right) \le 0, \quad i=1, \dots, m \\ & & &G y+d=0 \end{align*} $$Generalized Inequality Constraints
$$ \begin{align*} &\text{minimize} & &f_{0}(x) \\ &\text{subject to} & &f_{i}(x) \preceq_{K_{i}} 0, \quad i=1, \dots, m \\ & & &A x=b \end{align*} $$- $f_0: \mathbb{R}^n \to \mathbb{R}$ convex; $f_i: \mathbb{R}^n \to \mathbb{R}^{k_i}$ $K_i$-convex w.r.t. proper cone $K_i$
- same properties as standard convex problem (convex feasible set, local optimum is global, etc.)
Conic form problem:
$$ \begin{align*} &\text{minimize} & &c^{T} x \\ &\text{subject to} & &F x+g \preceq_{K} 0 \\ & & &A x=b \end{align*} $$extends LP ($K = \mathbb{R}_+^m$) to non-polyhedral cones
Semidefinite programming:
$$ \begin{align*} &\text{minimize} & &c^{T} x \\ &\text{subject to} & &x_{1} F_{1}+x_{2} F_{2}+\cdots+x_{n} F_{n}+G \preceq 0 \\ & & &A x=b \end{align*} $$with $F_i, G \in \mathbb{S}^k$
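One concrete SDP of this form (with $n = 1$, $F_1 = -I$, $G = A$) is the maximum eigenvalue of a symmetric matrix: minimize $t$ subject to $A - tI \preceq 0$. A cvxpy sketch on made-up data:

```python
# Sketch: the maximum eigenvalue of a symmetric matrix A as a small SDP,
#   minimize t  subject to  A <= t I  (PSD order),
# solved with cvxpy on made-up data and compared against numpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                               # symmetric data

t = cp.Variable()
prob = cp.Problem(cp.Minimize(t), [t * np.eye(5) - A >> 0])
prob.solve()
print(prob.value, np.linalg.eigvalsh(A).max())  # agree to solver accuracy
```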
Minimum and Minimal Elements
$\preceq_K$ is not in general a linear ordering: we can have $x \npreceq_K y$ and $y \npreceq_K x$
- $x \in S$ is the minimum element of $S$ w.r.t. $\preceq_K$ if $y \in S \implies x \preceq_K y$
- $x \in S$ is a minimal element of $S$ w.r.t. $\preceq_K$ if $y \in S, \ y \preceq_{K} x \implies y=x$
Vector Optimization
General vector optimization problem:
$$ \begin{align*} &\text{minimize}_K & &f_{0}(x) \\ &\text{subject to} & &f_{i}(x) \le 0, \quad i=1, \dots, m \\ & & &h_{i}(x) = 0, \quad i=1, \dots, p \end{align*} $$vector objective $f_{0} : \mathbb{R}^{n} \to \mathbb{R}^{q}$, minimized w.r.t. proper cone $K \subseteq \mathbb{R}^{q}$
Convex vector optimization problem:
$$ \begin{align*} &\text{minimize}_K & &f_{0}(x) \\ &\text{subject to} & &f_{i}(x) \le 0, \quad i=1, \dots, m \\ & & &A x=b \end{align*} $$with $f_0$ $K$-convex, $f_1,\dots, f_m$ convex
Optimal and Pareto Optimal Points
Set of achievable objective values $\mathcal{O}=\{f_{0}(x) \mid x \text{ feasible}\}$
- feasible $x$ is optimal if $f_0(x)$ is a minimum value of $\mathcal{O}$
- feasible $x$ is Pareto optimal if $f_0(x)$ is a minimal value of $\mathcal{O}$
Duality
Lagrange Dual Function
From the standard form optimization problem we define the Lagrangian $L : \mathbb{R}^{n} \times \mathbb{R}^{m} \times \mathbb{R}^{p} \to \mathbb{R}$
$$ L(x, \lambda, \nu)=f_{0}(x)+\sum_{i=1}^{m} \lambda_{i} f_{i}(x)+\sum_{i=1}^{p} \nu_{i} h_{i}(x) $$- weighted sum of objective and constraint functions
- $\lambda_i$ is Lagrange multiplier associated with $f_i(x) \le 0$
- $\nu_i$ is Lagrange multiplier associated with $h_i(x) = 0$
Lagrange dual function $g : \mathbb{R}^{m} \times \mathbb{R}^{p} \to \mathbb{R}$
$$ \begin{split} g(\lambda, \nu) &=\inf_{x \in \mathcal{D}} L(x, \lambda, \nu) \\ &=\inf_{x \in \mathcal{D}}\left(f_{0}(x)+\sum_{i=1}^{m} \lambda_{i} f_{i}(x)+\sum_{i=1}^{p} \nu_{i} h_{i}(x)\right) \end{split} $$$g$ is concave.
lower bound property: if $\lambda \succeq 0$, then $g(\lambda, \nu) \le p^{\star}$
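A one-variable illustration of the lower-bound property (worked out by hand, not from the notes): for minimize $x^2$ subject to $1 - x \le 0$, the dual function is $g(\lambda) = \lambda - \lambda^2/4$, which never exceeds $p^\star = 1$ and attains it at $\lambda = 2$.

```python
# Sketch: the lower-bound property on a one-variable problem,
#   minimize x^2  subject to  1 - x <= 0   (p* = 1, at x* = 1).
# The Lagrangian is L(x, lam) = x^2 + lam (1 - x); minimizing over x (x = lam/2)
# gives g(lam) = lam - lam^2 / 4.
import numpy as np

lam = np.linspace(0, 6, 601)
g = lam - lam**2 / 4
print(np.all(g <= 1 + 1e-12))          # True: g(lam) <= p* for all lam >= 0
print(lam[np.argmax(g)], g.max())      # 2.0, 1.0  (the bound is tight here)
```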
Lagrange Dual Problem
$$ \begin{align*} &\text{maximize} & &g(\lambda, \nu) \\ &\text{subject to} & &\lambda \succeq 0 \end{align*} $$- find best lower bound on $p^\star$, obtained from Lagrange dual function
- a convex optimization problem; optimal value denoted $d^\star$
- $\lambda, \nu$ are dual feasible if $\lambda \succeq 0,\ (\lambda, \nu) \in \dom g$
- often simplified by making implicit constraint $(\lambda, \nu) \in \dom g$ explicit
Optimality Conditions
Weak duality $d^\star \le p^\star$ always holds, can be expressed as
$$ \sup_{\lambda \succeq 0} \inf_{x} L(x, \lambda) \le \inf_x \sup_{\lambda \succeq 0} L(x, \lambda) $$Strong duality $d^\star = p^\star$ usually holds for convex problems. It means that $x^\star$ and $\lambda^\star$ form a saddle point of the Lagrangian (written here for a problem without equality constraints).
Slater’s Constraint Qualification
Strong duality holds for a convex problem if it is strictly feasible, i.e.,
$$ \exists x \in \operatorname{int} \mathcal{D} : \quad f_{i}(x)<0, \ i=1, \dots, m, \quad A x=b $$KKT Conditions
If strong duality holds and $x^\star, \lambda^\star, \nu^\star$ are optimal, then they must satisfy:
primal constraints: $f_{i}(x^\star) \le 0,\ h_{i}(x^\star)=0$
dual constraints: $\lambda^\star \succeq 0$
complementary slackness: $\lambda_i^\star f_i(x^\star) = 0$
gradient of Lagrangian with respect to $x$ vanishes:
$$ \nabla f_{0}(x^\star)+\sum_{i=1}^{m} \lambda_{i}^\star \nabla f_{i}(x^\star)+\sum_{i=1}^{p} \nu_{i}^\star \nabla h_{i}(x^\star)=0 $$
If $\tilde{x}, \tilde{\lambda}, \tilde{\nu}$ satisfy KKT for a convex problem, then they are optimal
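A numerical check of the four KKT conditions on a tiny convex QP (a made-up problem; cvxpy's `.dual_value` attribute supplies $\lambda^\star$):

```python
# Sketch: verify the KKT conditions for
#   minimize ||x||_2^2  subject to  1 - 1^T x <= 0,
# whose solution is x* = (1/n) 1 and lambda* = 2/n.
import numpy as np
import cvxpy as cp

n = 4
x = cp.Variable(n)
con = 1 - cp.sum(x) <= 0
prob = cp.Problem(cp.Minimize(cp.sum_squares(x)), [con])
prob.solve()

x_star, lam = x.value, con.dual_value
print(np.isclose(1 - np.sum(x_star), 0, atol=1e-4))          # primal feasibility (active)
print(lam >= -1e-6)                                           # dual feasibility
print(np.isclose(lam * (1 - np.sum(x_star)), 0, atol=1e-4))   # complementary slackness
print(np.allclose(2 * x_star - lam, 0, atol=1e-4))            # stationarity: 2x - lam*1 = 0
```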
Applications
Geometric Problems
Minimum Volume Ellipsoid Around a Set
Minimum volume ellipsoid $\mathcal{E}$ containing a set $C$ (the Löwner-John ellipsoid).
parametrize $\mathcal{E}$ as $\mathcal{E}=\{v \mid \norm{A v+b}_{2} \le 1\}$, assume $A \in \mathbb{S}_{++}^n$
$\operatorname{vol} \mathcal{E}$ is proportional to $\det A^{-1}$, can compute $\mathcal{E}$ by solving
$$ \begin{align*} &\text{minimize}_{A,b} & &\log \det A^{-1}\\ &\text{subject to} & &\sup_{v \in C}\norm{A v+b}_{2} \le 1 \end{align*} $$
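When $C$ is a finite point set, the sup becomes one constraint per point and the problem can be written directly in cvxpy with `log_det`; a sketch on made-up 2-D data:

```python
# Sketch: minimum-volume ellipsoid covering a finite point set.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
pts = rng.standard_normal((20, 2))           # points v_i in R^2 (synthetic)

A = cp.Variable((2, 2), PSD=True)
b = cp.Variable(2)
constraints = [cp.norm(A @ v + b, 2) <= 1 for v in pts]
prob = cp.Problem(cp.Minimize(-cp.log_det(A)), constraints)
prob.solve()
# Every point lies in the ellipse {v : ||A v + b||_2 <= 1} (max is ~1 at the touching points):
print(prob.status, max(np.linalg.norm(A.value @ v + b.value) for v in pts))
```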
Maximum Volume Inscribed Ellipsoid
Maximum volume ellipsoid $\mathcal{E}$ inside a convex set $C \subseteq \mathbb{R}^n$
parametrize $\mathcal{E}$ as $\mathcal{E}=\{B u+d \mid \norm{u}_{2} \le 1\}$, assume $B \in \mathbb{S}_{++}^n$
$\operatorname{vol} \mathcal{E}$ is proportional to $\det B$, can compute $\mathcal{E}$ by solving
$$ \begin{align*} &\text{maximize} & &\log\det B \\ &\text{subject to} & &\sup_{\norm{u}_{2} \le 1} I_{C}(B u+d) \le 0 \end{align*} $$where $I_C(x) = 0$ for $x\in C$ and $I_C(x) = \infty$ for $x\notin C$
Linear Discrimination
Separate two sets of points $\{x_1, \dots, x_N\},\ \{y_1, \dots, y_M\}$ by a hyperplane:
$$ a^{T} x_{i}+b>0, \ i=1, \dots, N, \quad a^{T} y_{i}+b<0, \ i=1, \dots, M $$homogeneous in $a, b$, hence equivalent to
$$ a^{T} x_{i}+b \ge 1, \ i=1, \dots, N, \quad a^{T} y_{i}+b \le-1, \ i=1, \dots, M $$To separate two sets of points by maximum margin
$$ \begin{align*} &\text{minimize} & &(1 / 2)\norm{a}_{2} \\ &\text{subject to} & &a^{T} x_{i}+b \ge 1, \quad i=1, \dots, N \\ & & &a^{T} y_{i}+b \le-1, \quad i=1, \dots, M \end{align*} $$Support Vector Classifier
$$ \begin{align*} &\text{minimize} & &\norm{a}_{2}+\gamma\left(\mathbf{1}^{T} u+\mathbf{1}^{T} v\right) \\ &\text{subject to} & &a^{T} x_{i}+b \ge 1-u_{i}, \quad i=1, \dots, N \\ & & &a^{T} y_{i}+b \le -1+v_{i}, \quad i=1, \dots, M \\ & & &u \succeq 0, \quad v \succeq 0 \end{align*} $$produces point on trade-off curve between inverse of margin $2/\norm{a}_2$ and classification error, measured by total slack $\mathbf{1}^{T} u+\mathbf{1}^{T} v$
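A sketch of the support vector classifier in cvxpy on synthetic 2-D data (the value $\gamma = 1$ is an arbitrary choice):

```python
# Sketch: the support vector classifier above, on made-up 2-D data.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2)) + np.array([2.0, 2.0])   # class "x" points
Y = rng.standard_normal((30, 2)) - np.array([2.0, 2.0])   # class "y" points

a, b = cp.Variable(2), cp.Variable()
u, v = cp.Variable(30, nonneg=True), cp.Variable(30, nonneg=True)
gamma = 1.0
obj = cp.norm(a, 2) + gamma * (cp.sum(u) + cp.sum(v))
cons = [X @ a + b >= 1 - u, Y @ a + b <= -1 + v]
cp.Problem(cp.Minimize(obj), cons).solve()
print(a.value, b.value)    # separating hyperplane a^T z + b = 0
```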
Data Fitting
Norm Approximation
$$ \text{minimize } \norm{A x-b} $$where $A \in \mathbb{R}^{m \times n}$ with $m\ge n$
Linear measurement model: $y = Ax + v$, $y$ are measurements, $x$ is unknown, $v$ is measurement error. Given $y=b$, best guess of $x$ is $x^\star$
Least-Norm Problems
$$ \begin{align*} &\text{minimize} & &\norm{x} \\ &\text{subject to} & &Ax=b \end{align*} $$where $A \in \mathbb{R}^{m \times n}$ with $m\le n$
Scalarized Problem
$$ \text{minimize } \norm{A x-b}+\gamma\norm{x} $$tradeoff between error and norm
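Sweeping $\gamma$ traces out the tradeoff curve; a cvxpy sketch with made-up data, using an $\ell_2$ error and an $\ell_1$ regularizer as one concrete choice of norms:

```python
# Sketch: sweep the scalarization weight gamma to trace the tradeoff between
# ||A x - b||_2 and ||x||_1 on synthetic data.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
x = cp.Variable(10)
for gamma in (0.01, 0.1, 1.0, 10.0):
    cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2) + gamma * cp.norm(x, 1))).solve()
    print(gamma, np.linalg.norm(A @ x.value - b), np.linalg.norm(x.value, 1))
```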
Statistical Estimation
Maximum Likelihood Estimation
$$ \text{maximize }(\text{over } x) \quad \log p_{x}(y) $$For a linear measurement model with IID noise, $y_{i}=a_{i}^{T} x+v_{i}, \ i=1, \dots, m$, the ML estimation problem becomes
$$ \text{maximize } l(x) = \sum_{i=1}^{m} \log p(y_{i}-a_{i}^{T} x) $$where $y$ is observed value, $p$ is the PDF of the measurement noise $v$
Gaussian noise $\mathcal{N}(0, \sigma^{2}) : p(z)=(2 \pi \sigma^{2})^{-1 / 2} e^{-z^{2} /(2 \sigma^{2})}$
$$ l(x)=-\frac{m}{2} \log (2 \pi \sigma^{2}) - \frac{1}{2 \sigma^{2}} \sum_{i=1}^{m}(a_{i}^{T} x-y_{i})^{2} $$Laplacian noise $p(z)=(1 /(2 a)) e^{-\abs{z} / a}$
$$ l(x)=-m \log (2 a)-\frac{1}{a} \sum_{i=1}^{m}\abs*{a_{i}^{T} x-y_{i}} $$Uniform noise on $[-a, a]$
$$ l(x)=\begin{dcases} -m \log (2 a) & \abs*{a_{i}^{T} x-y_{i}} \le a, \ i=1, \ldots, m \\ -\infty & \text{otherwise} \end{dcases} $$
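These cases reduce to familiar approximation problems: Gaussian noise gives least squares, Laplacian noise gives $\ell_1$-norm approximation. A sketch on synthetic data (cvxpy for the $\ell_1$ problem):

```python
# Sketch: ML estimation under the noise models above on made-up data.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
y = A @ np.ones(5) + rng.laplace(scale=0.5, size=50)

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]               # Gaussian-noise ML: least squares
x = cp.Variable(5)
cp.Problem(cp.Minimize(cp.norm(A @ x - y, 1))).solve()    # Laplacian-noise ML: l1 approximation
print(x_ls, x.value)
```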
Logistic Regression
Random variable $y\in \{0,1\}$ with distribution
$$ p=\operatorname{prob}(y=1)=\frac{\exp (a^{T} u+b)}{1+\exp (a^{T} u+b)} $$log-likelihood function (for $y_1 = \dots = y_k = 1, \ y_{k+1} = \dots = y_m = 0$):
$$ \begin{split} l(a, b) &=\log \left(\prod_{i=1}^{k} \frac{\exp (a^{T} u_{i}+b)}{1+\exp (a^{T} u_{i}+b)} \prod_{i=k+1}^{m} \frac{1}{1+\exp (a^{T} u_{i}+b)}\right) \\ &=\sum_{i=1}^{k}(a^{T} u_{i}+b)-\sum_{i=1}^{m} \log \left(1+\exp (a^{T} u_{i}+b)\right) \end{split} $$concave in $a, b$
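Since $-l(a, b)$ is convex, the model can be fit with any smooth convex solver; a sketch using `scipy.optimize.minimize` on synthetic data (the true parameters $(1.5, -2.0, 0.5)$ are made up):

```python
# Sketch: fit the logistic model by minimizing the (convex) negative
# log-likelihood; labels y_i in {0, 1}.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
U = rng.standard_normal((200, 2))
z_true = U @ np.array([1.5, -2.0]) + 0.5
y = (rng.random(200) < 1 / (1 + np.exp(-z_true))).astype(float)

def neg_log_likelihood(w):
    a, b = w[:2], w[2]
    z = U @ a + b
    # -l(a, b) = -sum_i y_i z_i + sum_i log(1 + exp(z_i)), computed stably
    return -np.sum(y * z) + np.sum(np.logaddexp(0.0, z))

res = minimize(neg_log_likelihood, x0=np.zeros(3))
print(res.x)   # estimates of (a, b), roughly (1.5, -2.0, 0.5)
```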