Convex Optimization

AA/EE/ME 578 Review


Convex Sets

Examples

Subspaces

\(S \subseteq \mathbf{R}^{n}\) is a subspace if \[ \displaylines{x, y \in S, \quad \lambda, \mu \in \mathbf{R} \quad \Longrightarrow \lambda x+\mu y \in S } \] Geometrically: \(x, y \in S \Rightarrow\) plane through \(0, x, y \subseteq S\).

Affine Sets

\(S \subseteq \mathbf{R}^n\) is affine if \[ \displaylines{x, y \in S, \quad \lambda, \mu \in \mathbf{R}, \quad \lambda+\mu=1 \Longrightarrow \lambda x+\mu y \in S } \] Geometrically: \(x, y \in S \Rightarrow\) line through \(x, y \subseteq S\).

Convex Sets

\(S \subseteq \mathbf{R}^n\) is a convex set if \[ \displaylines{x, y \in S, \quad \lambda, \mu \geq 0, \quad \lambda+\mu=1 \Longrightarrow \lambda x+\mu y \in S } \] Geometrically: \(x, y \in S \Rightarrow\) segment \([x, y] \subseteq S\).

Convex Cone

\(S \subseteq \mathbf{R}^n\) is a cone if \[ \displaylines{x \in S, \quad \lambda \geq 0 \Longrightarrow \lambda x \in S } \] \(S \subseteq \mathbf{R}^n\) is a convex cone if \[ \displaylines{x, y \in S, \quad \lambda, \mu \geq 0 \Longrightarrow \lambda x+\mu y \in S } \] Geometrically: \(x, y \in S \Rightarrow\) 'pie slice' between \(x, y \subseteq S\).

Combinations and Hulls

\(y=\theta_{1} x_{1}+\cdots+\theta_{k} x_{k}\) is a

  • linear combination of \(x_1, \dots, x_k\)
  • affine combination if \(\sum \theta_i = 1\)
  • convex combination if \(\sum \theta_i = 1,\ \theta_i \ge 0\)
  • conic combination if \(\theta_i \ge 0\).

The (linear, affine, convex, conic) hull of \(S\) is the set of all (linear, affine, convex, conic) combinations of points in \(S\). In particular, the convex hull \(\operatorname{conv}(S)\) is the set of all convex combinations of points in \(S\); equivalently, it is the smallest convex set containing \(S\): \[ \displaylines{\operatorname{conv}(S)=\bigcap\{G \mid S \subseteq G, \ G \text { convex }\} } \]

Hyperplanes and Halfspaces

  • Hyperplane: \(\left\{x \mid a^{T} x=b\right\}(a \neq 0)\)
  • Halfspace: \(\left\{x \mid a^{T} x \leq b\right\}(a \neq 0)\)

\(a\) is the normal vector. Hyperplanes are affine and convex; halfspaces are convex.

Euclidean Balls and Ellipsoids

Euclidean ball with center \(x_c\) and radius \(r\): \[ \displaylines{B\left(x_{c}, r\right)=\left\{x \mid \left\|x-x_{c}\right\|_{2} \leq r\right\}=\left\{x_{c}+r u \mid \|u\|_{2} \leq 1\right\} } \] Ellipsoid: \[ \displaylines{\left\{x \mid \left(x-x_{c}\right)^{T} P^{-1}\left(x-x_{c}\right) \leq 1\right\} } \] with \(P \in \mathbf{S}_{++}^{n}\). Other representation: \(\left\{x_{c}+A u \mid \|u\|_{2} \leq 1\right\}\) with \(A\) square and nonsingular.
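The two ellipsoid representations coincide when \(P = AA^T\): points \(x_c + Au\) with \(\|u\|_2 \le 1\) satisfy \((x-x_c)^T P^{-1}(x-x_c) \le 1\). A minimal NumPy sanity check of this (the matrices below are arbitrary illustrative data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
L = rng.standard_normal((n, n))
A = L @ L.T + np.eye(n)          # symmetric, nonsingular (illustrative choice)
P = A @ A.T                      # P in S^n_++, so both representations describe the same set
xc = rng.standard_normal(n)

# sample points u in the unit ball and map them through x = xc + A u
u = rng.standard_normal((1000, n))
u /= np.linalg.norm(u, axis=1, keepdims=True)
u *= rng.uniform(size=(1000, 1)) ** (1.0 / n)
x = xc + u @ A.T

# quadratic-form representation: (x - xc)^T P^{-1} (x - xc) <= 1
q = np.einsum("ij,jk,ik->i", x - xc, np.linalg.inv(P), x - xc)
print("max of (x - xc)^T P^{-1} (x - xc):", q.max(), " (should be <= 1)")
```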

Norm Balls and Norm Cones

Norm: a function \(\|\cdot\|\) that satisfies

  • \(\|x\| \geq 0 ; \ \|x\|=0\) if and only if \(x=0\)
  • \(\|t x\|=|t|\|x\|\) for \(t \in \mathbf{R}\)
  • \(\|x+y\| \leq\|x\|+\|y\|\)

Norm ball with center \(x_c\) and radius \(r\): \(\left\{x \mid \left\|x-x_{c}\right\| \leq r\right\}\)

Norm cone: \(\{(x, t) \mid \|x\| \leq t\}\)

Polyhedron

Solution set of finitely many linear inequalities and equalities \[ \displaylines{A x \preceq b, \qquad C x=d } \] (\(A \in \mathbf{R}^{m \times n}, \ C \in \mathbf{R}^{p \times n}, \ \preceq\) is component-wise inequality)

Polyhedron is intersection of finite number of halfspaces and hyperplanes.

Positive Semidefinite Cone

  • \(\mathbf{S}_{+}^{n}=\left\{X \in \mathbf{S}^{n} \mid X \succeq 0\right\}\): PSD matrices. A convex cone.
  • \(\mathbf{S}_{++}^{n}=\left\{X \in \mathbf{S}^{n} \mid X\succ 0\right\}\): PD matrices.

Operations That Preserve Convexity

To show that \(C\) is a convex set:

  1. Definition \[ \displaylines{x_{1}, x_{2} \in C, \quad 0 \leq \theta \leq 1 \quad \Longrightarrow \quad \theta x_{1}+(1-\theta) x_{2} \in C } \]

  2. Show that \(C\) is obtained from simple convex sets by operations that preserve convexity

    • intersection
    • affine function
    • perspective function
    • linear-fractional functions

Intersection

The intersection of (any number of, even infinite) convex sets is convex.

Affine Function

Suppose \(f : \mathbf{R}^{n} \rightarrow \mathbf{R}^{m}\) is affine (\(f(x)=A x+b\) with \(A \in \mathbf{R}^{m \times n}, \ b \in \mathbf{R}^{m}\))

  • The image of a convex set under \(f\) is convex \[ \displaylines{S \subseteq \mathbf{R}^{n} \text { convex } \Longrightarrow f(S)=\{f(x) \mid x \in S\} \text{ convex} } \]

  • The inverse image \(f^{-1}(C)\) of a convex set under \(f\) is convex \[ \displaylines{C \subseteq \mathbf{R}^{m} \text{ convex } \quad \Longrightarrow \quad f^{-1}(C)=\left\{x \in \mathbf{R}^{n} \mid f(x) \in C\right\} \text{ convex } } \]

Perspective and Linear-Fractional Function

Perspective function \(P : \mathbf{R}^{n+1} \rightarrow \mathbf{R}^{n}\) \[ \displaylines{P(x, t)=x / t, \quad \operatorname{dom} P=\{(x, t) \mid t>0\} } \] images and inverse images of convex sets under perspective are convex.

Linear-fractional function \(f : \mathbf{R}^{n} \rightarrow \mathbf{R}^{m}\) \[ \displaylines{f(x)=\frac{A x+b}{c^{T} x+d}, \qquad \text { dom } f=\left\{x \mid c^{T} x+d>0\right\} } \] images and inverse images of convex sets under linear-fractional functions are convex.

Generalized Inequalities

A convex cone \(K \subseteq \mathbf{R}^{n}\) is a proper cone if

  • \(K\) is closed (contains its boundary)
  • \(K\) is solid (has nonempty interior)
  • \(K\) is pointed (contains no line)

Generalized inequality defined by a proper cone \(K\): \[ \displaylines{x \preceq_{K} y \ \Longleftrightarrow \ y-x \in K, \qquad x\prec_{K} y \ \Longleftrightarrow \ y-x \in \operatorname{int} K } \]

Hyperplane Theorem

Separating Hyperplane

If \(C\) and \(D\) are disjoint convex sets, then there exists \(a \ne 0, \ b\) such that \[ \displaylines{a^{T} x \leq b \ \text { for }\ x \in C, \qquad a^{T} x \geq b \ \text { for } \ x \in D } \] the hyperplane \(\left\{x \mid a^{T} x=b\right\}\) separates \(C\) and \(D\). Strict separation requires additional assumptions (e.g. \(C\) is closed, \(D\) is a singleton).

Supporting Hyperplane

Supporting hyperplane to set \(C\) at boundary point \(x_0\): \[ \displaylines{\left\{x \mid a^{T} x=a^{T} x_{0}\right\} } \] where \(a \ne 0\) and \(a^T x \le a^T x_0\) for all \(x \in C\).

Supporting hyperplane theorem: If \(C\) is convex, then there exists a supporting hyperplane at every boundary point of \(C\).

Dual Cones

Dual cone of a cone \(K\): \(K^*=\left\{y \mid y^{T} x \geq 0 \text { for all } x \in K\right\}\)

Examples:

  • \(K=\mathbf{R}_{+}^{n} : K^{*}=\mathbf{R}_{+}^{n}\)
  • \(K=\mathbf{S}_{+}^{n} : K^{*}=\mathbf{S}_{+}^{n}\)
  • \(K=\left\{(x, t) \mid\|x\|_{2} \leq t\right\} : K^{*}=\left\{(x, t) \mid\|x\|_{2} \leq t\right\}\)
  • \(K=\left\{(x, t) \mid\|x\|_{1} \leq t\right\} : K^{*}=\left\{(x, t) \mid\|x\|_{\infty} \leq t\right\}\)

Convex Functions

\(f : \mathbf{R}^{n} \rightarrow \mathbf{R}\) is convex if \(\operatorname{dom} f\) is a convex set and \[ \displaylines{f(\theta x+(1-\theta) y) \leq \theta f(x)+(1-\theta) f(y) } \] for all \(x, y \in \operatorname{dom} f, \ 0\le \theta \le 1\)

Examples

Convex

  • affine: \(a^T x + b\)
  • exponential: \(e^{ax}\), for any \(a \in \mathbf{R}\)
  • powers: \(x^\alpha\) on \(\mathbf{R}_{++}\), for \(\alpha \ge 1\) or \(\alpha \le 0\)
  • powers of absolute value: \(|x|^p\) on \(\mathbf{R}\), for \(p \ge 1\)
  • negative entropy: \(x\log x\) on \(\mathbf{R}_{++}\)
  • norms: \(\|x\|_{p}=\left(\sum_{i=1}^{n}\left|x_{i}\right|^{p}\right)^{1 / p}\) for \(p \ge 1\)
  • affine on matrices: \(f(X)=\operatorname{tr}\left(A^{T} X\right)+b\)
  • spectral norm: \(f(X) = \|X\|_2 = \sigma_{\max}(X)\)
  • quadratic: \(f(x)=(1 / 2) x^{T} P x+q^{T} x+r\) with \(P \in \mathbf{S}_{+}^{n}\)
  • least-squares: \(f(x) = \|Ax - b\|_2^2\)
  • quadratic-over-linear: \(f(x, y) = x^2/y\) with \(y > 0\)
  • log-sum-exp: \(f(x)=\log \sum_{k=1}^{n} \exp x_{k}\)

Concave

  • \(f\) is concave if \(-f\) is convex
  • affine
  • powers: \(x^\alpha\) on \(\mathbf{R}_{++}\), for \(0\le \alpha \le 1\)
  • logarithm: \(\log x\) on \(\mathbf{R}_{++}\)
  • \(\log\det X\) on \(\mathbf{S}_{++}^n\)
  • geometric mean: \(f(x)=\left(\prod_{k=1}^{n} x_{k}\right)^{1 / n}\) on \(\mathbf{R}_{++}^n\)

Properties

Restriction of a Convex Function to a Line

\(f : \mathbf{R}^{n} \rightarrow \mathbf{R}\) is convex iff the function \(g : \mathbf{R} \rightarrow \mathbf{R}\) \[ \displaylines{g(t)=f(x+t v), \qquad \operatorname{dom} g=\{t \mid x+t v \in \operatorname{dom} f\} } \]

is convex (in \(t\)) for any \(x \in \operatorname{dom} f, \ v\in \mathbf{R}^n\)

First-Order Convexity Condition

Differentiable \(f\) with convex domain is convex iff \[ \displaylines{f(y) \geq f(x)+\nabla f(x)^{T}(y-x) \quad \text { for all } x, y \in \operatorname{dom} f } \]

Second-Order Convexity Condition

Twice differentiable \(f\) with convex domain is convex iff \[ \displaylines{\nabla^{2} f(x) \succeq 0 \quad \text { for all } x \in \operatorname{dom} f } \]
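For example, the Hessian of log-sum-exp is \(\nabla^{2} f(x)=\operatorname{diag}(z)-z z^{T}\) with \(z_{k}=\exp x_{k} / \sum_{j} \exp x_{j}\), which is positive semidefinite. A minimal NumPy sketch checking this at random points (illustrative only):

```python
import numpy as np

def lse_hessian(x):
    """Hessian of f(x) = log(sum_k exp(x_k)): diag(z) - z z^T with z = softmax(x)."""
    z = np.exp(x - x.max())      # shift for numerical stability
    z /= z.sum()
    return np.diag(z) - np.outer(z, z)

rng = np.random.default_rng(1)
min_eig = min(np.linalg.eigvalsh(lse_hessian(rng.standard_normal(5))).min()
              for _ in range(100))
print("smallest Hessian eigenvalue over 100 random points:", min_eig, " (should be >= 0)")
```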

Epigraph and Sublevel Set

\(\alpha\)-sublevel set of \(f : \mathbf{R}^{n} \rightarrow \mathbf{R}\): \[ \displaylines{C_{\alpha}=\{x \in \operatorname{dom} f \mid f(x) \leq \alpha\} } \] sublevel sets of convex functions are convex (converse is false)

Epigraph of \(f : \mathbf{R}^{n} \rightarrow \mathbf{R}\): \[ \displaylines{\operatorname{epi}f=\left\{(x, t) \in \mathbf{R}^{n+1} \mid x \in \operatorname{dom} f, \ f(x) \leq t\right\} } \] \(f\) is convex iff \(\operatorname{epi}f\) is a convex set.

Jensen's Inequality

If \(f\) is convex, then for \(0 \le \theta \le 1\) \[ \displaylines{f(\theta x+(1-\theta) y) \leq \theta f(x)+(1-\theta) f(y) } \] can be extended to \[ \displaylines{f(\mathbf{E}\ z) \leq \mathbf{E}\ f(z) } \] for any random variable \(z\)

Operations That Preserve Convexity

To show that \(f\) is a convex function:

  1. Verify definition (often simplified by restricting to a line)
  2. For twice differentiable functions, show \(\nabla^2 f(x) \succeq 0\)
  3. Show that \(f\) is obtained from simple convex functions by operations that preserve convexity
  • Nonnegative multiple: \(\alpha f\) is convex if \(f\) is convex, \(\alpha \ge 0\)

  • Sum: \(f_1 + f_2\) is convex if \(f_1, f_2\) convex

  • Composition with affine function: \(f(Ax + b)\) is convex if \(f\) is convex

  • Pointwise maximum: if \(f_1, \dots, f_m\) are convex, then \(f(x)=\max \left\{f_{1}(x), \ldots, f_{m}(x)\right\}\) is convex

  • Pointwise supremum: if \(f(x, y)\) is convex in \(x\) for each \(y\in \mathcal{A}\), then \(g(x)=\sup _{y \in \mathcal{A}} f(x, y)\) is convex.

  • Composition of \(g : \mathbf{R}^{n} \rightarrow \mathbf{R}\) and \(h : \mathbf{R} \rightarrow \mathbf{R}\) (\(\tilde{h}\) denotes the extended-value extension of \(h\)): \(f(x) = h(g(x))\) is convex if
    • \(g\) convex, \(h\) convex, \(\tilde{h}\) nondecreasing
    • \(g\) concave, \(h\) convex, \(\tilde{h}\) nonincreasing
  • Composition of \(g : \mathbf{R}^{n} \rightarrow \mathbf{R}^k\) and \(h : \mathbf{R}^k \rightarrow \mathbf{R}\): \(f(x) = h(g(x)) = h(g_1(x), \dots, g_k(x))\) is convex if
    • \(g_i\) convex, \(h\) convex, \(\tilde{h}\) nondecreasing in each argument
    • \(g_i\) concave, \(h\) convex, \(\tilde{h}\) nonincreasing in each argument
  • Minimization: if \(f(x, y)\) is convex in \((x, y)\) and \(C\) is convex set, then \(g(x)=\inf _{y \in C} f(x, y)\) is convex

  • Perspective of a function \(f : \mathbf{R}^{n} \rightarrow \mathbf{R}\) is the function \(g : \mathbf{R}^{n} \times \mathbf{R} \rightarrow \mathbf{R}\) \[ \displaylines{ g(x, t)=t f(x / t), \qquad \operatorname{dom} g=\{(x, t) \mid x / t \in \operatorname{dom} f, t>0\} } \] \(g\) is convex if \(f\) is convex

Conjugate Function

The conjugate of a function \(f\): \(f^{*}(y)=\sup _{x \in \operatorname{dom} f}\left(y^{T} x-f(x)\right)\) is always convex.
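A quick numeric illustration (NumPy sketch; the grid and test points are illustrative): for the scalar function \(f(x)=x^{2}/2\), the conjugate is \(f^{*}(y)=y^{2}/2\), and approximating the supremum over a grid recovers this.

```python
import numpy as np

# scalar example: the conjugate of f(x) = x^2/2 is f*(y) = y^2/2
x = np.linspace(-10, 10, 20001)                  # grid approximating dom f
f = 0.5 * x**2

for y in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    f_star_grid = np.max(y * x - f)              # sup_x (y x - f(x)) on the grid
    print(f"y = {y:5.1f}:  grid f*(y) = {f_star_grid:7.4f},  y^2/2 = {0.5 * y * y:7.4f}")
```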

Quasiconvex Function

\(f : \mathbf{R}^{n} \rightarrow \mathbf{R}\) is quasiconvex if \(\operatorname{dom} f\) is convex and the sublevel sets \[ \displaylines{S_{\alpha}=\{x \in \operatorname{dom} f \mid f(x) \leq \alpha\} } \] are convex for all \(\alpha\)

\(f\) is quasiconcave if \(-f\) is quasiconvex.

  • Modified Jensen inequality: \(0 \leq \theta \leq 1 \ \Longrightarrow \ f(\theta x+(1-\theta) y) \leq \max \{f(x), f(y)\}\)
  • First-order condition: \(f(y) \leq f(x) \quad \Longrightarrow \quad \nabla f(x)^{T}(y-x) \leq 0\)
  • Sums of quasiconvex functions are not necessarily quasiconvex

Log-Concave and Log-Convex Function

A positive function \(f\) is log-concave if \(\log f\) is concave: \[ \displaylines{f(\theta x+(1-\theta) y) \geq f(x)^{\theta} f(y)^{1-\theta} \quad \text { for }\ 0 \leq \theta \leq 1 } \] Many common probability densities are log-concave, e.g., normal distribution.

  • Second-order condition: \(f(x) \nabla^{2} f(x) \preceq \nabla f(x) \nabla f(x)^{T}\)
  • Product of log-concave functions is log-concave
  • Sum of log-concave functions is not always log-concave
  • Integration: if \(f : \mathbf{R}^{n} \times \mathbf{R}^{m} \rightarrow \mathbf{R}\) is log-concave, then \(\displaystyle g(x)=\int f(x, y) dy\) is log-concave

Convex Optimization Problems

Optimization Problem

\[ \displaylines{\begin{array}{cl} \operatorname{minimize} & {f_{0}(x)} \\ \text { subject to } & {f_{i}(x) \leq 0, \quad i=1, \ldots, m} \\ & {h_{i}(x)=0, \quad i=1, \ldots, p}\end{array} } \]

  • \(x \in \mathbf{R}^n\) is the optimization variable
  • \(f_0: \mathbf{R}^n \rightarrow \mathbf{R}\) is the objective or cost function
  • \(f_i: \mathbf{R}^n \rightarrow \mathbf{R},\ i = 1, \dots, m\) are the inequality constraint functions
  • \(h_i: \mathbf{R}^n \rightarrow \mathbf{R}\) are the equality constraint functions

Optimal value \(p^{\star}=\inf \left\{f_{0}(x) \mid f_{i}(x) \leq 0,\ i=1, \ldots, m,\ h_{i}(x)=0, \ i=1, \ldots, p\right\}\)

  • \(p^\star = \infty\) if problem is infeasible
  • \(p^\star = -\infty\) if problem is unbounded below

Feasibility Problem

\[ \displaylines{\begin{array}{cl}{\operatorname{minimize}} & {0} \\ {\text { subject to }} & {f_{i}(x) \leq 0, \quad i=1, \ldots, m} \\ {} & {h_{i}(x)=0, \quad i=1, \ldots, p}\end{array} } \]

  • \(p^\star = 0\) if constraints are feasible; any feasible \(x\) is optimal
  • \(p^\star = \infty\) if constraints are infeasible

Convex Optimization

\[ \displaylines{\begin{array}{ll}{\operatorname{minimize}} & {f_{0}(x)} \\ {\text { subject to }} & {f_{i}(x) \leq 0, \quad i=1, \ldots, m} \\ {} & Ax = b\end{array} } \]

  • \(f_0, \dots, f_m\) are convex; equality constraints are affine
  • Feasible set of a convex optimization problem is convex
  • Any locally optimal point of a convex problem is globally optimal

Optimality Criterion

For a convex problem with differentiable \(f_0\), \(x\) is optimal iff it is feasible and \(\nabla f_{0}(x)^{T}(y-x) \geq 0\) for all feasible \(y\). If nonzero, \(\nabla f_0(x)\) defines a supporting hyperplane to the feasible set \(X\) at \(x\).

  • unconstrained problem: \(x\) is optimal iff \[ \displaylines{x \in \operatorname{dom} f_{0}, \quad \nabla f_{0}(x)=0 } \]

  • equality constrained problem: \[ \displaylines{\operatorname{minimize}\ f_{0}(x) \quad \operatorname{subject to} \ A x=b } \] \(x\) is optimal iff there exists a \(\nu\) such that \[ \displaylines{x \in \operatorname{dom} f_{0}, \quad A x=b, \quad \nabla f_{0}(x)+A^{T} \nu=0 } \]

  • minimization over nonnegative orthant: \[ \displaylines{\operatorname{minimize}\ f_{0}(x) \quad \operatorname{subject to} \ x \succeq 0 } \] \(x\) is optimal iff \[ \displaylines{x \in \operatorname{dom} f_{0}, \qquad x \succeq 0, \qquad\left\{\begin{array}{ll}{\nabla f_{0}(x)_{i} \geq 0} & {x_{i}=0} \\ {\nabla f_{0}(x)_{i}=0} & {x_{i}>0}\end{array}\right. } \] (the last condition is checked numerically in the sketch below)
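A numerical sketch of that last condition on a small nonnegative least-squares problem, \(f_{0}(x)=\tfrac12\|Ax-b\|_{2}^{2}\) over \(x \succeq 0\). CVXPY is assumed available; the data and tolerances are illustrative.

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(2)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)

x = cp.Variable(5)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - b)), [x >= 0])
prob.solve()

xv = x.value
g = A.T @ (A @ xv - b)               # gradient of f_0 at the computed solution
active = xv <= 1e-6                  # components with x_i = 0 (up to solver tolerance)
print("gradient on {x_i = 0} (should be >= 0):", np.round(g[active], 4))
print("gradient on {x_i > 0} (should be ~ 0):", np.round(g[~active], 4))
```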

Equivalent Convex Problems

  • eliminating equality constraints \[ \displaylines{\begin{array}{ll}{\operatorname{minimize}\ (\text {over } z)} & {f_{0}\left(F z+x_{0}\right)} \\ {\text {subject to }} & {f_{i}\left(F z+x_{0}\right) \leq 0, \quad i=1, \ldots, m}\end{array} } \] where \(F\) and \(x_0\) such that \(A x=b \ \Longleftrightarrow \ x=F z+x_{0}\) for some \(z\)

  • introducing slack variables for linear inequalities \[ \displaylines{\begin{array}{ll}{\operatorname{minimize}} & {f_{0}(x)} \\ {\text { subject to }} & {a_{i}^{T} x \leq b_{i}, \quad i=1, \ldots, m}\end{array} } \] is equivalent to \[ \displaylines{\begin{array}{ll}{\operatorname{minimize}\ (\text{over } x, s)} & {f_{0}(x)} \\ {\text {subject to }} & {a_{i}^{T} x+s_{i}=b_{i}, \quad i=1, \ldots, m} \\ {} & {s_{i} \geq 0, \quad i=1, \ldots m}\end{array} } \]

  • epigraph form: standard form convex problem is equivalent to \[ \displaylines{\begin{array}{ll}{\operatorname{minimize}\ (\text {over } x, t)} & {t} \\ {\text {subject to }} & {f_{0}(x)-t \leq 0} \\ {} & {f_{i}(x) \leq 0, \quad i=1, \ldots, m} \\ {} & {A x=b}\end{array} } \]

Quasiconvex Optimization

\[ \displaylines{\begin{array}{cl}{\operatorname{minimize}} & {f_{0}(x)} \\ {\text { subject to }} & {f_{i}(x) \leq 0, \quad i=1, \ldots, m} \\ {} & {A x=b}\end{array} } \]

with \(f_0: \mathbf{R}^n \rightarrow \mathbf{R}\) quasiconvex and \(f_1, \dots, f_m\) convex. A quasiconvex problem can have locally optimal points that are not globally optimal.

If \(f_0\) is quasiconvex, there exists a family of functions \(\phi_t\) such that:

  • \(\phi_t(x)\) is convex in \(x\) for fixed \(t\)
  • \(t\)-sublevel set of \(f_0\) is \(0\)-sublevel set of \(\phi_t\)

For a fixed \(t\), the quasiconvex optimization problem can be reduced to a convex feasibility problem in \(x\). The bisection method can then be used to find the optimal \(t\), as in the sketch below.
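A minimal sketch of the bisection scheme for a linear-fractional objective \(f_{0}(x)=(c^{T}x+d)/(e^{T}x+f)\) over a polyhedron, with \(\phi_{t}(x)=c^{T}x+d-t(e^{T}x+f)\). CVXPY (assumed available) solves each convex feasibility subproblem; the data, the bracket \([-20, 20]\), and the box constraint are illustrative assumptions chosen so the denominator stays positive and the bracket contains \(p^\star\).

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(3)
n, m = 4, 12
c = rng.standard_normal(n); d = 1.0
e = 0.1 * np.ones(n);       f_ = 1.0       # denominator e^T x + f_ stays positive on the box below
G, h = rng.standard_normal((m, n)), rng.uniform(1, 2, m)   # x = 0 is strictly feasible

def t_sublevel_nonempty(t):
    """Convex feasibility problem: does some feasible x satisfy f_0(x) <= t?"""
    x = cp.Variable(n)
    cons = [G @ x <= h, cp.norm(x, "inf") <= 1,            # polyhedral feasible set
            c @ x + d - t * (e @ x + f_) <= 0]             # phi_t(x) <= 0
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    return prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE), x.value

lo, hi, x_best = -20.0, 20.0, None                         # assumed bracket on p*
for _ in range(50):
    t = 0.5 * (lo + hi)
    ok, xt = t_sublevel_nonempty(t)
    if ok:
        hi, x_best = t, xt
    else:
        lo = t
print("p* is approximately", hi)
```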

Linear Optimization

Linear Programming

\[ \displaylines{\begin{array}{cl}{\text { minimize }} & {c^{T} x+d} \\ {\text { subject to }} & {G x \preceq h} \\ {} & {A x=b}\end{array} } \]

  • feasible set is a polyhedron (a small CVXPY example follows below).
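A minimal LP in this form, sketched with CVXPY (assumed available). The data are illustrative and constructed so the problem is feasible and bounded: a known point \(x_0\) is made strictly feasible and a box keeps \(p^\star\) finite.

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(4)
n, m = 5, 8
x0 = rng.uniform(-0.5, 0.5, n)                       # a known strictly feasible point
c = rng.standard_normal(n)
G0 = rng.standard_normal((m, n)); h0 = G0 @ x0 + rng.uniform(0.1, 1, m)
G = np.vstack([G0, np.eye(n), -np.eye(n)])            # add box -1 <= x <= 1 so p* is finite
h = np.concatenate([h0, np.ones(2 * n)])
A = rng.standard_normal((2, n)); b = A @ x0

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(c @ x), [G @ x <= h, A @ x == b])
prob.solve()
print(prob.status, prob.value)
```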

Linear-Fractional Programming

\[ \displaylines{\begin{array}{cl}{\text { minimize }} & {f_{0}(x)} \\ {\text { subject to }} & {G x \preceq h} \\ {} & {A x=b}\end{array} } \]

where \[ \displaylines{f_{0}(x)=\frac{c^{T} x+d}{e^{T} x+f}, \qquad \operatorname{dom} f_{0}(x)=\left\{x \mid e^{T} x+f>0\right\} } \]

  • a quasiconvex optimization problem

  • equivalent to the LP \[ \displaylines{\begin{array}{cl}{\text { minimize }} & {c^{T} y+d z} \\ {\text { subject to }} & {G y \preceq h z} \\ {} & {A y=b z} \\ {} & {e^{T} y+f z=1} \\ {} & {z \geq 0}\end{array} } \]

Quadratic Optimization

Quadratic Programming

\[ \displaylines{\begin{array}{cl}{\operatorname{minimize}} & {(1 / 2) x^{T} P x+q^{T} x+r} \\ {\text { subject to }} & {G x \preceq h} \\ {} & {A x=b}\end{array} } \]

  • \(P \in \mathbf{S}_+^n\), so objective is convex quadratic
  • minimize a convex quadratic function over a polyhedron (see the sketch below)
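A minimal QP of the form above in CVXPY (assumed available). The data are illustrative; \(P\) is built as \(QQ^{T}+I\) so it is positive definite and the objective is strictly convex (hence bounded below).

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(5)
n, m = 6, 10
Q = rng.standard_normal((n, n))
P = Q @ Q.T + np.eye(n)                               # P in S^n_++
q = rng.standard_normal(n)
G, h = rng.standard_normal((m, n)), rng.uniform(0.5, 1.5, m)   # x = 0 strictly feasible

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, P) + q @ x), [G @ x <= h])
prob.solve()
print("optimal value:", prob.value)
```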

Quadratically Constrained Quadratic Programming

\[ \displaylines{\begin{array}{ll}{\operatorname{minimize}} & {(1 / 2) x^{T} P_{0} x+q_{0}^{T} x+r_{0}} \\ {\text { subject to }} & {(1 / 2) x^{T} P_{i} x+q_{i}^{T} x+r_{i} \leq 0, \quad i=1, \ldots, m} \\ {} & {A x=b}\end{array} } \]

  • \(P_i \in \mathbf{S}_+^n\), objective and constraints are convex quadratic
  • if \(P_1, \dots, P_m \in \mathbf{S}_{++}^n\), feasible region is intersection of \(m\) ellipsoids and an affine set

Second-Order Cone Programming

\[ \displaylines{\begin{array}{cl}{\operatorname{minimize}} & {f^{T} x} \\ {\text { subject to }} & {\left\|A_{i} x+b_{i}\right\|_{2} \leq c_{i}^{T} x+d_{i}, \quad i=1, \ldots, m} \\ {} & {F x=g}\end{array} } \]

  • \(A_{i} \in \mathbf{R}^{n_{i} \times n}, F \in \mathbf{R}^{p \times n}\)
  • inequalities are called second-order cone constraints
  • for \(n_i = 0\), reduces to an LP; if \(c_i = 0\), reduces to a QCQP (a small CVXPY example follows)
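A minimal SOCP with a single second-order cone constraint, sketched with CVXPY (assumed available; data illustrative, with \(d_1\) large enough that \(x=0\) is feasible and a box to keep the problem bounded).

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(6)
n = 4
f = rng.standard_normal(n)
A1, b1 = rng.standard_normal((3, n)), rng.standard_normal(3)
c1, d1 = rng.standard_normal(n), 10.0

x = cp.Variable(n)
constraints = [cp.norm(A1 @ x + b1, 2) <= c1 @ x + d1,   # second-order cone constraint
               cp.norm(x, "inf") <= 1]                    # box keeps the problem bounded
prob = cp.Problem(cp.Minimize(f @ x), constraints)
prob.solve()
print(prob.status, prob.value)
```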

Geometric Programming

  • Monomial function \[ \displaylines{f(x)=c x_{1}^{a_{1}} x_{2}^{a_{2}} \cdots x_{n}^{a_{n}}, \qquad \text { dom } f=\mathbf{R}_{++}^{n} } \] with \(c>0,\ a_i \in \mathbf{R}\)

  • Posynomial function: sum of monomials \[ \displaylines{f(x)=\sum_{k=1}^{K} c_{k} x_{1}^{a_{1 k}} x_{2}^{a_{2 k}} \cdots x_{n}^{a_{n k}}, \qquad \operatorname{dom} f=\mathbf{R}_{++}^{n} } \]

  • Geometric program \[ \displaylines{\begin{array}{ll}{\text { minimize }} & {f_{0}(x)} \\ {\text { subject to }} & {f_{i}(x) \leq 1, \quad i=1, \ldots, m} \\ {} & {h_{i}(x)=1, \quad i=1, \ldots, p}\end{array} } \] with \(f_i\) posynomial, \(h_i\) monomial

Geometric program in convex form (change of variables \(y_i = \log x_i\), \(b_{ik} = \log c_{ik}\)): \[ \displaylines{\begin{array}{cl}{\operatorname{minimize}} & {\log \left(\sum_{k=1}^{K} \exp \left(a_{0 k}^{T} y+b_{0 k}\right)\right)} \\ {\text { subject to }} & {\log \left(\sum_{k=1}^{K} \exp \left(a_{i k}^{T} y+b_{i k}\right)\right) \leq 0, \quad i=1, \ldots, m} \\ {} & {G y+d=0}\end{array} } \]
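A minimal sketch of a small GP written directly in this convex form, using CVXPY's `log_sum_exp` atom (CVXPY and a solver supporting the exponential cone, e.g. SCS or ECOS, are assumed; the problem data are illustrative). The original variables are recovered as \(x = e^{y}\).

```python
import numpy as np
import cvxpy as cp   # assumed available

# GP (illustrative): minimize x1*x2  s.t.  1/x1 + 1/x2 <= 1,  x1*x2^2 = 10
# convex form in y = log x:
#   minimize  y1 + y2
#   s.t.      log(exp(-y1) + exp(-y2)) <= 0,   y1 + 2*y2 = log 10
y = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.sum(y)),
                  [cp.log_sum_exp(-y) <= 0,
                   y[0] + 2 * y[1] == np.log(10)])
prob.solve()
x_opt = np.exp(y.value)
print("x* =", x_opt, "  objective x1*x2 =", x_opt.prod())
```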

Generalized Inequality Constraints

\[ \displaylines{\begin{array}{ll}{\operatorname{minimize}} & {f_{0}(x)} \\ {\text { subject to }} & {f_{i}(x) \preceq_{K_{i}} 0, \quad i=1, \ldots, m} \\ {} & {A x=b}\end{array} } \]

  • \(f_0: \mathbf{R}^n \rightarrow \mathbf{R}\) convex; \(f_i: \mathbf{R}^n \rightarrow \mathbf{R}^{k_i}\) \(K_i\)-convex w.r.t. proper cone \(K_i\)
  • same properties as standard convex problem (convex feasible set, local optimum is global, etc.)

Conic form problem: \[ \displaylines{\begin{array}{cl} \operatorname{minimize} & c^{T} x \\ \text{subject to} & F x+g \preceq_{K} 0 \\ & A x=b \end{array} } \] extends LP (\(K = \mathbf{R}_+^m\)) to non-polyhedral cones

Semidefinite programming: \[ \displaylines{\begin{array}{cl} {\operatorname{minimize}} & {c^{T} x} \\ {\text { subject to }} & {x_{1} F_{1}+x_{2} F_{2}+\cdots+x_{n} F_{n}+G \preceq 0} \\ {} & {A x=b} \end{array} } \] with \(F_i, G \in \mathbf{S}^k\)
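A minimal SDP of this form in CVXPY (assumed available; data illustrative, with \(F_i, G\) symmetric and \(G\) negative definite so \(x=0\) is strictly feasible). Here the LMI is expressed with an explicit symmetric slack variable \(S \succeq 0\), one straightforward way to write \(\sum_i x_i F_i + G \preceq 0\) in CVXPY.

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(7)
n, k = 3, 4

def sym(M):
    return 0.5 * (M + M.T)

F = [sym(rng.standard_normal((k, k))) for _ in range(n)]
G0 = sym(rng.standard_normal((k, k)))
G = G0 - (np.linalg.eigvalsh(G0).max() + 1.0) * np.eye(k)   # G negative definite
c = rng.standard_normal(n)

x = cp.Variable(n)
S = cp.Variable((k, k), symmetric=True)      # slack: LMI imposed as (sum_i x_i F_i + G) == -S, S PSD
constraints = [sum(x[i] * F[i] for i in range(n)) + G == -S,
               S >> 0,
               cp.norm(x, "inf") <= 1]       # box keeps the problem bounded
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print(prob.status, prob.value)
```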

Minimum and Minimal Elements

\(\preceq_K\) is not in general a linear ordering: we can have \(x \npreceq_K y\) and \(y \npreceq_K x\)

  • \(x \in S\) is the minimum element of \(S\) w.r.t. \(\preceq_K\) if \(y \in S \ \Longrightarrow \ x \preceq_K y\)
  • \(x \in S\) is the minimal element of \(S\) w.r.t. \(\preceq_K\) if \(y \in S, \ y \preceq_{K} x \ \Longrightarrow \ y=x\)

Vector Optimization

General vector optimization problem: \[ \displaylines{\begin{array}{ll} \text{minimize (w.r.t. } K ) & f_{0}(x) \\ \text{subject to} & f_{i}(x) \leq 0, \quad i=1, \ldots, m \\ & h_{i}(x) = 0, \quad i=1, \ldots, p \end{array} } \] vector objective \(f_{0} : \mathbf{R}^{n} \rightarrow \mathbf{R}^{q}\), minimized w.r.t. proper cone \(K \subseteq \mathbf{R}^{q}\)

Convex vector optimization problem: \[ \displaylines{\begin{array}{ll} \text{minimize (w.r.t. } K ) & f_{0}(x) \\ \text{subject to} & f_{i}(x) \leq 0, \quad i=1, \ldots, m \\ & A x=b \end{array} } \]

with \(f_0\) \(K\)-convex, \(f_1,\dots, f_m\) convex

Optimal and Pareto Optimal Points

Set of achievable objective values \(\mathcal{O}=\left\{f_{0}(x) \mid x\ \text { feasible}\right\}\)

  • feasible \(x\) is optimal if \(f_0(x)\) is a minimum value of \(\mathcal{O}\)
  • feasible \(x\) is Pareto optimal if \(f_0(x)\) is a minimal value of \(\mathcal{O}\)

Duality

Lagrange Dual Function

From the standard form optimization problem we define the Lagrangian \(L : \mathbf{R}^{n} \times \mathbf{R}^{m} \times \mathbf{R}^{p} \rightarrow \mathbf{R}\) \[ \displaylines{L(x, \lambda, \nu)=f_{0}(x)+\sum_{i=1}^{m} \lambda_{i} f_{i}(x)+\sum_{i=1}^{p} \nu_{i} h_{i}(x) } \]

  • weighted sum of objective and constraint functions
  • \(\lambda_i\) is Lagrange multiplier associated with \(f_i(x) \le 0\)
  • \(\nu_i\) is Lagrange multiplier associated with \(h_i(x) = 0\)

Lagrange dual function \(g : \mathbf{R}^{m} \times \mathbf{R}^{p} \rightarrow \mathbf{R}\) \[ \displaylines{\begin{aligned} g(\lambda, \nu) &=\inf_{x \in \mathcal{D}} L(x, \lambda, \nu) \\ &=\inf_{x \in \mathcal{D}}\left(f_{0}(x)+\sum_{i=1}^{m} \lambda_{i} f_{i}(x)+\sum_{i=1}^{p} \nu_{i} h_{i}(x)\right) \end{aligned} } \] \(g\) is concave.

lower bound property: if \(\lambda \succeq 0\), then \(g(\lambda, \nu) \leq p^{\star}\)
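For example, for the QP \(\operatorname{minimize}\ \tfrac12 x^{T}Px+q^{T}x\) subject to \(Gx \preceq h\) with \(P \succ 0\), minimizing the Lagrangian over \(x\) gives \(g(\lambda)=-\tfrac12(q+G^{T}\lambda)^{T}P^{-1}(q+G^{T}\lambda)-h^{T}\lambda\). A numerical sketch of the lower bound property (CVXPY is assumed available to compute \(p^\star\); the data are illustrative):

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(8)
n, m = 5, 8
Q = rng.standard_normal((n, n)); P = Q @ Q.T + np.eye(n)      # P positive definite
q = rng.standard_normal(n)
G, h = rng.standard_normal((m, n)), rng.uniform(0.5, 1.5, m)  # x = 0 strictly feasible

# primal optimal value p* via CVXPY
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, P) + q @ x), [G @ x <= h])
p_star = prob.solve()

# dual function: g(lam) = -1/2 (q + G^T lam)^T P^{-1} (q + G^T lam) - h^T lam
def g(lam):
    v = q + G.T @ lam
    return -0.5 * v @ np.linalg.solve(P, v) - h @ lam

# lower bound property: g(lam) <= p* for every lam >= 0
for _ in range(5):
    lam = rng.uniform(0, 1, m)
    print(f"g(lam) = {g(lam):8.3f}  <=  p* = {p_star:8.3f}")
```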

Lagrange Dual Problem

\[ \displaylines{\begin{array}{ll}{\text { maximize }} & {g(\lambda, \nu)} \\ {\text { subject to }} & {\lambda \succeq 0}\end{array} } \]

  • find best lower bound on \(p^\star\), obtained from Lagrange dual function
  • a convex optimization problem; optimal value denoted \(d^\star\)
  • \(\lambda, \nu\) are dual feasible if \(\lambda \succeq 0, (\lambda, \nu) \in \operatorname{dom} g\)
  • often simplified by making implicit constraint \((\lambda, \nu) \in \operatorname{dom} g\) explicit

Optimality Conditions

  • Weak duality \(d^\star \le p^\star\) always holds, can be expressed as \[ \displaylines{\sup_{\lambda \succeq 0} \inf_{x} L(x, \lambda) \leq \inf_x \sup_{\lambda \succeq 0} L(x, \lambda) } \]

  • Strong duality \(d^\star = p^\star\) usually holds for convex problems. It means that \(x^\star\) and \(\lambda^\star\) form a saddle point of the Lagrangian.

Slater's Constraint Qualification

Strong duality holds for a convex problem if it is strictly feasible, i.e., \[ \displaylines{\exists x \in \operatorname{int} \mathcal{D} : \qquad f_{i}(x)<0, \quad i=1, \ldots, m, \qquad A x=b } \]

KKT Conditions

If strong duality holds and \(x^\star, \lambda^\star, \nu^\star\) are optimal, then they must satisfy:

  1. primal constraints: \(f_{i}(x^\star) \leq 0,\ h_{i}(x^\star)=0\)

  2. dual constraints: \(\lambda^\star \succeq 0\)

  3. complementary slackness: \(\lambda_i^\star f_i(x^\star) = 0\)

  4. gradient of Lagrangian with respect to \(x\) vanishes: \[ \displaylines{\nabla f_{0}(x^\star)+\sum_{i=1}^{m} \lambda_{i}^\star \nabla f_{i}(x^\star)+\sum_{i=1}^{p} \nu_{i}^\star \nabla h_{i}(x^\star)=0 } \]

If \(\tilde{x}, \tilde{\lambda}, \tilde{\nu}\) satisfy KKT for a convex problem, then they are optimal
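A numerical sketch checking the KKT conditions at the CVXPY solution of a small QP (CVXPY assumed available; the multipliers \(\lambda^\star\) are read from the inequality constraint's `dual_value`, which CVXPY reports as nonnegative for \(\le\) constraints; data illustrative).

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(9)
n, m = 5, 8
Q = rng.standard_normal((n, n)); P = Q @ Q.T + np.eye(n)
q = rng.standard_normal(n)
G, h = rng.standard_normal((m, n)), rng.uniform(0.5, 1.5, m)

x = cp.Variable(n)
ineq = G @ x <= h
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, P) + q @ x), [ineq])
prob.solve()

xs, lam = x.value, ineq.dual_value
print("1. primal feasibility :", np.max(G @ xs - h), " (should be <= 0)")
print("2. dual feasibility   :", lam.min(), " (should be >= 0)")
print("3. compl. slackness   :", np.abs(lam * (G @ xs - h)).max(), " (should be ~ 0)")
print("4. stationarity       :", np.linalg.norm(P @ xs + q + G.T @ lam), " (should be ~ 0)")
```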

Applications

Geometric Problems

Minimum Volume Ellipsoid Around a Set

Minimum volume ellipsoid \(\mathcal{E}\) containing a set \(C\).

  • parametrize \(\mathcal{E}\) as \(\mathcal{E}=\left\{v \mid\|A v+b\|_{2} \leq 1\right\}\), assume \(A \in \mathbf{S}_{++}^n\)

  • \(\operatorname{vol} \mathcal{E}\) is proportional to \(\det A^{-1}\); can compute \(\mathcal{E}\) by solving \[ \displaylines{\begin{array}{ll}{\operatorname{minimize}}\ (\text {over } A, b) & {\log \operatorname{det} A^{-1}} \\ {\text {subject to }} & {\sup _{v \in C}\|A v+b\|_{2} \leq 1}\end{array} } \] (a CVXPY sketch for a finite point set \(C\) follows below)
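For a finite set \(C=\{v_1,\dots,v_m\}\), the sup constraint becomes \(m\) norm constraints. A minimal CVXPY sketch (CVXPY and a conic solver handling `log_det`, e.g. SCS, are assumed; the points are illustrative):

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(10)
pts = rng.standard_normal((20, 2))                 # finite set C = {v_1, ..., v_20}

n = 2
A = cp.Variable((n, n), symmetric=True)            # A in S^n_++ (enforced via log_det's domain)
b = cp.Variable(n)
constraints = [cp.norm(A @ v + b, 2) <= 1 for v in pts]
prob = cp.Problem(cp.Maximize(cp.log_det(A)), constraints)   # = minimize log det A^{-1}
prob.solve()
print("A =\n", A.value, "\nb =", b.value)
```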

Maximum Volume Inscribed Ellipsoid

Maximum volume ellipsoid \(\mathcal{E}\) inside a convex set \(C \subseteq \mathbf{R}^n\)

  • parametrize \(\mathcal{E}\) as \(\mathcal{E}=\left\{B u+d \mid\|u\|_{2} \leq 1\right\}\), assume \(B \in \mathbf{S}_{++}^n\)

  • \(\operatorname{vol} \mathcal{E}\) is proportional to \(\det B\); can compute \(\mathcal{E}\) by solving \[ \displaylines{\begin{array}{ll}{\text { maximize }} & {\log \operatorname{det} B} \\ {\text { subject to }} & {\sup _{\|u\|_{2} \leq 1} I_{C}(B u+d) \leq 0}\end{array} } \] where \(I_C(x) = 0\) for \(x\in C\) and \(I_C(x) = \infty\) for \(x\notin C\) (a CVXPY sketch for a polyhedral \(C\) follows below)
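For a polyhedron \(C=\{x \mid a_i^{T}x \le b_i\}\), the sup constraint reduces to \(\|Ba_i\|_2 + a_i^{T}d \le b_i\). A minimal CVXPY sketch on an illustrative bounded polyhedron (CVXPY and a `log_det`-capable solver assumed):

```python
import numpy as np
import cvxpy as cp   # assumed available

# illustrative bounded polyhedron C = {x | a_i^T x <= b_i}
a = np.array([[1.0, 1.0], [-1.0, 2.0], [0.0, -1.0], [-2.0, -1.0]])
b = np.array([3.0, 4.0, 1.0, 2.0])

n = 2
B = cp.Variable((n, n), symmetric=True)
d = cp.Variable(n)
# sup_{||u||<=1} a_i^T (B u + d) <= b_i  <=>  ||B a_i||_2 + a_i^T d <= b_i
constraints = [cp.norm(B @ a[i], 2) + a[i] @ d <= b[i] for i in range(len(b))]
prob = cp.Problem(cp.Maximize(cp.log_det(B)), constraints)
prob.solve()
print("B =\n", B.value, "\nd =", d.value)
```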

Linear Discrimination

Separate two sets of points \(\{x_1, \dots, x_N\},\ \{y_1, \dots, y_M\}\) by a hyperplane: \[ \displaylines{a^{T} x_{i}+b>0, \ i=1, \ldots, N, \qquad a^{T} y_{i}+b<0, \ i=1, \ldots, M } \] homogeneous in \(a, b\), hence equivalent to \[ \displaylines{a^{T} x_{i}+b \geq 1, \quad i=1, \ldots, N, \qquad a^{T} y_{i}+b \leq-1, \quad i=1, \ldots, M } \] To separate two sets of points by maximum margin \[ \displaylines{\begin{array}{cl}{\operatorname{minimize}} & {(1 / 2)\|a\|_{2}} \\ {\text { subject to }} & {a^{T} x_{i}+b \geq 1, \quad i=1, \ldots, N} \\ {} & {a^{T} y_{i}+b \leq-1, \quad i=1, \ldots, M}\end{array} } \]

Support Vector Classifier

\[ \displaylines{\begin{array}{cl}{\text { minimize }} & {\|a\|_{2}+\gamma\left(\mathbf{1}^{T} u+\mathbf{1}^{T} v\right)} \\ {\text { subject to }} & {a^{T} x_{i}+b \geq 1-u_{i}, \quad i=1, \ldots, N} \\ {} & {a^{T} y_{i}+b \leq-1+v_{i}, \quad i=1, \ldots, M} \\ {} & {u \succeq 0, \quad v \succeq 0}\end{array} } \]

produces a point on the trade-off curve between the inverse of the margin \(2/\|a\|_2\) and the classification error, measured by the total slack \(\mathbf{1}^{T} u+\mathbf{1}^{T} v\)
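A CVXPY sketch of the support vector classifier above on synthetic 2-D data (CVXPY assumed available; the data, and the weight `gamma` trading margin against slack, are illustrative):

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(11)
N = M = 30
X = rng.standard_normal((N, 2)) + np.array([2.0, 2.0])    # class +1 points x_i
Y = rng.standard_normal((M, 2)) + np.array([-2.0, -2.0])  # class -1 points y_i
gamma = 0.1

a, b = cp.Variable(2), cp.Variable()
u, v = cp.Variable(N, nonneg=True), cp.Variable(M, nonneg=True)
obj = cp.norm(a, 2) + gamma * (cp.sum(u) + cp.sum(v))
constraints = [X @ a + b >= 1 - u, Y @ a + b <= -1 + v]
prob = cp.Problem(cp.Minimize(obj), constraints)
prob.solve()
print("a =", a.value, " b =", b.value, " total slack =", u.value.sum() + v.value.sum())
```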

Data Fitting

Norm Approximation

\[ \displaylines{\operatorname{minimize} \|A x-b\| } \]

where \(A \in \mathbf{R}^{m \times n}\) with \(m\ge n\)

Linear measurement model: \(y = Ax + v\), where \(y\) contains the measurements, \(x\) is unknown, and \(v\) is measurement error. Given observed \(y=b\), the best guess of \(x\) is the minimizer \(x^\star\).
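A CVXPY sketch comparing \(\ell_2\), \(\ell_1\), and \(\ell_\infty\) norm approximation on the same illustrative data; the choice of norm changes how the residuals are distributed (CVXPY assumed available):

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(12)
m, n = 50, 8
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.5 * rng.standard_normal(m)

for p in [2, 1, "inf"]:
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm(A @ x - b, p))).solve()
    r = A @ x.value - b
    print(f"p = {p}: max |residual| = {np.abs(r).max():.3f}, "
          f"# near-max residuals = {(np.abs(r) > 0.8 * np.abs(r).max()).sum()}")
```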

Least-Norm Problems

\[ \displaylines{\begin{array}{ll}{\text { minimize }} & {\|x\|} \\ {\text { subject to }} & {A x=b}\end{array} } \]

where \(A \in \mathbf{R}^{m \times n}\) with \(m\le n\)

Scalarized Problem

\[ \displaylines{\operatorname{minimize} \|A x-b\|+\gamma\|x\| } \]

tradeoff between error and norm

Statistical Estimation

Maximum Likelihood Estimation

\[ \displaylines{\operatorname{maximize}\ (\text{over } x ) \quad \log p_{x}(y) } \] With linear measurement model with IID noise: \(y_{i}=a_{i}^{T} x+v_{i}, \ i=1, \ldots, m\), the estimation problem becomes \[ \displaylines{\operatorname{maximize}\ l(x) = \sum_{i=1}^{m} \log p\left(y_{i}-a_{i}^{T} x\right) } \] where \(y\) is observed value, \(p\) is the PDF of the measurement noise \(v\)

  • Gaussian noise \(\mathcal{N}\left(0, \sigma^{2}\right) : p(z)=\left(2 \pi \sigma^{2}\right)^{-1 / 2} e^{-z^{2} /\left(2 \sigma^{2}\right)}\) \[ \displaylines{l(x)=-\frac{m}{2} \log \left(2 \pi \sigma^{2}\right)-\frac{1}{2 \sigma^{2}} \sum_{i=1}^{m}\left(a_{i}^{T} x-y_{i}\right)^{2} } \]

  • Laplacian noise \(p(z)=(1 /(2 a)) e^{-|z| / a}\) \[ \displaylines{l(x)=-m \log (2 a)-\frac{1}{a} \sum_{i=1}^{m}\left|a_{i}^{T} x-y_{i}\right| } \]

  • Uniform noise on \([-a, a]\) \[ \displaylines{l(x)=\left\{\begin{array}{ll}{-m \log (2 a)} & {\left|a_{i}^{T} x-y_{i}\right| \leq a, \quad i=1, \ldots, m} \\ {-\infty} & {\text { otherwise }}\end{array}\right. } \]

Logistic Regression

Random variable \(y\in \{0,1\}\) with distribution \[ \displaylines{p=\operatorname{prob}(y=1)=\frac{\exp \left(a^{T} u+b\right)}{1+\exp \left(a^{T} u+b\right)} } \] log-likelihood function (for \(y_1 = \cdots = y_k = 1, \ y_{k+1} = \cdots = y_m = 0\)): \[ \displaylines{\begin{aligned} l(a, b) &=\log \left(\prod_{i=1}^{k} \frac{\exp \left(a^{T} u_{i}+b\right)}{1+\exp \left(a^{T} u_{i}+b\right)} \prod_{i=k+1}^{m} \frac{1}{1+\exp \left(a^{T} u_{i}+b\right)}\right) \\ &=\sum_{i=1}^{k}\left(a^{T} u_{i}+b\right)-\sum_{i=1}^{m} \log \left(1+\exp \left(a^{T} u_{i}+b\right)\right) \end{aligned} } \] concave in \(a, b\)
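Since the log-likelihood is concave, it can be maximized directly. A CVXPY sketch using the `logistic` atom, \(\operatorname{logistic}(z)=\log(1+e^{z})\), on synthetic data (CVXPY with an exponential-cone-capable solver such as SCS or ECOS is assumed; data and true parameters are illustrative):

```python
import numpy as np
import cvxpy as cp   # assumed available

rng = np.random.default_rng(13)
m, n = 100, 3
U = rng.standard_normal((m, n))
a_true, b_true = np.array([1.0, -2.0, 0.5]), 0.3
p = 1.0 / (1.0 + np.exp(-(U @ a_true + b_true)))
y = (rng.uniform(size=m) < p).astype(float)        # labels in {0, 1}

a, b = cp.Variable(n), cp.Variable()
z = U @ a + b
# log-likelihood: sum_i [ y_i z_i - log(1 + exp(z_i)) ], concave in (a, b)
ll = y @ z - cp.sum(cp.logistic(z))
prob = cp.Problem(cp.Maximize(ll))
prob.solve()
print("estimated a:", a.value, " b:", float(b.value))
```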