From ME 510 & AMATH 584
A subspace of a vector space is a subset of a vector space which is itself a vector space.
QR Decomposition
$$ \require{mathtools} \DeclarePairedDelimiters\norm{\lVert}{\rVert} A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1R_1 $$where $A \in \mathbb{R}^{n \times k}$, $(n \ge k)$, $[Q_1, Q_2]$ is orthogonal, $Q_1 \in \mathbb{R}^{n \times r}$, $Q_2 \in \mathbb{R}^{n \times (n-r)}$, $R_1 \in \mathbb{R}^{r \times r}$, $\operatorname{rank}(A) = r$
$Q_1 R_1$ is called the reduced QR factorization
$Q^T Q = I_r$, and $R_1$ is upper triangular & invertible
usually computed using a variation on Gram-Schmidt procedure which is less sensitive to numerical (rounding) errors
column of $Q_1$ are orthonormal basis for $\mathcal{R}(A)$, $\mathcal{R}(Q_1) = \mathcal{R}(A)$
column of $Q_2$ are orthonormal basis for $\mathcal{N}(A^T)$, $\mathcal{R}(Q_2) = \mathcal{N}(A^T)$
Least-Squares
Consider $y = Ax$ where $A \in \mathbb{R}^{m \times n}$, $m>n$. To solve this equation approximately, we first define the residual $r = Ax-y$ and trying to find $x = x_{\mathrm{ls}}$ that minimizes $\norm{r}$
$$ \norm{r}^2 = x^TA^TAx - 2y^TAx + y^Ty $$set gradient to zero
$$ \nabla_x \norm{r}^2 = 2A^TAx - 2A^Ty = 0 $$we get
$$ x_\mathrm{ls} = (A^TA)^{-1}A^Ty $$$A^\dagger = (A^TA)^{-1} A^T$ is a left inverse of $A$
$Ax_\mathrm{ls}$ is the point in $\mathcal{R}(A)$ that is closest to $y$, i.e. it is the projection of $y$ onto $\mathcal{R}(A)$: $Ax_\mathrm{ls} = \mathcal{P}_{\mathcal{R}(A)}(y) = A(A^TA)^{-1}A^Ty$
$A(A^TA)^{-1}A^T$ gives projection onto $\mathcal{R}(A)$
optimal residual $r = Ax_\mathrm{ls} - y$ is orthogonal to $\mathcal{R}(A)$: $\langle r, Az \rangle = 0$ for all $z \in \mathbb{R}^n$
Least-Squares via $QR$ Factorization
$$ \norm{Ax - y}^2 = \norm*{\begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - y}^2 $$since multiplication by orthogonal matrix does not change norm, so
$$ \begin{split} \norm{Ax - y}^2 &= \norm*{\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T y}^2 \\ &=\norm*{\begin{bmatrix} R_1 x - Q_1^T y \\ -Q_2^T y \end{bmatrix}}^2 \\ &=\norm{R_1 x - Q_1^T y}^2 + \norm{Q_2^T y}^2 \end{split} $$this is minimized by choosing $x_\mathrm{ls} = R_1^{-1} Q_1^T y$
- residual with optimal $x$ is $Ax_\mathrm{ls} - y = -Q_2 Q_2^T y$
- $Q_1 Q_1^T$ gives projection onto $\mathcal{R}(A)$
- $Q_2 Q_2^T$ gives projection onto $\mathcal{R}(A)^\perp$
Gauss-Newton Method for Nonlinear Least-Squares (NLLS) Problem
Find $x \in \mathbb{R}^n$ that minimizes
$$ \norm{r(x)}^2 = \sum_{i = 1}^m r_i(x)^2 $$where $r:\mathbb{R}^n \to \mathbb{R}^m$
Linearize $r$ near current iterate $x^{(k)}$
$$ r(x) \approx r(x^k) + J(x^{(k)})(x-x^{(k)}) $$where $J$ is the Jacobian matrix
At $k$-th iteration, we approximate NLLS problem by linear LS problem:
$$ \norm{r(x)}^2 \approx \norm*{A^{(k)}x - b^{(k)}}^2 $$where $A^{(k)} = J(x^{(k)})$, $b^{(k)} = J(x^{(k)}) x^{(k)} - r(x^{(k)})$
At $k+1$ iteration
$$ x^{(k+1)} = \left( A^{(k)T} A^{(k)} \right)^{-1} A^{(k)T} b^{(k)} $$Underdetermined Linear Equations
Consider
$$ y = Ax $$where $A \in \mathbb{R}^{m \times n}$, $(m < n)$. Assume $\operatorname{rank}(A) = m$, the solution has the form
$$ \{ x \mid Ax = y \} = \{ x_p + z \mid z\in \mathcal{N}(A) \} $$where $x_p$ is any solution that satisfies $A x_p = y$
solution has $\dim \mathcal{N}(A) = n - m$ degrees of freedom
the least-norm solution is $x_\mathrm{ln} = A^T (A A^T)^{-1} y$
$$ \begin{align*} &\min & &\norm{x} \\ &\text{ s.t.} & &Ax = y \end{align*} $$$A^\dagger = A^T (A A^T)^{-1}$ is a right inverse of $A$
$I - A^T (A A^T)^{-1} A$ gives projection onto $\mathcal{N}(A)$
apply QR factorization of $A^T = QR$ can get the same result $x_\mathrm{ln} = QR^{-T} y$
derivation via Lagrange multipliers: $L(x, \lambda) = x^T x + \lambda^T (A x - y)$
$$ \nabla_x L = 2 x + A^T \lambda \, , \quad \nabla_\lambda L = A x - y = 0 $$
General Norm Minimization With Equality Constraints
$$ \begin{align*} &\min & &\norm{ A x - b } \\ &\text{ s.t.} & &C x = d \end{align*} $$Lagrangian is
$$ \begin{split} L(x, \lambda) &= \frac{1}{2} \norm{Ax - b}^2 + \lambda^T (Cx -d) \\ &= \frac{1}{2} x^T A^T Ax - b^T Ax +\frac{1}{2} b^T b + \lambda^T Cx - \lambda^T d \end{split} $$optimality conditions are
$$ \nabla_x L = A^T Ax - A^T b + C^T \lambda = 0\, , \quad \nabla_\lambda L = Cx - d = 0 $$write in block matrix form as
$$ \begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T b \\ d \end{bmatrix} $$if the block matrix is invertible, we can solve for $x$ and $\lambda$.
If $A^T A$ is invertible, we can derive a more explicit formula for $x$
$$ x = (A^T A)^{-1} \Big(A^T b - C^T \big(C (A^T A)^{-1} C^T\big)^{-1} \big(C (A^T A)^{-1} A^T b\big)\Big) $$Householder Triangularization
The other principal method for computing $QR$ factorizations (besides Gram-Schmidt orthogonalization).
Gram-Schmidt:
$$ A R_1 R_2 \cdots R_n = \hat{Q} $$Householder:
$$ Q_n \cdots Q_2 Q_1 A = R $$In general, $Q_k$ operates on row $k,\dots,m$. At the beginning of step $k$, there is a block of zeros in the first $k-1$ columns of these rows. The application of $Q_k$ forms linear combinations of these rows. After $n$ steps, all the entries below the diagonal have been eliminated and $Q_n \cdots Q_2 Q_1 A = R$ is upper-triangular.
Each $Q$ is chosen to be a unitary matrix of the form
$$ Q_k = \begin{bmatrix} I & 0 \\ 0 & F \end{bmatrix} $$where $I_k$ is the $(k-1) \times (k-1)$ identity and $F$ is an $(m-k+1) \times (m-k+1)$ unitary matrix. Multiplication by $F$ must introduce zeros into the $k$th column. The Householder algorithm chooses $F$ to be a particular matrix called a Householder reflector.
$$ Fx = \begin{bmatrix} \norm{x} \\ 0 \\ \vdots \\ 0 \end{bmatrix} =\norm{x} e_1 $$$F$ will reflect the space across the hyperplane $H$ orthogonal to $v = \norm{x} e_1 -x$. Note that
$$ I - \frac{vv^*}{v*v} $$projects onto the space $H$. To reflect $y$ across $H$, we have
$$ F = I - 2\frac{vv^*}{v*v} $$Due to stability issues, in practice we use $v = \operatorname{sign}(x_1) \norm{x} e_1 + x$, i.e. always reflect $x$ to the vector $\pm \norm{x} e_1$ that is not too close to $x$ itself. $\operatorname{sign}(x_1) = 0$ if $x_1 = 0$.
Conditioning and Condition Numbers
A well-conditioned problem is one with the property that all small perturbations of $x$ lead to only small changes in $f(x)$.
- Absolute Condition Number: $$ \hat{\kappa} = \lim_{\delta \to 0} \sup_{\norm{\delta x} \le \delta} \frac{\norm{\delta f}}{\norm{\delta x}} $$
If $f$ is differentiable, then $\hat{\kappa} = \norm{J(x)}$, where $J(x)$ is the Jacobi matrix.
- Relative Condition Number: $$ \kappa = \lim_{\delta \to 0} \sup_{\norm{\delta x} \le \delta} \left( \frac{\norm{\delta f}}{\norm{f(x)}} \bigg/ \frac{\norm{\delta x}}{\norm{x}} \right) $$ If $f$ is differentiable, then $$ \kappa = \frac{\norm{J(x)}}{\norm{f(x)} / \norm{x}} $$
Relative condition numbers are more important in numerical analysis because the floating point arithmetic used by computers introduces relative errors rather than absolute ones.
- Condition Number of a matrix $$ \begin{split} \kappa &= \sup_{\delta x} \left( \frac{\norm{A(x + \delta x) - Ax}}{\norm{Ax}} \bigg/ \frac{\norm{\delta x}}{\norm{x}} \right) \\ &=\sup_{\delta x} \frac{\norm{A \delta x}}{\norm{\delta x}} \bigg/ \frac{\norm{Ax}}{\norm{x}} \\ &= \norm{A} \frac{\norm{x}}{\norm{Ax}} \\ &\le \norm{A} \norm{A^{-1}} \end{split} $$
If $\norm{\cdot} = \norm{\cdot}_2$, then $\norm{A} = \sigma_1$ and $\norm{A^{-1}} = 1/\sigma_m$. Thus
$$ \kappa(A) = \frac{\sigma_1}{\sigma_m} $$The ratio can be interpreted as the eccentricity of the hyperellipse.
When $A$ is a rectangular matrix, we can use the pseudoinverse: $\kappa(A) = \norm{A} \norm{A^\dagger}$
Cholesky Factorization
Hermitian matrix is a kind of matrix that satisfies $A = A^*$.
$$ \begin{split} A &= \begin{bmatrix} a_{11} & w^* \\ w & K \end{bmatrix} \\ &= \begin{bmatrix} \alpha & 0 \\ w/\alpha & I \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & K-ww^*/a_{11} \end{bmatrix} \begin{bmatrix} \alpha & w^*/\alpha \\ 0 & I \end{bmatrix} \\ &= R_1^*A_1 R_1 \end{split} $$where $\alpha = \sqrt{a_{11}}$. Continue this process until
$$ A = \underbrace{R_1^* R_2^*\cdots R_m^*}_{R^*}\ \underbrace{R_m\cdots R_2 R_1}_{R} $$where $R$ is upper-triangular.
Every Hermitian positive definite matrix $A \in \mathbb{C}^{m \times m}$ has a unique Cholesky factorization.