From ME 510 & AMATH 584

A subspace of a vector space is a subset of a vector space which is itself a vector space.

# QR Decomposition

$\displaylines{A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1R_1 }$ where $$A \in \mathbb{R}^{n \times k}$$, $$(n \ge k)$$, $$\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}$$ is orthogonal, $$Q_1 \in \mathbb{R}^{n \times r}$$, $$Q_2 \in \mathbb{R}^{n \times (n-r)}$$, $$R_1 \in \mathbb{R}^{r \times r}$$, $$\mathrm{rank}(A) = r$$

• $$Q_1 R_1$$ is called the reduced QR factorization

• $$Q^\mathrm{T}Q = I_r$$, and $$R_1$$ is upper triangular & invertible
• usually computed using a variation on Gram-Schmidt procedure which is less sensitive to numerical (rounding) errors
• column of $$Q_1$$ are orthonormal basis for $$\mathcal{R}(A)$$, $$\mathcal{R}(Q_1) = \mathcal{R}(A)$$
• column of $$Q_2$$ are orthonormal basis for $$\mathcal{N}(A^\mathrm{T})$$, $$\mathcal{R}(Q_2) = \mathcal{N}(A^\mathrm{T})$$

# Least-Squares

Consider $$y = Ax$$ where $$A \in \mathbb{R}^{m \times n}$$, $$m>n$$. To solve this equation approximately, we first define the residual $$r = Ax-y$$ and trying to find $$x = x_{\mathrm{ls}}$$ that minimizes $$\|r\|$$ $\displaylines{\|r\|^2 = x^\mathrm{T}A^\mathrm{T}Ax - 2y^\mathrm{T}Ax + y^\mathrm{T}y }$ set gradient to zero $\displaylines{\nabla_x \|r\|^2 = 2A^\mathrm{T}Ax - 2A^\mathrm{T}y = 0 }$ we get $\displaylines{x_\mathrm{ls} = (A^\mathrm{T}A)^{-1}A^\mathrm{T}y }$

• $$A^\dagger = (A^\mathrm{T}A)^{-1} A^\mathrm{T}$$ is a left inverse of $$A$$

• $$Ax_\mathrm{ls}$$ is the point in $$\mathcal{R}(A)$$ that is closest to $$y$$, i.e. it is the projection of $$y$$ onto $$\mathcal{R}(A)$$: $$Ax_\mathrm{ls} = \mathcal{P}_{\mathcal{R}(A)}(y) = A(A^\mathrm{T}A)^{-1}A^\mathrm{T}y$$

• $$A(A^\mathrm{T}A)^{-1}A^\mathrm{T}$$ gives projection onto $$\mathcal{R}(A)$$
• optimal residual $$r = Ax_\mathrm{ls} - y$$ is orthogonal to $$\mathcal{R}(A)$$: $$\langle r, Az \rangle = 0$$ for all $$z \in \mathbb{R}^n$$

# Least-Squares via $$QR$$ Factorization

$\displaylines{\|Ax - y\|^2 = \left\|\begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - y \right\|^2 }$ since multiplication by orthogonal matrix does not change norm, so \displaylines{\begin{aligned} \|Ax - y\|^2 &= \left\|\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^\mathrm{T} \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^\mathrm{T} y \right\|^2 \\ &= \left\|\begin{bmatrix} R_1 x - Q_1^\mathrm{T} y \\ -Q_2^\mathrm{T} y \end{bmatrix} \right\|^2 \\ &=\|R_1 x - Q_1^\mathrm{T} y\|^2 + \|Q_2^\mathrm{T} y\|^2 \end{aligned} } this is minimized by choosing $$x_\mathrm{ls} = R_1^{-1} Q_1^\mathrm{T} y$$

• residual with optimal $$x$$ is $$Ax_\mathrm{ls} - y = -Q_2 Q_2^\mathrm{T} y$$
• $$Q_1 Q_1^\mathrm{T}$$ gives projection onto $$\mathcal{R}(A)$$
• $$Q_2 Q_2^\mathrm{T}$$ gives projection onto $$\mathcal{R}(A)^\perp$$

# Gauss-Newton Method for Nonlinear Least-Squares (NLLS) Problem

Find $$x \in \mathbb{R}^n$$ that minimizes $\displaylines{\|r(x)\|^2 = \sum_{i = 1}^m r_i(x)^2 }$ where $$r:\mathbb{R}^n \to \mathbb{R}^m$$

Linearize $$r$$ near current iterate $$x^{(k)}$$ $\displaylines{r(x) \approx r(x^k) + Jr(x^{(k)})(x-x^{(k)}) }$ where $$Jr$$ is the Jacobian matrix

At $$k$$th iteration, we approximate NLLS problem by linear LS problem: $\displaylines{\|r(x)\|^2 \approx \left\|A^{(k)}x - b^{(k)}\right\|^2 }$ where $$A^{(k)} = Jr(x^{(k)})$$, $$b^{(k)} = Jr(x^{(k)}) x^{(k)} - r(x^{(k)})$$

At $$k+1$$th iteration $\displaylines{x^{(k+1)} = \left( A^{(k)\mathrm{T}} A^{(k)} \right)^{-1} A^{(k)\mathrm{T}} b^{(k)} }$

# Underdetermined Linear Equations

Consider $\displaylines{y = Ax }$ where $$A \in \mathbb{R}^{m \times n}$$, $$(m < n)$$. Assume $$\mathrm{rank}(A) = m$$, the solution has the form $\displaylines{\{ x \mid Ax = y \} = \{ x_p + z \mid z\in \mathcal{N}(A) \} }$ where $$x_p$$ is any solution that satisfies $$A x_p = y$$

• solution has $$\dim \mathcal{N}(A) = n - m$$ degrees of freedom

• the least-norm solution is $$x_\text{ln} = A^\mathrm{T} (A A^\mathrm{T})^{-1} y$$ \displaylines{\begin{aligned} \min \quad &\|x\| \\ s.t. \quad &Ax = y \end{aligned} }

• $$A^\dagger = A^\mathrm{T} (A A^\mathrm{T})^{-1}$$ is a right inverse of $$A$$

• $$I - A^\mathrm{T} (A A^\mathrm{T})^{-1} A$$ gives projection onto $$\mathcal{N}(A)$$

• apply QR factorization of $$A^\mathrm{T} = QR$$ can get the same result $$x_\text{ln} = QR^{-\mathrm{T}} y$$

• derivation via Lagrange multipliers: $$L(x, \lambda) = x^\mathrm{T} x + \lambda^\mathrm{T} (A x - y)$$ $\displaylines{\nabla_x L = 2 x + A^\mathrm{T} \lambda \, , \quad \nabla_\lambda L = A x - y = 0 }$

# General Norm Minimization With Equality Constraints

\displaylines{\begin{aligned} \min \quad & \| A x - b \| \\ s.t. \quad & C x = d \end{aligned} } Lagrangian is \displaylines{\begin{aligned} L(x, \lambda) &= \dfrac{1}{2} \|Ax - b\|^2 + \lambda^\mathrm{T} (Cx -d) \\ &= \dfrac{1}{2} x^\mathrm{T} A^\mathrm{T} Ax - b^\mathrm{T} Ax +\dfrac{1}{2} b^\mathrm{T} b + \lambda^\mathrm{T} Cx - \lambda^\mathrm{T} d \end{aligned} } optimality conditions are $\displaylines{\nabla_x L = A^\mathrm{T} Ax - A^\mathrm{T} b + C^\mathrm{T} \lambda = 0\, , \quad \nabla_\lambda L = Cx - d = 0 }$ write in block matrix form as $\displaylines{\begin{bmatrix} A^\mathrm{T} A & C^\mathrm{T} \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^\mathrm{T} b \\ d \end{bmatrix} }$ if the block matrix is invertible, we can solve for $$x$$ and $$\lambda$$.

If $$A^\mathrm{T} A$$ is invertible, we can derive a more explicit formula for $$x$$ $\displaylines{x = (A^\mathrm{T} A)^{-1} \left(A^\mathrm{T} b - C^\mathrm{T} \bigl(C (A^\mathrm{T} A)^{-1} C^\mathrm{T}\bigr)^{-1} \bigl(C (A^\mathrm{T} A)^{-1} A^\mathrm{T} b\bigr)\right) }$

# Householder Triangularization

The other principal method for computing $$QR$$ factorizations (besides Gram-Schmidt orthogonalization).

Gram-Schmidt: $\displaylines{A R_1 R_2 \cdots R_n = \hat{Q} }$ Householder: $\displaylines{Q_n \cdots Q_2 Q_1 A = R }$ In general, $$Q_k$$ operates on row $$k,\dots,m$$. At the beginning of step $$k$$, there is a block of zeros in the first $$k-1$$ columns of these rows. The application of $$Q_k$$ forms linear combinations of these rows. After $$n$$ steps, all the entries below the diagonal have been eliminated and $$Q_n \cdots Q_2 Q_1 A = R$$ is upper-triangular.

Each $$Q$$ is chosen to be a unitary matrix of the form $\displaylines{Q_k = \begin{bmatrix} I & 0 \\ 0 & F \end{bmatrix} }$ where $$I_k$$ is the $$(k-1) \times (k-1)$$ identity and $$F$$ is an $$(m-k+1) \times (m-k+1)$$ unitary matrix. Multiplication by $$F$$ must introduce zeros into the $$k$$th column. The Householder algorithm chooses $$F$$ to be a particular matrix called a Householder reflector. $\displaylines{Fx = \begin{bmatrix} \|x\| \\ 0 \\ \vdots \\ 0 \end{bmatrix} =\|x\| e_1 }$ $$F$$ will reflect the space across the hyperplane $$H$$ orthogonal to $$v = \|x\| e_1 -x$$. Note that $\displaylines{I - \dfrac{vv^*}{v*v} }$ projects onto the space $$H$$. To reflect $$y$$ across $$H$$, we have $\displaylines{F = I - 2\dfrac{vv^*}{v*v} }$ Due to stability issues, in practice we use $$v = \mathrm{sign}(x_1) \|x\| e_1 + x$$, i.e. always reflect $$x$$ to the vector $$\pm \|x\| e_1$$ that is not too close to $$x$$ itself. $$\mathrm{sign}(x_1) = 0$$ if $$x_1 = 0$$.

# Conditioning and Condition Numbers

A well-conditioned problem is one with the property that all small perturbations of $$x$$ lead to only small changes in $$f(x)$$.

• Absolute Condition Number: $\displaylines{\hat{\kappa} = \lim_{\delta \to 0} \sup_{\|\delta x\| \le \delta} \dfrac{\|\delta f\|}{\|\delta x\|} }$

If $$f$$ is differentiable, then $$\hat{\kappa} = \|J(x)\|$$, where $$J(x)$$ is the Jacobi matrix.

• Relative Condition Number: $\displaylines{\kappa = \lim_{\delta \to 0} \sup_{\|\delta x\| \le \delta} \left( \dfrac{\|\delta f\|}{\|f(x)\|} \bigg/ \dfrac{\|\delta x\|}{\|x\|} \right) }$ If $$f$$ is differentiable, then $\displaylines{\kappa = \dfrac{\|J(x)\|}{\|f(x)\| / \|x\|} }$

Relative condition numbers are more important in numerical analysis because the floating point arithmetic used by computers introduces relative errors rather than absolute ones.

• Condition Number of a matrix \displaylines{\begin{aligned} \kappa &= \sup_{\delta x} \left( \dfrac{\|A(x + \delta x) - Ax\|}{\|Ax\|} \bigg/ \dfrac{\|\delta x\|}{\|x\|} \right) \\ &=\sup_{\delta x} \dfrac{\|A \delta x\|}{\|\delta x\|} \bigg/ \dfrac{\|Ax\|}{\|x\|} \\ &= \|A\| \dfrac{\|x\|}{\|Ax\|} \\ &\le \|A\| \|A^{-1}\| \end{aligned} }

If $$\|\cdot\| = \|\cdot\|_2$$, then $$\|A\| = \sigma_1$$ and $$\|A^{-1}\| = 1/\sigma_m$$. Thus $\displaylines{\kappa(A) = \dfrac{\sigma_1}{\sigma_m} }$ The ratio can be interpreted as the eccentricity of the hyperellipse.

When $$A$$ is a rectangular matrix, we can use the pseudoinverse: $$\kappa(A) = \|A\| \|A^\dagger\|$$

# Cholesky Factorization

Hermitian matrix is a kind of matrix that satisfies $$A = A^*$$. \displaylines{\begin{aligned} A &= \begin{bmatrix} a_{11} & w^* \\ w & K \end{bmatrix} \\ &= \begin{bmatrix} \alpha & 0 \\ w/\alpha & I \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & K-ww^*/a_{11} \end{bmatrix} \begin{bmatrix} \alpha & w^*/\alpha \\ 0 & I \end{bmatrix} \\ &= R_1^* A_1 R_1 \end{aligned} } where $$\alpha = \sqrt{a_{11}}$$. Continue this process until $\displaylines{A = \underbrace{R_1^* R_2^* \cdots R_m^*}_{R^*}\ \underbrace{R_m\cdots R_2 R_1}_{R} }$ where $$R$$ is upper-triangular.

Every Hermitian positive definite matrix $$A \in \mathbb{C}^{m \times m}$$ has a unique Cholesky factorization.