Applied Optimal Control

EN530.603 Review.


Unconstrained Optimization

Optimality Conditions

Necessary Optimality Conditions

\[ \displaylines{\nabla f = 0 \qquad \text{First-order Necessary Conditions} \\ \nabla^2 f \ge 0 \qquad \text{Second-order Necessary Conditions} } \]

Sufficient Optimality Conditions

\[ \displaylines{\nabla f(x^*) = 0, \quad \nabla^2 f(x^*) > 0 } \]

Numerical Solution

\[ \displaylines{x^{k+1} = x^k + \alpha^k d^k, \quad k=0,1,\dots } \]

where \(d^k\) is called the search direction and \(\alpha^k > 0\) is called the stepsize. The most common methods for finding \(\alpha^k\) and \(d^k\) are gradient-based.

  1. Choose direction \(d^k\) so that whenever \(\nabla f(x^k) \ne 0\) we have \[ \displaylines{\nabla f(x^k)^T d^k < 0 } \]

  2. Choose stepsize \(\alpha^k > 0\) so that \[ \displaylines{f(x^k + \alpha d^k) < f(x^k) } \]

Search Direction

Many gradient methods are specified in the form \[ \displaylines{x^{k+1} = x^k - \alpha^k D^k \nabla f(x^k) } \] where \(D^k \in \mathbb{S}^n_{++}\)

Steepest Descent: \[ \displaylines{D^k = I } \] Newton's Method: \[ \displaylines{D^k = [\nabla^2 f(x^k)]^{-1} } \] Gauss-Newton Method: When the cost has the special least-squares form \[ \displaylines{f(x) = \frac{1}{2} \|g(x)\|^2 } \] we can choose \[ \displaylines{D^k = \left[ \nabla g(x^k) \nabla g(x^k)^T \right]^{-1} } \] Conjugate-Gradient Method: Choose mutually conjugate (and hence linearly independent) search directions \(d^k\) \[ \displaylines{d^k = -\nabla f(x^k) + \beta^k d^{k-1} } \] The most common way to compute \(\beta^k\) is \[ \displaylines{\beta^{k}=\frac{\nabla f\left(x^{k}\right)^{T}\left(\nabla f\left(x^{k}\right)-\nabla f\left(x^{k-1}\right)\right)}{\nabla f\left(x^{k-1}\right)^{T} \nabla f\left(x^{k-1}\right)} } \]

Stepsize

Minimization Rule: Choose \(\alpha^k \in [0,s]\) so that \(f\) is minimized \[ \displaylines{f(x^k + \alpha^k d^k) = \min_{\alpha \in [0, s]} f(x^k + \alpha d^k) } \] Successive Stepsize Reduction - Armijo Rule:

  1. Choose \(s>0, 0<\beta<1, 0<\sigma<1\)
  2. Increase: \(m = 0, 1, \dots\)
  3. Until: \(f(x^k) - f(x^k+\beta^m s d^k) \ge -\sigma \beta^m s \nabla f(x^k)^T d^k\)

where \(\beta\) is the rate of decrease and \(\sigma\) is the acceptance ratio
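The descent iteration with the Armijo rule can be sketched in a few lines of numpy; the quadratic test problem and the parameter values \(s, \beta, \sigma\) below are illustrative choices, not taken from the notes:

```python
import numpy as np

def armijo_step(f, grad_f, x, d, s=1.0, beta=0.5, sigma=1e-4):
    """Armijo rule: try stepsizes s, beta*s, beta^2*s, ... until the
    sufficient-decrease condition from the notes is satisfied."""
    m = 0
    while f(x) - f(x + beta**m * s * d) < -sigma * beta**m * s * (grad_f(x) @ d):
        m += 1
    return beta**m * s

def gradient_descent(f, grad_f, x0, iters=200):
    """Steepest descent (D^k = I) with Armijo stepsize."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = -grad_f(x)                 # descent direction: grad^T d < 0
        if np.linalg.norm(d) < 1e-10:
            break
        x = x + armijo_step(f, grad_f, x, d) * d
    return x

# Convex quadratic f(x) = 1/2 x^T A x with A > 0; the minimizer is the origin.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
x_star = gradient_descent(f, grad_f, [1.0, -1.0])
```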

Regularized Newton Method

Drawbacks of pure Newton's method:

  • The inverse Hessian \([\nabla^2 f(x)]^{-1}\) might not be computable
  • When \(\nabla^2 f(x) \nsucc 0\) the method can be attracted to local maxima or saddle points

Add a regularizing term to the Hessian and solve the system \[ \displaylines{(\nabla^2 f(x^k) + \Delta^k) d^k = -\nabla f(x^k) } \] where \(\Delta^k\) is chosen so that \(\nabla^2 f(x^k) + \Delta^k \succ 0\)

In trust-region methods one sets \[ \displaylines{\Delta^k = \delta^k I } \] and \((\nabla^2 f(x^k) + \delta^k I) d^k = -\nabla f(x^k)\) is equivalent to \[ \displaylines{\DeclareMathOperator*{\argmin}{\arg\min} \DeclareMathOperator*{\argmax}{\arg\max} d^k \in \argmin_{\|d\| \le \gamma^k} f^k(d) } \] where \(f^k(d) = f(x^k) + \nabla f(x^k)^T d + \frac{1}{2} d^T \nabla^2 f(x^k) d\) is the local quadratic model of \(f\) around \(x^k\)
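A minimal sketch of the regularization loop, assuming \(\Delta^k = \delta I\) with \(\delta\) doubled until a Cholesky factorization succeeds; the doubling schedule, starting value, and test matrices are illustrative:

```python
import numpy as np

def regularized_newton_direction(hess, grad, delta0=1e-3):
    """Solve (H + delta*I) d = -g, increasing delta until H + delta*I > 0
    (positive definiteness checked via a Cholesky attempt)."""
    n = hess.shape[0]
    delta = 0.0
    while True:
        try:
            L = np.linalg.cholesky(hess + delta * np.eye(n))
            break
        except np.linalg.LinAlgError:   # not positive definite yet
            delta = max(2 * delta, delta0)
    # Solve via the Cholesky factor: L L^T d = -g.
    y = np.linalg.solve(L, -grad)
    return np.linalg.solve(L.T, y)

# Indefinite Hessian: a pure Newton step could move toward a saddle point.
H = np.array([[2.0, 0.0], [0.0, -1.0]])
g = np.array([1.0, 1.0])
d = regularized_newton_direction(H, g)
```

The returned `d` is a descent direction (\(\nabla f^T d < 0\)) even though the Hessian is indefinite.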

Constrained Optimization

Equality Constraints

Minimize \(L(x): \mathbb{R}^n \rightarrow \mathbb{R}\), subject to \(k\) constraints \(f(x) = 0\).

Necessary Optimality Conditions

Let \(H(x, \lambda) = L(x) + \lambda^T f(x)\) be the Hamiltonian, where \(\lambda \in \mathbb{R}^k\) are scalars called the Lagrange multipliers \[ \displaylines{\nabla_x H(x^*, \lambda^*) = 0 \qquad \text{First-order Necessary Conditions} \\ dx^T [\nabla_{xx} H(x^*, \lambda^*)] dx \ge 0 \qquad \text{Second-order Necessary Conditions} } \] Note that \(dx\) is not arbitrary, i.e. we require \(\nabla f_i(x^*)^T dx = 0\)

Geometric meaning: any feasible \(dx\) must be orthogonal to the gradients \(\nabla f_i\). At an optimum \(x^*\) we must also have \(\nabla L (x^*)^T dx = 0\), so that \(\nabla L(x^*)\) must lie in the span of the gradients: \[ \displaylines{\nabla L(x^*) = \sum_{i=1}^k \lambda_i \nabla f_i(x^*) } \]

Sufficient Optimality Conditions

\[ \displaylines{\nabla_x H = 0, \quad dx^T [\nabla_{xx} H] dx > 0 } \]

Inequality Constraints

Minimize \(L(x)\) subject to \(f(x) \le 0\).

Now the Lagrangian multipliers have to satisfy \[ \displaylines{\lambda = \begin{cases} \ge 0, \quad f(x)=0 \\ =0, \quad f(x) < 0 \end{cases} } \]

Geometric meaning: the gradient of \(L\) w.r.t. \(x\) at a minimum must point in a way that a decrease of \(L\) can only come from violating the constraints: \[ \displaylines{-\nabla L = \sum_{i=1}^k \lambda_i \nabla f_i \qquad (\lambda_i \ge 0) } \]

Trajectory Optimization

Solving the optimal control problems \[ \displaylines{\begin{array}{ll} \text{minimize} &J(x(t), u(t), t_f) = \displaystyle\int_{t_0}^{t_f} L(x(t), u(t), t) dt \\ \text{subject to} &\dot{x}= f(x,u,t) \\ &\text{other constraints} \end{array} } \] The cost \(J\) is called a functional, i.e. a function of functions. \[ \displaylines{\text{differential of a function}\quad \nabla g = 0 \quad \Longleftrightarrow \quad \text{variation of a functional}\quad \delta J = 0 } \]

Euler-Lagrange Equation

Consider \[ \displaylines{J = \int_{t_0}^{t_f} g(x(t), \dot{x}(t)) dt } \] given \(x(t_0)\)

The variation is \[ \displaylines{\delta J=\int_{t_{0}}^{t_{f}}\left[g_{x}(x, \dot{x}) \delta x-\frac{d}{d t} g_{\dot{x}}(x, \dot{x}) \delta x\right] dt+g_{\dot{x}}(x(t_{f}), \dot{x}(t_{f})) \delta x(t_{f}) } \] Fixed boundary conditions: if \(x(t_f)\) is given, \(\delta J=0\) is equivalent to \[ \displaylines{g_x(x, \dot{x}) - \frac{d}{dt}g_{\dot{x}}(x, \dot{x}) = 0 } \] Free boundary conditions: if \(x(t_f)\) is not fixed, \(\delta J=0\) is equivalent to \[ \displaylines{g_x(x, \dot{x}) - \frac{d}{dt}g_{\dot{x}}(x, \dot{x}) = 0 \\ g_{\dot{x}}(x(t_f), \dot{x}(t_f)) = 0 } \]

Free Final-Time

When \(t_f\) is allowed to vary \[ \displaylines{\begin{aligned} \delta J=& \int_{t_{0}}^{t_{f}}\left[g_{x}(x^{*}, \dot{x}^{*}, t) \delta x-\frac{d}{d t} g_{\dot{x}}(x^{*}, \dot{x}^{*}, t) \delta x\right] d t \\ &+g_{\dot{x}}(x^{*}(t_{f}), \dot{x}^{*}(t_{f}), t) \delta x(t_{f})+g(x^{*}(t_{f}), \dot{x}^{*}(t_{f}), t_{f}) \delta t_{f} \\ =& \int_{t_{0}}^{t_{f}}\left[g_{x}(x^{*}, \dot{x}^{*}, t) \delta x-\frac{d}{d t} g_{\dot{x}}(x^{*}, \dot{x}^{*}, t) \delta x\right] d t \\ &+\left(g_{\dot{x}}(x^{*}, \dot{x}^{*}, t) \delta x_{f}+ [g(x^{*}, \dot{x}^{*}, t_{f}) - g_{\dot{x}}(x^{*}, \dot{x}^{*}, t_{f})\dot{x}] \delta t_{f}\right)_{t=t_f} \\ \end{aligned} } \]

where \(\delta x_f\) is the total space-time variation defined as \[ \displaylines{\delta x_f = \delta x(t_f) + \dot{x}(t_f)\delta t_f } \] Unrelated \(t_f\) and \(x(t_f)\): the additional necessary conditions are \[ \displaylines{g_{\dot{x}}(t_f) = 0 \\ g(t_f) = 0 } \] Function related: when \(x(t_f) = \Theta(t_f)\), then \(\delta x_f = \frac{d\Theta}{dt} \delta t_f\) and the additional necessary condition becomes \[ \displaylines{g_{\dot{x}}(t_f) \left[\frac{d\Theta}{dt} - \dot{x}^*\right]_{t=t_f} + g(t_f) = 0 } \]

Differential Constraints

\[ \displaylines{\begin{array}{cl} \text{minimize} & \displaystyle J = \int_{t_0}^{t_f} g(x, \dot{x}, t) dt \\ \text{subject to} & f(x(t), \dot{x}(t), t) = 0 \end{array} } \]

where \(t_f\) and \(x(t_f)\) are fixed. Define the Lagrangian multipliers \(\lambda : [t_0, t_f] \rightarrow \mathbb{R}^n\) and the augmented cost \[ \displaylines{J_a = \int_{t_0}^{t_f} g_a dt \\ g_a = g + \lambda^T f } \] The necessary optimality conditions are the Euler-Lagrange equations w.r.t. \(g_a\)

General Boundary Constraints

\[ \displaylines{\begin{array}{cl} \text{minimize} & \displaystyle J = \varphi(x(t_f), t_f) + \int_{t_0}^{t_f} g(x, \dot{x}, t) dt \\ \text{subject to} & \psi(x(t_f), t_f) = 0 \end{array} } \]

where \(t_f\) is free. Define the augmented cost \[ \displaylines{J_a = \omega(x(t_f), \nu, t_f) + \int_{t_0}^{t_f} g(x, \dot{x}, t) dt \\ \omega(x(t_f), \nu, t_f) = \varphi(x(t_f), t_f) + \nu^T \psi(x(t_f), t_f) } \] The necessary optimality conditions are \[ \displaylines{\begin{aligned} &\nabla_x \omega(x(t_f), \nu, t_f) + \nabla_{\dot{x}} g(x(t_f), \dot{x}(t_f), t_f) = 0 \qquad (\delta x_f) \\ &\frac{\partial}{\partial t_f} \omega(x(t_f), \nu, t_f) + g(x(t_f), \dot{x}(t_f), t_f) - \nabla_{\dot{x}} g(x(t_f), \dot{x}(t_f), t_f)^T \dot{x}(t_f) = 0 \qquad (\delta t_f) \\ &\psi(x(t_f), t_f) = 0 \qquad (\text{constraints}) \\ &\nabla_x g(x,\dot{x}, t) - \frac{d}{dt}\nabla_{\dot{x}}g(x,\dot{x},t) = 0, \quad t\in (t_0, t_f) \qquad (\delta x(t)) \end{aligned} } \]

Continuous Optimal Control

\[ \displaylines{\begin{array}{cl} \text{minimize} & \displaystyle J = \varphi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x, u, t) dt \\ \text{subject to} & \psi(x(t_f), t_f) = 0 \\ & \dot{x} = f(x, u, t) \end{array} } \]

Define the augmented cost \[ \displaylines{J_a = \varphi(t_f) + \nu^T \psi(t_f) + \int_{t_0}^{t_f} [H(x,u,\lambda,t) -\lambda^T \dot{x}] dt \\ H = L(x,u,t) + \lambda^T(t)f(x,u,t) } \] The necessary optimality conditions are \[ \displaylines{\text{Euler-Lagrange:} \qquad \begin{aligned} &\dot{x} = f(x,u,t) \qquad (\text{dynamics}) \\ &\dot{\lambda} = -\nabla_x H \qquad (\delta x(t)) \\ \end{aligned} } \]

\[ \displaylines{\text{Control Optimization:}\qquad \nabla_u H = 0 } \]

\[ \displaylines{\text{Transversality Conditions (TC):}\qquad \begin{aligned} &\psi(x(t_f), t_f) = 0 \qquad(\text{boundary constraint})\\ &\lambda(t_f) = \nabla_x \varphi(t_f) + \nabla_x \psi(t_f)\cdot \nu \qquad (\delta x_f) \\ &(\partial_t \varphi + \nu^T \partial_t \psi + L + \lambda^T f)_{t=t_f} = 0 \qquad (\delta t_f) \end{aligned} } \]

Hamiltonian Conservation

When the Hamiltonian does not depend on time (\(f\) and \(L\) do not depend on time): \[ \displaylines{\partial_t H(x,u,\lambda, t) = 0 } \] then \(H\) is a conserved quantity along optimal trajectories \[ \displaylines{\begin{aligned} \dot{H}(x,u,\lambda,t) &= \partial_x H \cdot \dot{x} + \partial_u H \cdot \dot{u} + \partial_\lambda H \cdot \dot{\lambda} + \partial_t H \\ &= -\dot{\lambda}^T f(x,u,t) + 0 + f(x,u,t)^T \dot{\lambda} + 0 \\ &= 0 \end{aligned} } \]

Linear-Quadratic Regulator

The dynamics are linear and the cost is quadratic \[ \displaylines{\dot{x} = Ax + Bu \\ J = \frac{1}{2}x^T(t_f) P_f x(t_f) + \int_{t_0}^{t_f} \frac{1}{2} [x(t)^T Q(t) x(t) + u(t)^T R(t) u (t)] dt } \] where \(P_f, Q \in \mathbb{S}_{+}\) and \(R\in\mathbb{S}_{++}\)

Using the optimality conditions we get \[ \displaylines{\begin{pmatrix} \dot{x} \\ \dot{\lambda} \end{pmatrix} = \begin{pmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} \\ u = -R^{-1}B^T\lambda \\ \lambda(t_f) = P_f x(t_f) \\ } \] Kalman showed that \(\lambda(t)\) is a linear function of the state, \(\lambda(t) = P(t) x(t)\); then we have the Riccati ODE \[ \displaylines{\dot{P} = -A^T P - PA + PBR^{-1}B^TP - Q, \qquad P(t_f) = P_f } \] and the control is \[ \displaylines{u(t) = -R^{-1}B^TP(t)x(t) = -K(t) x(t) } \]
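The backward Riccati integration can be sketched with explicit Euler steps; the double-integrator system and step size below are illustrative. For this system with \(Q = I\), \(R = 1\), and a long horizon, the gain approaches the known steady-state value \(K = [1, \sqrt{3}]\):

```python
import numpy as np

def lqr_riccati(A, B, Q, R, Pf, tf, dt=1e-3):
    """Integrate the Riccati ODE backward in time from P(tf) = Pf
    using explicit Euler steps; returns P(t0)."""
    P = Pf.copy()
    Rinv = np.linalg.inv(R)
    for _ in range(int(round(tf / dt))):
        Pdot = -A.T @ P - P @ A + P @ B @ Rinv @ B.T @ P - Q
        P = P - dt * Pdot              # stepping from t back to t - dt
    return P

# Double integrator: position/velocity states, acceleration control.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P0 = lqr_riccati(A, B, Q, R, Pf=np.zeros((2, 2)), tf=10.0)
K = np.linalg.inv(R) @ B.T @ P0        # u = -K x
```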

Optimal Cost

\[ \displaylines{\begin{aligned} J(t) &=\frac{1}{2} x^{T}\left(t_{f}\right) P_{f} x\left(t_{f}\right)+\int_{t}^{t_{f}} \frac{1}{2}\left[x^{T} Q x+u^{T} R u\right] \mathrm{d} t \\ &=\frac{1}{2} x^{T}\left(t_{f}\right) P_{f} x\left(t_{f}\right)+\int_{t}^{t_{f}} \frac{1}{2}\left[x(t)^{T}\left(Q+K^{T} R K\right) x\right] \mathrm{d} t \\ &=\frac{1}{2} x^{T}\left(t_{f}\right) P_{f} x\left(t_{f}\right)-\int_{t}^{t_{f}} \frac{d}{d t}\left(\frac{1}{2} x^{T} P x\right) \mathrm{d} t \\ &=\frac{1}{2} x^{T}(t) P(t) x(t) \end{aligned} } \]

Trajectory Tracking

Consider the problem of tracking a given reference trajectory \(x_d(t)\), i.e. \(x \rightarrow x_d\), rather than stabilizing to the origin, i.e. \(x \rightarrow 0\).

Define the error state and control error \[ \displaylines{e = x - x_d \\ v = u - u_d } \] then \[ \displaylines{\begin{aligned} \dot{e} &= \dot{x} - \dot{x}_d \\ &= f(x_d+e, u_d+v) - f(x_d, u_d) \\ &\approx \left.\frac{\partial f}{\partial x}\right|_{(x_d, u_d)} e + \left.\frac{\partial f}{\partial u}\right|_{(x_d, u_d)} v \\ &= Ae + Bv \end{aligned} } \] Use LQR to solve \(v = -K e\) and the control is \[ \displaylines{u = -K(x - x_d) + u_d } \]

Constrained Optimal Control

Consider optimal control problems subject to constraints \(|u(t)| \le 1\)

Pontryagin's Minimum Principle

The optimal control must minimize the Hamiltonian: \[ \displaylines{H(x^*(t), u^*(t) + \delta u(t), \lambda^*(t), t) \ge H(x^*(t), u^*(t), \lambda^*(t), t) } \]

Minimum-Time Problems

Consider a linear system \[ \displaylines{\dot{x} = Ax + Bu, \qquad x(0) = x_0 } \] with a single control \(u\) constrained by \(|u(t)| \le 1\) that is required to reach the origin \(x(t_f) = 0\) in minimum time. The cost function is then \[ \displaylines{J = \int_{t_0}^{t_f} (1) dt } \] So the Hamiltonian is \(H = 1+ \lambda^T(Ax + Bu)\). According to the minimum principle, we have \[ \displaylines{u^* = +1, \quad \text{if } \lambda^TB < 0 \\ u^* = -1, \quad \text{if } \lambda^TB > 0 } \] where \(\lambda^TB\) is called the switching function. This is an example of bang-bang control since the control is always at its max or min.
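For the double integrator \(\dot{x}_1 = x_2\), \(\dot{x}_2 = u\), the switching function is linear in \(t\), so the control switches at most once; the classical switch occurs on the curve \(x_1 = -\frac{1}{2} x_2 |x_2|\). A small simulation sketch (the Euler step size and horizon are illustrative):

```python
import numpy as np

def bang_bang_double_integrator(x1, x2):
    """Time-optimal control for x1' = x2, x2' = u, |u| <= 1, driving the
    state to the origin via the switching curve x1 = -x2|x2|/2."""
    s = x1 + 0.5 * x2 * abs(x2)     # which side of the switching curve
    if s > 0 or (s == 0 and x2 > 0):
        return -1.0
    if s < 0 or (s == 0 and x2 < 0):
        return 1.0
    return 0.0                      # already at the origin

# Simulate toward the origin from x = (1, 0) with simple Euler steps.
x = np.array([1.0, 0.0])
dt = 1e-3
for _ in range(10000):
    u = bang_bang_double_integrator(*x)
    x = x + dt * np.array([x[1], u])
```

The optimal time from \((1, 0)\) is \(t^* = 2\); after that the discrete simulation chatters in a small neighborhood of the origin, as expected for a bang-bang law under time discretization.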

In addition, at a transition (corner) time \(t\) the costate and the Hamiltonian must remain continuous \[ \displaylines{\lambda(t^-) = \lambda(t^+) \\ H(t^-) = H(t^+) } \] These conditions are called the Weierstrass-Erdmann corner conditions

Minimum Control Effort Problems

Consider a nonlinear system with affine controls defined by \[ \displaylines{\dot{x} = a(x(t),t) + B(x(t), t) u(t) } \] where \(B\) is a \(n\times m\) matrix and \(|u_i(t)| \le 1\) for \(i=1,\dots,m\). The cost function is \[ \displaylines{J = \int_{t_0}^{t_f} \left(\sum_{i=1}^m |u_i(t)|\right) dt } \] Express \(B\) as \([b_1 \,|\, b_2 \,|\, \cdots \,|\, b_m]\) and assume the components of \(u\) are independent of one another; from the minimum principle we have \[ \displaylines{u_i^* = \begin{cases} 1, & \text{for } \lambda^{*T}b_i < -1 \\ 0, & \text{for } -1 < \lambda^{*T}b_i < 1 \\ -1, & \text{for } 1 < \lambda^{*T}b_i \\ \ge 0, & \text{for } \lambda^{*T}b_i = -1 \\ \le 0, & \text{for } \lambda^{*T}b_i = 1 \end{cases} } \]

Singular Controls

Singular controls are controls that cannot be directly determined by either the optimality conditions or the minimum principle.

In the unconstrained case this generally occurs when \(\nabla_u H = 0\) cannot be solved for \(u\), which is caused by the condition \(\nabla_u^2 H = 0\), i.e. when the Hamiltonian is not strictly convex in \(u\).

Consider the LQR setting where \(R = 0\); then the first-order condition is \[ \displaylines{\nabla_u H = B^T \lambda = 0 } \] which does not provide information about \(u\). The solution is to consider higher-order time derivatives of \(\nabla_u H\) until \(u\) appears explicitly: \[ \displaylines{\frac{d}{dt} \nabla_u H = B^T \dot{\lambda} = -B^T(Qx + A^T \lambda) = 0 \\ \frac{d^2}{dt^2} \nabla_u H = B^T \ddot{\lambda} = -B^T(Q(Ax + Bu) - A^T (Qx + A^T\lambda)) = 0 } \] which now provides enough information to obtain the singular control \(u\)

General Constraints

Now consider general constraints \(c(x,u,t) \le 0\). The Hamiltonian is defined by \[ \displaylines{H = L + \lambda^T f + \mu^T c, \qquad \text{where } \begin{cases} \mu \ge 0, \quad \text{if } c = 0 \\ \mu = 0, \quad \text{if } c < 0 \end{cases} } \] The adjoint equations are \[ \displaylines{\dot{\lambda} = \begin{cases} -\nabla_x L - \nabla_x f^T \lambda, & c<0 \\ -\nabla_x L - \nabla_x f^T \lambda - \nabla_x c^T \mu, & c=0 \end{cases} } \] The control is found by setting \(\nabla_u H = 0\): \[ \displaylines{0 = \begin{cases} \nabla_u L + \nabla_u f^T \lambda, & c<0 \\ \nabla_u L + \nabla_u f^T \lambda + \nabla_u c^T \mu, & c=0 \end{cases} } \]

Inequality Constraints on the State Only

Constraints of the form \(c(x(t), t) \le 0\) are more difficult to handle and require differentiation until the control appears explicitly.

In general, if the constraint is differentiated \(q\) times until \(u\) shows up, then the Hamiltonian is defined as \[ \displaylines{H = L + \lambda^T f + \mu^T c^{(q)} } \] In addition, the tangency constraints are enforced at the time \(t\) when \(c(x,t)\) becomes active: \[ \displaylines{\begin{bmatrix} c(x,t) \\ \dot{c}(x,t) \\ \vdots \\ c^{(q-1)}(x,t) \end{bmatrix} = 0 } \]

Dynamic Programming

Discrete-Time DP

Consider a discrete optimization problem \[ \displaylines{\begin{array}{ll} \text{minimize} &J = \phi(x_N, t_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i) \\ \text{subject to} & x_{i+1} = f_i(x_i, u_i) \end{array} } \] with given \(x(t_0)\), \(t_0\), \(t_N\).

Define the cost from discrete stage \(i\) to \(N\) by \[ \displaylines{J_i \triangleq \phi(x_N, t_N) + \sum_{k=i}^{N-1} L_k(x_k, u_k) } \] and the optimal cost-to-go function (or optimal value function) at stage \(i\) as \[ \displaylines{V_i(x) \triangleq \min_{u_{i:N-1}} J_i } \] Bellman equation: \[ \displaylines{V_i(x) = \min_u [L_i(x, u) + V_{i+1}(f_i(x, u))] } \] with \(V_N(x) = \phi(x, t_N)\)
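The backward recursion can be illustrated on a toy grid problem; the integrator dynamics \(x_{i+1} = x_i + u_i\), the quadratic costs, and the grid bounds are hypothetical choices for illustration:

```python
import numpy as np

# Toy backward DP: scalar dynamics x_{i+1} = x_i + u_i on an integer grid,
# stage cost L(x, u) = x^2 + u^2 and terminal cost phi(x) = x^2.
xs = np.arange(-3, 4)                  # state grid
us = np.arange(-2, 3)                  # admissible controls
N = 5
V = (xs ** 2).astype(float)            # V_N = terminal cost
policy = []
for _ in range(N):                     # stages N-1, ..., 0
    Vn = np.empty_like(V)
    pol = np.empty(len(xs), dtype=int)
    for j, x in enumerate(xs):
        best = np.inf
        for u in us:
            xn = np.clip(x + u, xs[0], xs[-1])          # stay on the grid
            c = x ** 2 + u ** 2 + V[np.searchsorted(xs, xn)]
            if c < best:
                best, pol[j] = c, u
        Vn[j] = best                   # Bellman minimization at this state
    V = Vn
    policy.append(pol)
policy.reverse()                       # policy[i] is the stage-i feedback law
```

By symmetry of the costs and grid, the resulting value function is symmetric with its minimum (zero) at the origin.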

Discrete-Time LQR

\[ \displaylines{x_{i+1} = A_i x_i + B_i u_i \\ \phi(x) = \frac{1}{2} x^T P_f x, \qquad L_i(x,u) = \frac{1}{2}\left(x^TQ_ix + u^TR_iu\right) } \]

Assume the value function is of the form \[ \displaylines{V_i(x) = \frac{1}{2}x^TP_ix } \] for \(P_i \in \mathbb{S}_{++}\) with boundary condition \(P_N = P_f\). Using Bellman's principle we get \[ \displaylines{u^{*}=-\left(R_{i}+B_{i}^{T} P_{i+1} B_{i}\right)^{-1} B_{i}^{T} P_{i+1} A_{i} x \equiv -K_{i} x } \] Substituting \(u^*\) back into the Bellman equation we obtain \[ \displaylines{x^{T} P_{i} x=x^{T}\left[K_{i}^{T} R_{i} K_{i}+Q_{i}+\left(A_{i}-B_{i} K_{i}\right)^{T} P_{i+1}\left(A_{i}-B_{i} K_{i}\right)\right] x } \] This relationship can be iterated backward from \(P_N\) to \(P_0\).

It can also be expressed without gains \(K_i\) according to \[ \displaylines{P_{i}=Q_{i}+A_{i}^{T}\left[P_{i+1}-P_{i+1} B_{i}\left(R_{i}+B_{i}^{T} P_{i+1} B_{i}\right)^{-1} B_{i}^{T} P_{i+1}\right] A_{i} } \]
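The backward cycle is a short recursion in code; the scalar test system below is a hypothetical example whose steady-state \(P\) is the golden ratio \((1+\sqrt{5})/2\), which makes a convenient check:

```python
import numpy as np

def discrete_lqr(A, B, Q, R, Pf, N):
    """Backward Riccati recursion from P_N = Pf; returns the gain
    sequence K_0, ..., K_{N-1} and P_0."""
    P = Pf.copy()
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # K_i
        # Algebraically equivalent compact form of the P update above.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()
    return gains, P

# Scalar example: x_{k+1} = x_k + u_k with unit weights.
A = np.array([[1.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
gains, P0 = discrete_lqr(A, B, Q, R, Pf=np.zeros((1, 1)), N=50)
```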

Continuous DP

Define the continuous value function \(V(x,t)\) as \[ \displaylines{V(x(t), t)=\min _{u(t), t \in\left[t, t_{f}\right]}\left[\phi\left(x\left(t_{f}\right), t_{f}\right)+\int_{t}^{t_{f}} L(x(\tau), u(\tau), \tau) d \tau\right] } \] Bellman equation can be expressed as \[ \displaylines{V(x(t), t)=\min_{u(t), t \in[t, t+\Delta]}\left[\int_{t}^{t+\Delta} L(x(\tau), u(\tau), \tau) \mathrm{d} \tau+V(x(t+\Delta), t+\Delta)\right] } \] Expand \(V(x(t+\Delta), t+\Delta)\) according to \[ \displaylines{V(x(t+\Delta), t+\Delta) = V(x(t), t) + [\partial_t V(x(t), t) + \nabla_x V(x(t), t)^T \dot{x}] \Delta + o(\Delta) } \] Substituting back into Bellman equation we can get the Hamilton-Jacobi-Bellman equation (HJB): \[ \displaylines{-\partial_{t} V(x, t)=\min_{u(t)}\left[L(x, u, t)+\nabla_{x} V(x, t)^{T} f(x, u, t)\right] } \]

Numerical Methods for Optimal Control

General optimal control problem: \[ \displaylines{\begin{aligned} \text{minimize}\quad &J=\phi\left(x(t_{f}), t_{f}\right)+\int_{t_{0}}^{t_{f}} L(x(t), u(t), t) d t \\ \text{subject to}\quad &x\left(t_{0}\right)=x_{0}, \quad x\left(t_{f}\right) \text{ and } t_{f} \text{ free} \\ &\dot{x}(t)=f(x(t), u(t), t) \\ &c(x(t), u(t), t) \leq 0, \text { for all } t \in\left[t_{0}, t_{f}\right] \\ &\psi\left(x\left(t_{f}\right), t_{f}\right) \leq 0 \end{aligned} } \]

Consider three families of numerical methods for solving this:

  • dynamic programming: i.e. solution to the HJB equations
  • indirect methods: based on calculus of variations and Pontryagin's principle
  • direct methods: based on a finite-dimensional representation

Indirect Methods

Start with the necessary conditions (ignoring the path constraints): \[ \displaylines{\begin{aligned} &\dot{x}=f(x, u, t) \\ &\dot{\lambda}=-\nabla_{x} H(x, u, \lambda, t) \\ &u^{*}=\arg\min_{u} H(x, u, \lambda, t) \\ &\lambda\left(t_{f}\right)=\nabla_{x} \phi\left(x\left(t_{f}\right), t_{f}\right)+\nabla_{x} \psi\left(x\left(t_{f}\right), t_{f}\right)^{T} \nu \\ &\left(\partial_{t} \phi+\nu^{T} \partial_{t} \psi+H\right)_{t=t_{f}}=0 \end{aligned} } \]

Indirect Shooting

Shooting methods are based on integrating the EL equations forward from time \(t_0\) to time \(t_f\) using a starting guess and then satisfying the boundary and transversality conditions at time \(t_f\) by formulating a root-finding problem solved by e.g. Newton-type method.

The optimization variables are \[ \displaylines{\lambda(t_0), \nu, t_f } \] and the equations to be solved are \[ \displaylines{\left[\begin{array}{c} \psi\left(x\left(t_{f}\right), t_{f}\right) \\ \lambda\left(t_{f}\right)-\left[\nabla_{x} \phi\left(x\left(t_{f}\right), t_{f}\right)+\nabla_{x} \psi\left(x\left(t_{f}\right), t_{f}\right)^{T} \nu\right] \\ \left(\partial_{t} \phi+\nu^{T} \partial_{t} \psi+H\right)_{t=t_{f}} \end{array}\right]=0 } \]
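A scalar sketch of the shooting idea, for the illustrative problem \(\min \int_0^1 \frac{1}{2}(x^2 + u^2)\,dt\) with \(\dot{x} = u\), \(x(0) = 1\), and target \(x(1) = 0\): the EL equations give \(u^* = -\lambda\), \(\dot{x} = -\lambda\), \(\dot{\lambda} = -x\), and we root-find on \(\lambda(t_0)\) (here by bisection rather than a Newton method, since \(x(1)\) is monotone in \(\lambda(0)\)). The analytic answer is \(\lambda(0) = \coth(1)\):

```python
import math

def shoot(lam0, x0=1.0, T=1.0, dt=1e-3):
    """Forward Euler integration of the EL system xdot = -lam,
    lamdot = -x from a guessed initial costate; returns x(T)."""
    x, lam = x0, lam0
    for _ in range(int(round(T / dt))):
        x, lam = x + dt * (-lam), lam + dt * (-x)
    return x

# Bisection on the terminal condition x(T) = 0.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if shoot(lo) * shoot(mid) <= 0:
        hi = mid
    else:
        lo = mid
lam0_star = 0.5 * (lo + hi)
```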

Indirect Multiple-Shooting

Split the time-interval into segments and use indirect shooting.

First choose discrete times \([t_0, t_1, \dots, t_N]\) and let \(v_i=(x(t_i), \lambda(t_i))\). Let \(\bar{v}_i\) be the result of integrating the EL equations on \([t_i, t_{i+1}]\). The optimization variables are \[ \displaylines{\lambda(t_0), v_1, v_2, \dots, v_N, \nu, t_f } \] and the equations are \[ \displaylines{\left[\begin{array}{c} \bar{v}_{0}-v_{1} \\ \vdots \\ \bar{v}_{N-1}-v_{N} \\ \psi\left(x\left(t_{f}\right), t_{f}\right) \\ \lambda\left(t_{f}\right)-\left[\nabla_{x} \phi\left(x\left(t_{f}\right), t_{f}\right)+\nabla_{x} \psi\left(x\left(t_{f}\right), t_{f}\right)^{T} \nu\right] \\ \left(\partial_{t} \phi+\nu^{T} \partial_{t} \psi+H\right)_{t=t_{f}} \end{array}\right]=0 } \]

Direct Methods

Start by either discretizing time and solving a discrete-time optimal control problem, or by parametrizing the controls using a finite set of parameters. This turns the problem into a nonlinear programming (NLP) problem.

Direct Shooting

Based on parametrizing the control \(u(t)\) by a finite number of parameters \(p_k\): \[ \displaylines{u(t) = \sum_{k=1}^M p_k B_k(t) } \] where \(B_k(t)\) are a set of basis functions.

The NLP variables are \([p, t_f]\) and the state can be obtained by forward integration of the dynamics.

Direct Multiple-Shooting

Choose discrete times \([t_0, t_1, \dots, t_N]\), a discrete trajectory \([x_0, x_1, \dots, x_N]\) and a discrete set of parametrized controls \([u(p_0), u(p_1), \dots, u(p_N)]\). Then perform a direct shooting step on each interval \([t_i, t_{i+1}]\) by integrating the dynamics from \(x_i\) to obtain \(\bar{x}_i\)

The NLP variables are \(p_0, \dots, p_{N-1}, x_1,\dots,x_N, t_f\) and the equations to be solved are \[ \displaylines{\left[\begin{array}{c} \bar{x}_{0}-x_{1} \\ \vdots \\ \bar{x}_{N-1}-x_{N} \\ \psi\left(x_{N}, t_{N}\right) \end{array}\right]=0 } \]

Optimal State Estimation

Linear model, Gaussian noise:

  • Kalman filter (KF): linear discrete-time dynamics, linear measurement model, Gaussian noise
  • Kalman-Bucy filter: same as KF but continuous-time dynamics

Nonlinear model, Gaussian noise:

  • Extended Kalman filter (EKF): linearize dynamics and sensor model and apply KF
  • Unscented Kalman filter (UKF): linearization is avoided during uncertainty propagation by propagating the principal axes of the uncertainty ellipsoid through the nonlinear dynamics and then reconstructing the updated ellipsoid; measurements are still processed through linearization

Nonlinear model, arbitrary noise:

  • Particle filter (PF): states are approximated using weighted samples whose weights are updated using the obtained measurements

Least-Squares for Static Estimation

Consider the sensor model with \(x\in\mathbb{R}^n\) and \(z\in\mathbb{R}^k\) \[ \displaylines{z = Hx + v } \] where \(v\) is a random variable denoting the measurement error. The goal is to find the optimal estimate \(\hat{x}\), which can be accomplished by first defining the measurement estimate error \(e_z = z - H \hat{x}\) and minimizing \[ \displaylines{J = \frac{1}{2}e_z^T e_z } \]

  • Necessary condition: \(\hat{x} = (H^TH)^{-1}H^Tz\)
  • The sufficient condition: \(\nabla^2J = H^TH \succ 0\). This is satisfied when \(\operatorname{rank}(H) = n\)

Now assume there is some prior information about \(x\) and \(v\) \[ \displaylines{\mathbb{E}[x] = \hat{x}_0, \qquad \mathbb{E}[(x - \hat{x}_0)(x - \hat{x}_0)^T] = P_0 \\ \mathbb{E}[v] = 0, \qquad \mathbb{E}[vv^T] = R } \] where \(R\) is a diagonal matrix. Then a meaningful cost function can be defined as \[ \displaylines{J(\hat{x})=\frac{1}{2}\left(\hat{x}-\hat{x}_{0}\right)^{T} P_{0}^{-1}\left(\hat{x}-\hat{x}_{0}\right)+\frac{1}{2}(z-H \hat{x})^{T} R^{-1}(z-H \hat{x}) } \]

  • Necessary condition: \(\hat{x}=\left(H^{T} R^{-1} H+P_{0}^{-1}\right)^{-1}\left(H^{T} R^{-1} z+P_{0}^{-1} \hat{x}_{0}\right)\)
  • The sufficient condition: \(\nabla^2J = P_0^{-1} + H^TR^{-1}H \succ 0\)

The necessary condition can be expressed as \[ \displaylines{\begin{aligned} \hat{x} &= \hat{x}_0 + (P_0^{-1} + H^TR^{-1}H)^{-1} H^T R^{-1} (z - H \hat{x}_0) \\ &\triangleq \hat{x}_0 + PH^TR^{-1} (z - H \hat{x}_0) \\ &\triangleq \hat{x}_0 + K (z - H \hat{x}_0) \\ \end{aligned} } \] The matrix \(P\) is actually the covariance matrix of the error in the estimate \(\hat{x}\), i.e. \(P=\mathbb{E}[(\hat{x} - x)(\hat{x}- x)^T]\).

The recursive least-squares algorithm:

  1. Given prior: mean \(\hat{x}_0\) and covariance \(P_0\)
  2. For each new measurement \(z_i = H_ix+v_i\), where \(v_i \sim \mathcal{N}(0, R_i)\)
    1. \(\hat{x}_{i} =\hat{x}_{i-1}+K_{i}\left(z_{i}-H_{i} \hat{x}_{i-1}\right)\)
    2. \(P_{i} =P_{i-1}-P_{i-1} H_{i}^{T}\left(H_{i} P_{i-1} H_{i}^{T}+R_{i}\right)^{-1} H_{i} P_{i-1}\)
    3. \(K_{i} =P_{i} H_{i}^{T} R_{i}^{-1}\)

Here \(P_i\) is updated using the matrix inversion lemma: \[ \displaylines{\begin{aligned} P_i &= \left(P_{i-1}^{-1}+H_{i}^{T} R_{i}^{-1} H_{i}\right)^{-1} \\ &= P_{i-1}-P_{i-1} H_{i}^{T}\left(H_{i} P_{i-1} H_{i}^{T}+R_{i}\right)^{-1} H_{i} P_{i-1} \end{aligned} } \]
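One recursive step, in the order given above (covariance first, then gain, then mean); the scalar example numbers are illustrative:

```python
import numpy as np

def rls_update(x_hat, P, z, H, R):
    """Fold a new measurement z = Hx + v, v ~ N(0, R), into the
    current estimate (x_hat, P) using the recursive LS equations."""
    S = H @ P @ H.T + R
    P_new = P - P @ H.T @ np.linalg.solve(S, H @ P)   # matrix inversion lemma form
    K = P_new @ H.T @ np.linalg.inv(R)                # K_i = P_i H_i^T R_i^{-1}
    x_new = x_hat + K @ (z - H @ x_hat)
    return x_new, P_new

# Scalar example: prior x_hat = 0, P = 1; one measurement z = 1 with R = 1.
x_hat, P = rls_update(np.zeros(1), np.eye(1), np.array([1.0]), np.eye(1), np.eye(1))
```

With an uninformative measurement as precise as the prior, the estimate moves halfway toward the measurement and the variance halves.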

Propagation of Uncertainty

Consider the restricted class of systems \[ \displaylines{\dot{x}(t) = f(x(t), u(t)) + L(t)w(t) } \] where \(L(t)\) is a given matrix and where the noise \(w(t)\) evolves in continuous time. Assume \(w(t)\) is uncorrelated in time, i.e. \[ \displaylines{\mathbb{E}[w(t)w(\tau)^T] = Q_c'(t) \delta(t - \tau) } \]

Linear System

Consider linear systems with Gaussian noise \[ \displaylines{x_k = \Phi_{k-1}x_{k-1} + \Gamma_{k-1}u_{k-1} + \Lambda_{k-1}w_{k-1} } \] where \(\mathbb{E}[w_k]=0\), \(\mathbb{E}[w_k w_l^T] = Q_k'\delta_{kl}\) and \(x_0 \sim \mathcal{N}(\hat{x}_0, P_0)\)

The goal is to propagate the mean and covariance: \[ \displaylines{\hat{x}_k = \mathbb{E}[x] = \Phi_{k-1}\hat{x}_{k-1} + \Gamma_{k-1}u_{k-1} } \]

\[ \displaylines{\begin{aligned} P_k &= \mathbb{E}[(x_k - \hat{x}_k)(x_k-\hat{x}_k)^T] \\ &= \Phi_{k-1}P_{k-1}\Phi_{k-1}^T + \Lambda_{k-1}Q_{k-1}'\Lambda_{k-1}^T \end{aligned} } \]

The resulting update \(\left(\hat{x}_{k-1}, P_{k-1}\right) \rightarrow\left(\hat{x}_{k}, P_{k}\right)\) defines a Gauss-Markov sequence, given by \[ \displaylines{\begin{aligned} \hat{x}_{k} &=\Phi_{k-1} \hat{x}_{k-1}+\Gamma_{k-1} u_{k-1} \\ P_{k} &=\Phi_{k-1} P_{k-1} \Phi_{k-1}^{T}+Q_{k-1} \\ Q_{k} &= \Lambda_k Q_k' \Lambda_k^T \end{aligned} } \]

Continuous to Discrete

Consider \[ \displaylines{\dot{x}(t)=F(t) x(t)+G(t) u(t)+L(t) w(t) } \] We have \[ \displaylines{x(t_k) =\Phi(t_{k}, t_{k-1}) x(t_{k-1})+\int_{t_{k-1}}^{t_{k}} \Phi(t_{k}, \tau) [G(\tau) u(\tau)+L(\tau) w(\tau)] d\tau } \] where \(\Phi(t_k, t_{k-1})\) is the state transition matrix such that \[ \displaylines{\Phi(t_{k}, t_{k-1}) x(t_{k-1})=x(t_{k-1})+\int_{t_{k-1}}^{t_{k}}[F(\tau) x(\tau)] d \tau } \] The mean evolves according to (assume the control is constant during the sampling interval) \[ \displaylines{\begin{aligned} \hat{x}_{k} &= \Phi_{k-1} \hat{x}_{k-1}+\int_{t_{k-1}}^{t_{k}} \Phi(t_{k}, \tau) G(\tau) u(\tau) d \tau \\ &= \Phi_{k-1} \hat{x}_{k-1}+\left[\int_{t_{k-1}}^{t_{k}} \Phi(t_{k}, \tau) G(\tau) d \tau\right] u_{k-1} \\ &= \Phi_{k-1} \hat{x}_{k-1} + \Gamma_{k-1}u_{k-1} \end{aligned} } \] The covariance evolves according to \[ \displaylines{\begin{aligned} P_{k} &=\Phi_{k-1} P_{k-1} \Phi_{k-1}^{T}+\mathbb{E}\left\{\left[\int_{t_{k-1}}^{t_{k}} \Phi(t_{k}, \tau) L(\tau) w(\tau) d \tau\right]\left[\int_{t_{k-1}}^{t_{k}} \Phi(t_{k}, \alpha) L(\alpha) w(\alpha) d \alpha\right]^{T}\right\} \\ &=\Phi_{k-1} P_{k-1} \Phi_{k-1}^{T}+\int_{t_{k-1}}^{t_{k}} \int_{t_{k-1}}^{t_{k}} \Phi(t_{k}, \tau) L(\tau) \underbrace{\mathbb{E}\left[w(\tau) w(\alpha)^{T}\right]}_{Q_{c}^{\prime} \delta(\tau-\alpha)} L(\alpha)^{T} \Phi(t_{k}, \alpha)^T d \tau d \alpha \\ &=\Phi_{k-1} P_{k-1} \Phi_{k-1}^{T}+\int_{t_{k-1}}^{t_{k}} \Phi(t_{k}, \tau) L(\tau) Q_{c}^{\prime} L(\tau)^{T} \Phi(t_{k}, \tau)^T d \tau \\ & \triangleq \Phi_{k-1} P_{k-1} \Phi_{k-1}^{T}+Q_{k-1} \end{aligned} } \] where \(Q_{k-1}\) can be approximated by \[ \displaylines{Q_{k-1} \approx L(t_{k-1}) Q_{c}^{\prime}(t_{k-1}) L(t_{k-1})^T \Delta t } \]

Linear-Optimal Estimation

Combine measurement updates and uncertainty propagation to optimally estimate the state. Consider a discrete LTV model \[ \displaylines{\begin{array}{l} x_{k}=\Phi_{k-1} x_{k-1}+\Gamma_{k-1} u_{k-1}+\Lambda_{k-1} w_{k-1} \\ z_{k}=H_{k} x_{k}+v_{k} \end{array} } \] Estimating the state \(x_k\) is performed by iterating between uncertainty propagation and measurement updates. The mean and covariance after an uncertainty propagation from step \(k-1\) to step \(k\) are denoted by \((\hat{x}_{k|k-1}, P_{k|k-1})\); the mean and covariance after a measurement update at step \(k\) are denoted by \((\hat{x}_{k|k}, P_{k|k})\). Kalman filter:

Prediction: \[ \displaylines{\begin{array}{l} \hat{x}_{k | k-1}=\Phi_{k-1} \hat{x}_{k-1 | k-1}+\Gamma_{k-1} u_{k-1} \\ P_{k | k-1}=\Phi_{k-1} P_{k-1 | k-1} \Phi_{k-1}^{T}+Q_{k-1} \end{array} } \] Correction: \[ \displaylines{\begin{array}{l} \hat{x}_{k | k}=\hat{x}_{k | k-1}+K_{k}\left(z_{k}-H_{k} \hat{x}_{k | k-1}\right) \\ P_{k | k}=P_{k | k-1}-P_{k | k-1} H_{k}^{T}\left(H_{k} P_{k | k-1} H_{k}^{T}+R_{k}\right)^{-1} H_{k} P_{k | k-1} \\ K_{k}=P_{k | k} H_{k}^{T} R_{k}^{-1} \end{array} } \]
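A minimal predict/correct sketch. The gain is written in the innovation form \(K_k = P_{k|k-1}H_k^T(H_kP_{k|k-1}H_k^T + R_k)^{-1}\), which is algebraically identical to the \(P_{k|k}H_k^TR_k^{-1}\) form above; the one-dimensional numbers are illustrative:

```python
import numpy as np

def kf_predict(x, P, Phi, Gamma, u, Q):
    """Uncertainty propagation through the linear dynamics."""
    return Phi @ x + Gamma @ u, Phi @ P @ Phi.T + Q

def kf_correct(x, P, z, H, R):
    """Measurement update driven by the innovation z - H x."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), P - K @ H @ P

# Scalar random walk: x_k = x_{k-1} + w, measured directly.
Phi = np.eye(1); Gamma = np.zeros((1, 1)); Qn = 0.1 * np.eye(1)
H = np.eye(1); R = np.eye(1)
x, P = np.zeros(1), np.eye(1)
x, P = kf_predict(x, P, Phi, Gamma, np.zeros(1), Qn)   # (x_{k|k-1}, P_{k|k-1})
P_pred = P.copy()
x, P = kf_correct(x, P, np.array([2.0]), H, R)         # (x_{k|k}, P_{k|k})
```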

Stochastic Control

\[ \displaylines{\begin{array}{ll} \text{minimize} &J(u(\cdot))=\mathbb{E}\left[\phi\left(x(t_{f}), t_{f}\right)+\int_{t_{0}}^{t_{f}} \mathcal{L}(x(\tau), u(\tau), \tau) d \tau\right] \\ \text{subject to} & \dot{x}(t) = f(x(t), u(t), w(t), t) \end{array} } \]

where \(\mathbb{E}[w(t)]=0\) and \(\mathbb{E}[w(t)w(\tau)^T] = W(t) \delta(t-\tau)\) for \(W(t) \in \mathbb{S}_+\)

Perfect Measurements

Consider \[ \displaylines{\dot{x}(t)=f(x(t), u(t))+L(t) w(t) } \] The value function is defined as \[ \displaylines{\begin{aligned} V(x(t), t)&=\min _{u(t), t \in[t, t_{f}]} \mathbb{E}\left[\phi\left(x(t_{f}), t_{f}\right)+\int_{t}^{t_{f}} \mathcal{L}(x(\tau), u(\tau), \tau) d \tau\right] \\ &= \min_{u(t), t \in[t, t+\Delta t]} \mathbb{E}\left[\int_{t}^{t+\Delta t} \mathcal{L}(x(\tau), u(\tau), \tau) d \tau + V(x(t+\Delta t), t+\Delta t)\right] \end{aligned} } \] Expand \(V(x,t)\) to first order in \(\Delta t\) according to \[ \displaylines{\begin{aligned} V(x+\Delta x, t+\Delta t) &=V(x, t)+\partial_{t} V(x, t) \Delta t+\nabla_{x} V(x, t)^{T} \Delta x+\frac{1}{2} \Delta x^{T} \nabla_{x}^{2} V(x, t) \Delta x+o(\Delta t) \\ &=V+\partial_{t} V \Delta t+\nabla_{x} V^{T}(f+L w) \Delta t+\frac{1}{2}(f+L w)^{T} \nabla_{x}^{2} V(f+L w) \Delta t^{2}+o(\Delta t) \end{aligned} } \] Note that in the above it is necessary to expand \(V\) to second order in \(\Delta x\) since \(w\) is of order \(1/\sqrt{\Delta t}\)

Substituting back to get the stochastic HJB equation \[ \displaylines{-\partial_t V(x, t) =\min_{u(t)}\left\{\mathcal{L}(x, u, t)+\nabla_{x} V(x, t)^{T} f(x, u)+\frac{1}{2} \operatorname{tr}\left[\nabla_{x}^{2} V(x, t) L(t) W(t) L(t)^{T}\right]\right\} } \]

Continuous Linear-Quadratic Systems

\[ \displaylines{\dot{x} = Fx + Gu + Lw \\ J = \frac{1}{2} \mathbb{E}\left\{x^T(t_f) S_f x(t_f) + \int_{t_0}^{t_f} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix}^T \begin{bmatrix} Q(t) & M(t) \\ M(t)^T & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\} } \]

with \(x(0)=x_0\) and given \(t_0\) and \(t_f\).

To solve the stochastic HJB, consider the value function of the form \[ \displaylines{V(t)=\frac{1}{2} x^{T}(t) S(t) x(t)+v(t) } \] where \(v(t)\) is the stochastic value function increment defined by \[ \displaylines{v(t)=\frac{1}{2} \int_{t}^{t_{f}} \operatorname{tr}\left[S(\tau) L W L^{T}\right] d \tau } \] Plug into the stochastic HJB equation and solve for \(u\), and substitute back to get the following equations: \[ \displaylines{-\dot{S}=\left(F-G R^{-1} M^{T}\right)^{T} S+S\left(F-G R^{-1} M^{T}\right)+Q-S G R^{-1} G^{T} S-M R^{-1} M^{T} \\ -\dot{v} = \frac{1}{2}\operatorname{tr}(SLWL^T) } \]

Discrete Linear-Quadratic Systems

\[ \displaylines{x_{k+1} = F_k x_k + G_k u_k + L_k w_k \\ J = \frac{1}{2} \mathbb{E}\left\{x_N^T S_N x_N + \sum_{k=0}^{N-1} \begin{bmatrix} x_k \\ u_k \end{bmatrix}^T \begin{bmatrix} Q & M \\ M^T & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} \right\} } \]

with \(x_0\) and the horizon \(N\) given.

Similarly use a value function of the form \[ \displaylines{V_k = \frac{1}{2}x_k^T S_k x_k + v_k \\ v_k = \frac{1}{2} \operatorname{tr}[S_{k+1} L W_k L^T] + v_{k+1} } \] Plug into the Bellman equation to get \[ \displaylines{S_{k}=\left(F^{T} S_{k+1} F+Q\right)-\left(M^{T}+G^{T} S_{k+1} F\right)^{T}\left(R+G^{T} S_{k+1} G\right)^{-1}\left(M^{T}+G^{T} S_{k+1} F\right) } \]