Contents
 Notation
 Discretetime TimeVarying FiniteHorizon Linear System
 LQR
 Deriving the Optimal LQR Controller
 Summary
 Discussion
 References
Notation
Let \(\Sym^n\) denote the set of symmetric matrices in \(\R^{n \times n}\). Also, superscripts are interpreted after subscripts: \(M_x^\top\) denotes \((M_x)^\top\), and \(M_x^{1}\) denotes \((M_x)^{1}\).
Discretetime TimeVarying FiniteHorizon Linear System
Consider a discretetime timevarying finitehorizon linear system:
 finitehorizon: \(t = 1, \dotsc, T\)
 states: \(s_t \in \R^n\)
 actions: \(a_t \in \R^m\)
 random noise: \(w_t \in \R^n\), \(\E[w_t] = 0\), each \(w_t\) is independent
 The only assumption on the random vector \(w_t\) is that it has zero mean. No other assumptions are made about its distribution.
 \(w_t\) and \(w_k\) need not be drawn from the same distribution, for \(t \neq k\).
 dynamics: \(s_{t+1} = A_t s_t + B_t a_t + w_t\) for some given matrices \(A_t \in \R^{n \times n}\) and \(B_t \in \R^{n \times m}\)
 An equivalent formulation is \(s_{t+1} = F_t x_t + w_t\) where \(F_t = \begin{bmatrix} A_t & B_t \end{bmatrix}\) and \(x_t = \sa[t]\).
 instantaneous cost function \(c_t(s,a): \R^n \times \R^m \to \R\)
 The cost functions \(c_t\) for \(t = 1, \dotsc, T1\) are known as the stage cost.
 The cost \(c_T\) at the final time step is called the terminal cost.
The overall goal, given a starting state \(s_0\), is to find a sequence of actions \(a_t\) to minimize the total expected cost
\[J = \sum_{t=0}^T \E_{w_t} \left[ c_t(s_t, a_t) \right].\]For each time step \(t\), the \(q\)function and value function are defined as
\[\begin{aligned} q_t(s,a) &= \begin{cases} c_T(s,a) & t = T \\ \begin{aligned} & c_t(s,a) + \E_{w_t} \left[ \min_{a_{t+1}} q_{t+1}(s_{t+1}, a_{t+1}) \right] \\ & = c_t(s,a) + \E_{w_t} \left[ v_{t+1}(s_{t+1}) \right] \end{aligned} & t < T \\ \end{cases} \\ v_t(s) &= \min_a q_t(s,a) \end{aligned}\]The value function is also known as the optimal costtogo, sometimes denoted as \(J_t^*(s)\).
Thus, the minimum expected cost is
\[\begin{aligned} J^* &:= \min_{a_t:\, t=0,\dotsc,T} \sum_{t=0}^T \E_{w_t} \left[ c_t(s_t, a_t) \right] \\ &= \min_a q_0(s_0, a) \\ &= v_0^*(s_0). \end{aligned}\]LQR
LQR, short for “linear quadratic regulator,” refers to the optimal controller for a linear system with quadratic costs. This post analyzes the discretetime finitehorizon case, although similar results hold for continuoustime systems and infinite time horizons as well.
The standard (canonical) LQR cost function is \(c_t(s,a) = s^\top Q_t s + a^\top R_t a\) for some given matrices \(Q_t \in \Sym^n\) and \(R_t \in \Sym^m\) with \(Q_t \succeq 0\) and \(R_t \succ 0\).
In standard LQR, the minimum instantaneous cost is achieved by \(s, a = 0\). Furthermore, this is a fixed point in the linear system dynamics. Therefore, the optimization objective is essentially to find a series of actions that moves the state closer and closer to the origin.
A more general case is
\[c_t(s,a) = x^\top D_t x + d_t^\top x + \delta_t\]given \(\delta_t \in \R\), \(D_t = \begin{bmatrix} Q_t & S_t^\top \\ S_t & R_t \end{bmatrix} \in \Sym^{n+m}\) where \(D_t, Q_t \succeq 0\) and \(R_t \succ 0\), and \(d_t = \begin{bmatrix} d_{t,s} \\ d_{t,a} \end{bmatrix} \in \R^{n+m}\).
Setting \(S_t = 0\) and \(d_t = 0\) yields the standard LQR cost function.
Here, \(A_t\), \(B_t\), \(D_t\), and \(d_t\) are assumed to be known for all \(t = 0, \dotsc, T\). When these quantities are not known in realworld settings, they can usually be estimated by first collecting some state transitions from an arbitrary policy.
Deriving the Optimal LQR Controller
The derivation of the optimal LQR controller is inductive. It relies on the following lemma, which shows that in a linear system with a quadratic cost function, the optimal action \(a^*\) is a linear function of the state \(s\), and the optimal cost function \(c^*\) is a quadratic function of \(s\).
Lemma 1: Consider the function \(c: \R^n \times \R^m \to \R\) given by
\[c(s,a) = \sat D \sa + d^\top \sa + \delta\]where \(\delta \in \R\), \(D = \begin{bmatrix} Q & S^\top \\ S & R \end{bmatrix} \in \Sym^{n+m}\) with \(D, Q \succeq 0\) and \(R \succ 0\), and \(d = \begin{bmatrix} d_s \\ d_a \end{bmatrix} \in \R^{n+m}\). Then \(c(s,a)\) is minimized by setting
\[a^* = Ks + k\]where \(K = R^{1} S\) and \(k = \frac{1}{2} R^{1} d_a\). Plugging \(a^*\) into the cost function yields the optimal cost function
\[c^*(s) :=c(s, a^*) = s^\top Z s + z^\top s + \text{const}\]where \(Z = Q + S^\top K\) is positive semidefinite, and \(z = 2 S^\top k + d_s\). The term \(\text{const} = \delta  \frac{1}{4} d_a^\top R^{1} d_a\) is constant with respect to the state \(s\) and action \(a\).
Proof
\[\begin{aligned} c(s,a) &= \sat D \sa + d^\top \sa + \delta \\ &= s^\top Q s + s^\top S^\top a + a^\top S s + a^\top R a + d_s^\top s + d_a^\top a + \delta \\ &= s^\top Q s + a^\top (S + S) s + a^\top R a + d_s^\top s + d_a^\top a + \delta \\ &= s^\top Q s + 2 a^\top S s + a^\top R a + d_s^\top s + d_a^\top a + \delta \\ \nabla_a c(s,a) &= 2 S s + 2R a + d_a \\ \nabla_a^2 c(s,a) &= 2 R \end{aligned}\]Since \(R \succ 0\), \(c(s,a)\) is strictly convex in \(a\) and is minimized by the unique value \(a^*\) at which the gradient is 0. Setting \(\nabla_a c(s,a) = 0\) and solving for \(a\) yields
\[a^* = R^{1} S s  \frac{1}{2} R^{1} d_a = Ks + k\]where \(K = R^{1} S\) and \(k = \frac{1}{2} R^{1} d_a\). Plugging this into the cost function yields
\[\begin{aligned} c^*(s) &:= c(s, a^*) \\ &= s^\top Q s + 2 (Ks+k)^\top S s + (Ks+k)^\top R (Ks+k) + d_s^\top s + d_a^\top (Ks+k) + \delta \\ &= s^\top (Q + 2 K^\top S + K^\top R K) s + (2 S^\top k + 2 K^\top R k + K^\top d_a + d_s)^\top s + k^\top R k + d_a^\top k + \delta \\ &= s^\top Z s + z^\top s + \underbrace{k^\top R k + d_a^\top k + \delta}_{\text{constant w.r.t. $s$ and $a$}} \end{aligned}\]where
\[\begin{aligned} Z &= Q + 2 K^\top S + K^\top R K \\ &= Q + 2 K^\top S  K^\top R (R^{1} S) \\ &= Q + 2 K^\top S  K^\top S \\ &= Q + K^\top S = Q  S^\top R^{1} S \\ z &= 2 S^\top k + 2 K^\top R k + K^\top d_a + d_s \\ &= 2 S^\top k + (R^{1} S)^\top R R^{1} d_a  (R^{1} S)^\top d_a + d_s \\ &= 2 S^\top k + S^\top R^{1} d_a  S^\top R^{1} d_a + d_s \\ &= 2 S^\top k + d_s. \end{aligned}\]Note that \(Z \in \Sym^n\) is symmetric because \(Q\) and \(R^{1}\) are both symmetric. Furthermore, \(Z \succeq 0\) because \(Z\) is the Schur complement of \(R\) with \(Q \succeq 0\) and \(R \succ 0\). (For more on Schur complements, see this blog post.)
The constant term can be simplified as
\[\begin{aligned} \text{const} &= \delta + k^\top R k + d_a^\top k \\ &= \delta + \frac{1}{4} (R^{1} d_a)^\top R R^{1} d_a  \frac{1}{2} d_a^\top R^{1} d_a \\ &= \delta + \frac{1}{4} d_a^\top R^{1} d_a  \frac{1}{2} d_a^\top R^{1} d_a \\ &= \delta  \frac{1}{4} d_a^\top R^{1} d_a. \end{aligned}\]Inductive Step
The inductive derivation of the LQR works backwards from the final time step \(t=T\) down to \(t=0\). Consider the inductive hypothesis that \(q_t\) has the form
\[q_t(s,a) = x^\top P_t x + p_t^\top x + \lambda_t\]for some vector \(p_t\) and symmetric matrix \(P_t = \begin{bmatrix} P_{t,s} & P_{t,as}^\top \\ P_{t,as} & P_{t,a} \end{bmatrix}\) with \(P_t, P_{t,s} \succeq 0\) and \(P_{t,a} \succ 0\). The scalar \(\lambda_t\) is constant with respect to the state \(s\) and action \(a\).
Notation: For the rest of this section, the subscript \(t\) is dropped from variables to reduce notational clutter. Every variable that is not written with a subscript is understood to have a subscript \(t\). Variables with subscripts other than \(t\) are written with the appropriate subscript.
The base case \(q_T(s,a) = c_T(s,a) = x^\top D_T x + d_T^\top x + \delta_T\) clearly satisfies the inductive hypothesis with \(P_T = D_T\), \(p_T = d_T\), and \(\lambda_T = \delta_T\).
For the inductive step, suppose that the hypothesis holds for some time step \(t+1\). By Lemma 1, the optimal value function at time \(t+1\) is
\[v_{t+1}(s) = s^\top Z_{t+1} s + z_{t+1}^\top s + \text{const}\]where \(Z_{t+1} \succeq 0\) and \(z_{t+1}\) are functions of \(P_{t+1}\) and \(p_{t+1}\). The constant term is \(\text{const} = \lambda_{t+1}  \frac{1}{4} d_{t+1,a}^\top (P_{t+1,a})^{1} d_{t+1,a}\).
If \(s_{t+1} = A s + B a + w = F x + w\), then
\[\begin{aligned} v_{t+1}(s_{t+1}) &= (F x + w)^\top Z_{t+1} (F x + w) + z_{t+1}^\top (F x + w) + \text{const} \\ &= x^\top F^\top Z_{t+1} F x + 2 w^\top Z_{t+1} F x + w^\top Z_{t+1} w + z_{t+1}^\top F x + w^\top z_{t+1} + \text{const} \\ \E_w\left[ v_{t+1}(s') \right] &= x^\top F^\top Z_{t+1} F x + \E_w[ w^\top Z_{t+1} w ] + z_{t+1}^\top F x + \text{const} \\ &= x^\top F^\top Z_{t+1} F x + z_{t+1}^\top F x + \text{const}_w \\ q_t(s,a) &= c_t(s,a) + \E_w \left[ \min_{a'} q_{t+1}(s', a') \right] \\ &= x^\top D x + d^\top x + \delta + \E_w \left[ v_{t+1}(s') \right] \\ &= x^\top (D + F^\top Z_{t+1} F) x + (d + F^\top z_{t+1})^\top x + \delta + \text{const}_w \\ &= x^\top P x + p^\top x + \lambda \end{aligned}\]where \(P = D + F^\top Z_{t+1} F\) and \(p = d + F^\top z_{t+1}\). The first equation for \(\E_w[v_{t+1}(s')]\) uses the assumption that \(\E[w] = 0\); the second equality absorbs \(\E_w[w^\top Z_{t+1} w]\) into \(\text{const}_w\) because it is constant with respect to states and actions. The constant term is therefore
\[\lambda = \delta + \text{const}_w = \delta + \E_w[ w^\top Z_{t+1} w ] + \lambda_{t+1}  \frac{1}{4} d_{t+1,a}^\top (P_{t+1,a})^{1} d_{t+1,a}.\]Since \(D, Z_{t+1} \succeq 0\), it follows that \(P \succeq 0\). The block matrix form of \(P\) is
\[\begin{aligned} P &= D + F^\top Z_{t+1} F \\ &= \begin{bmatrix} Q + A^\top Z_{t+1} A & S^\top + A^\top Z_{t+1} B \\ S + B^\top Z_{t+1} A & R + B^\top Z_{t+1} B \end{bmatrix} \\ &= \begin{bmatrix} P_s & P_{as}^\top \\ P_{as} & P_a \end{bmatrix} \end{aligned}\]which implies \(P_a = R + B^\top Z_{t+1} B \succ 0\) because \(R \succ 0\) and \(Z_{t+1} \succeq 0\).
This shows that \(q_t(s,a)\) satisfies the inductive hypothesis, which gives a backwards recursion:

For the final time step \(t = T\):
\[\begin{aligned} P_T &= D_T & p_T &= d_T \\ K_T &= R_T^{1} S_T, & k_T &= \frac{1}{2} R_T^{1} d_{T,a} \\ Z_T &= Q_T + S_T^\top K_T, & z_T &= 2 S_T^\top k_T + d_{T,s} \end{aligned}\] 
For \(t = T1, \dotsc, 0\):
\[\begin{aligned} P &= D + F^\top Z_{t+1} F, & p &= d + F^\top z_{t+1} \\ K &= P_a^{1} P_{as}, & k &= \frac{1}{2} P_a^{1} p_a \\ Z &= P_s + P_{as}^\top K, & z &= 2 P_{as}^\top k + p_s \end{aligned}\]
This can be simplified further by defining \(Z_{T+1} = \zero_{n \times n}\) and \(z_{T+1} = \zero_n\). Then the equations above hold for all \(t = T, \dotsc, 0\).
Additionally, it is common to expand out the recursion equations (which removes the need for explicitly constructing \(P\) and \(p\)). In particular, the expression for \(Z\) is known as the discretetime Ricatti equation.
\[\begin{aligned} K &= (R + B^\top Z_{t+1} B)^{1} (S + B^\top Z_{t+1} A) \\ k &= \frac{1}{2} (R + B^\top Z_{t+1} B)^{1} (d_a + B^\top z_{t+1}) \\ Z &= Q + A^\top Z_{t+1} A  (S + B^\top Z_{t+1} A)^\top (R + B^\top Z_{t+1} B)^{1} (S + B^\top Z_{t+1} A) \\ z &= (S + B^\top Z_{t+1} A)^\top (R + B^\top Z_{t+1} B)^{1} (d_a + B^\top z_{t+1}) + d_s + A^\top z_{t+1} \end{aligned}\]Finally, to solve for actions \(a_t\), apply forward recursion. For \(t = 0, \dotsc, T\):
\[\begin{aligned} a_t &= K_t s_t + k_t \\ s_{t+1} &= A_t s_t + B_t a_t + w_t \end{aligned}\]The optimal controller defined by the matrices \(K_t\) and vectors \(k_t\) is known as the linear quadratic regulator (LQR). Note that the optimal controller \((K_t, k_t)\) does not depend on the actual observed state \(s_t\). As long as the future noise variables \(w_t\) are independently distributed with mean 0, then the sequence of controllers \(\{(K_t, k_t)\}_{t=0}^T\) is always optimal.
Standard LQR
Consider the standard LQR setting where \(S = 0\), \(d = 0\), and \(\delta = 0\), so the cost function at each step is simply
\[c(s,a) = s^\top Q s + a^\top R a.\](As in the previous section, the subscript \(t\) is dropped to reduce notational clutter. All variables that are written without a subscript are understood to implicitly have a subscript \(t\).)
For the inductive hypothesis, suppose that \(z_{t+1} = 0\). The base case is satisfied with \(z_{T+1} = 0\). Then the recursion vectors all become 0:
\[\begin{aligned} p &= d + F^\top z_{t+1} = 0 + F^\top 0 = 0 \\ k &= \frac{1}{2} P_a^{1} p_a = \frac{1}{2} P_a^{1} 0 = 0 \\ z &= 2 P_{as}^\top k + p_s = 2 P_{as}^\top 0 + 0 = 0 \end{aligned}\]The recursion matrices simplify to
\[\begin{aligned} K &= (R + B^\top Z_{t+1} B)^{1} B^\top Z_{t+1} A \\ Z &= Q + A^\top Z_{t+1} A  (B^\top Z_{t+1} A)^\top (R + B^\top Z_{t+1} B)^{1} B^\top Z_{t+1} A \end{aligned}\]and the optimal action from any state \(s_t\) at time \(t\) is \(a_t = K_t s_t\). In this case, the linear quadratic regulator is the set of matrices \(\{K_t\}_{t=0}^T\).
Furthermore, if we assume \(\E[w w^\top] = \Sigma\) (for example, \(w \sim \mathcal{N}(0, \Sigma)\)), then
\[\E_w\left[ w^\top Z_{t+1} w \right] = \E_w\left[ \tr( w w^\top Z_{t+1} ) \right] = \tr( \E_w[ w w^\top ] Z_{t+1} ) = \tr(\Sigma Z_{t+1})\]so \(\lambda = \tr(\Sigma Z_{t+1}) + \lambda_{t+1}\). Applying Lemma 1 to \(q_t(s,a)\), the value function at each time step is precisely
\[v_t(s) = s^\top Z s + \tr(\Sigma Z_{t+1}) + \lambda_{t+1},\]with base case \(\lambda_T = 0\) and \(v_T(s) = c_T^*(s) = s^\top Q_T s\).
If there is no randomness in the state transitions (i.e., \(w_t = 0\) always), then \(\E_w\left[ w^\top Z_{t+1} w \right] = 0\) so \(v_t(s) = s^\top Z s\).
NonSymmetric Cost Matrices
In the setup above, it was assumed that \(D_t, Q_t \succeq 0\) and \(R_t \succ 0\). However, this requirement can in fact be generalized a bit. By Lemma 2 below, it suffices that
\[D_t = \begin{bmatrix} Q_t & D_{sa} \\ D_{as} & R_t \end{bmatrix}\]satisfy \(D_t + D_t^\top \succeq 0\), \(Q_t + Q_t^\top \succeq 0\), and \(R_t + R_t^\top \succ 0\).
Lemma 2: For any square matrix \(M \in \R^{n \times n}\) and vector \(x \in \R^n\),
\[x^\top M x = x^\top \left[ \frac{1}{2} (M + M^\top) \right] x.\]In other words, \(x^\top M x\) can always be replaced with an equivalent expression \(x^\top \hat{M} x\) where \(\hat{M} = \frac{1}{2} (M + M^\top)\) is symmetric.
Proof
Note that
\[x^\top M x = (x^\top M x)^\top = x^\top (x^\top M)^\top = x^\top M^\top x.\]Therefore,
\[x^\top \left[ \frac{1}{2} (M + M^\top) \right] x = \frac{1}{2} (x^\top M x + x^\top M^\top x) = x^\top M x.\]Summary
Consider a discretetime finitehorizon linear dynamical system
\[\begin{aligned} s_{t+1} &= A_t s_t + B_t a_t + w_t \\ \E[w_t] &= 0 \end{aligned}\]where the goal is to choose actions \(a_t\) (\(t=0, \dotsc, T\)) that minimize quadratic costs
\[c_t(s,a) = \sat D_t \sa + d_t^\top \sa + \text{const}.\]Define the \(q\)functions and value functions
\[\begin{aligned} q_t(s,a) &= \begin{cases} c_T(s,a) & t = T \\ \begin{aligned} & c_t(s,a) + \E_{w_t} \left[ \min_{a_{t+1}} q_{t+1}(s_{t+1}, a_{t+1}) \right] \\ & = c_t(s,a) + \E_{w_t} \left[ v_{t+1}(s_{t+1}) \right] \end{aligned} & t < T \\ \end{cases} \\ v_t(s) &= \min_a q_t(s,a). \end{aligned}\]In the inductive step, I showed that if
\[q_t(s,a) = \sat P_t \sa + p_t^\top \sa + \lambda_t\]for some matrix \(P_t\), vector \(p_t\), and scalar \(\lambda_t\), then
\[v_t(s) = s^\top Z_t s + z_t^\top s + \text{const}.\]In addition, \(q_{t1}\) and \(v_{t1}\) have the same form as \(q_t\) and \(v_t\), and I derived recursive formulas for \(P_t, p_t, Z_t, z_t\) in terms of the given cost matrices for all \(t = T, \dotsc, 0\). In particular, the equation for \(Z_t\) is known as the Ricatti equation.
Finally, given \(P_t, p_t\), I showed how to compute \(K_t, k_t\) such that the optimal action from any state \(s_t\) is
\[a_t = K_t s_t + k_t.\]The set of controllers \(\{(K_t, k_t)\}_{t=0}^T\) form the linear quadratic regulator (LQR). Importantly, the LQR is optimal from any state \(s\) and does not depend on the noise distribution for the \(w_t\). The optimal policy is always the same, as long as \(w_t\) has zero mean!
Discussion
Linear Dynamics with Constant
In some settings, the system dynamics are expressed as
\[s_{t+1} = A s_t + B a_t + C\]with an additional constant vector \(C \in \R^n\). This can actually be rewritten as a standard linear dynamical system as
\[\begin{bmatrix} s_{t+1} \\ 1 \end{bmatrix} = \begin{bmatrix} A & C \\ \zero_n^\top & 1 \end{bmatrix} \begin{bmatrix} s_t \\ 1 \end{bmatrix} + \begin{bmatrix} B \\ \zero_m^\top \end{bmatrix} a_t.\]This reformulation is commonly used when linearizing nonlinear dynamical systems.
ClosedLoop vs. OpenLoop
This post presented the discretetime finitehorizon unconstrained LQR problem and provided a recursive (a.k.a. dynamic programming) approach to finding the optimal controller. Because the recursive controller
\[a_t = K_t s_t + k_t\]depends on the actual state \(s_t\) to decide the next action \(a_t\), this controller is a “closedloop” controller; the state influences the next action, creating a feedback loop.
However, it is also possible to formulate this unconstrained LQR problem as oneshot quadratic optimization problem, sometimes known as the “batch” or “leastsquares” approach. This results in an “openloop” controller, because the control actions \(a_t\) for all time steps are treated as optimization variables and solved for simultaneously.
Sources
 For the “batch” / “leastsquares” formulation of the unconstrained LQR problem, see slides from either Stanford EE363 (200809), Lecture 1 or Caltech CS159 (202021), Lecture 2.
Constrained LQR
The setting presented above assumes that all states \(s_t \in \R^n\) and \(a_t \in \R^m\) are allowed. However, many realworld control problems have constraints on the states and/or actions:
\[\forall t:\ s_t \in \mathcal{S},\ a_t \in \mathcal{A}.\]With state and input constraints, Lemma 1 no longer applies, so the recursion equations no longer hold.
If the state and input constraints are polytopes, then the constrained LQR problem can still be formulated as a quadratic optimization problem. However, the dynamic programming approach is more involved.
Sources
 See the Caltech CS159 (202021), Lecture 2 slides for more information on the quadratic problem formulation.
 See Borelli et al. (2016) textbook (Chapters 11.3.3 and 17.6) for more details on the dynamic programming approach.
Other Related Topics
Linearization and Differential Dynamic Programming (DDP)
Let \(s_{t+1} = f_t(s,a)\) express the system dynamics. LQR works when \(f\) is a linear function. If \(f\) is a differentiable and continuous nonlinear function, it can be approximated by a Taylor series expansion. Linearization requires choosing a point \((s,a)\) around which to linearize \(f_t\).
In some problems, if the initial state and a desired ending state are close, then choosing either of those points (or something in between) may suffice. For example, in the standard LQR setting, the desired end state is typically the origin, and often the objective is to simply keep the system centered around the origin.
In general, though, it may be unclear what point to choose for linearization. One strategy is simply to choose random points \((s_t^{(0)}, a_t^{(0)})\) around which to linearize each \(f_t\). After solving the LQR problem, then the optimal points \((s_t^{(1)}, a_t^{(1)})\) from the LQR solution can be used as the new points for linearization. This process can be repeated until some stopping criterion (e.g., a maximum number of iterations, or convergence).
However, linearization generally does not result in cost nor stability guarantees.
Linear Quadratic Gaussian (LQG)
LQR can be further generalized to the partially observable Markov decision process (POMDP) setting. Instead of observing the states \(s_t\), suppose that we only observe a random vector \(y_t\) whose conditional distribution depends on \(s_t\). Specifically, the problem of minimizing a quadratic cost function
\[J = \E\left[ \sum_{t=0}^T x_t^\top Q_t x_t + a_t^\top R_t a_t \right]\]with the system described by
\[\begin{aligned} s_{t+1} &= A_t s_t + B_t a_t + w_t & w_t &\sim \mathcal{N}(0, W_t) \\ y_t &= C_t s_t + v_t & v_t &\sim \mathcal{N}(0, V_t) \end{aligned}\]is called the linearquadraticGaussian (LQG) control problem. In this problem, only the \(y_t\) variables are observed, even though the underlying system dynamics and cost function are in terms of the unobserved state vectors \(s_t\).
As with vanilla LQR, there exists a recursive derivation of the optimal LQG controller, not too different from the standard LQR controller.
Continuous and infinitehorizon
Even though I only provided derivation for the finitehorizon discretetime setting, the results are similar for continuoustime and infinitehorizon settings. Essentially, the sum in the cost function is replaced by an integral, and the discretetime algebraic Ricatti equation is replaced by a continuoustime Ricatti differential equation.
References
 UC Berkeley CS285 (2020), Lecture 10
 the original slides (sadly riddled with many typos) which inspired this blog post
 presents the general LQR setting without noise
 Stanford CS229 (2018) Lecture Notes
 covers standard LQR with Gaussian noise, linearization of dynamics, DDP, and a simplified version of the linear quadratic Gaussian problem
 Stanford EE363 (200809) Lecture Notes
 Lecture 1 presents dynamic programming and leastsquares solutions to the standard LQR problem without noise
 Lecture 5 presents the standard LQR problem with Gaussian noise
 Caltech CS159 (202021), Lecture 2
 presents the standard LQR setting without noise
 includes batch approach to solving constrained LQR and discussion on linearization of nonlinear systems
 Predictive Control for Linear and Hybrid Systems by Borelli et al. (2016). Available here.
 allaround good reference
 covers the standard LQR setting with and without noise and goes indepth into both batch and recursive controllers for constrained and nonlinear problems