Control¶

Steady-state LQR action selection: the action-side dual of the Kalman filter. The Agent builds one of these for you when you give it a goal; you rarely touch it directly.

Internal — not part of the public API

LQRController is not exported from cpomdp and carries no stability promise; the Agent constructs it for you. It's documented here for the architecture it illustrates — LQR as the fixed-sensor reduction of active inference (ADR-003). Build agents with StateGoal, not this directly.

LQRController ¶

LQRController(
    model: LinearGaussianModel,
    *,
    goal_precision: ArrayLike,
    effort_penalty: ArrayLike,
    tol: float = 1e-12,
    max_iter: int = 1000,
)

Steady-state LQR action selection — the action-side dual of the filter.

Where the Kalman filter front-loads perception (solve the estimation Riccati once for the steady-state gain K∞, then mean += K∞·prediction_error), this front-loads action: solve the dual control Riccati once for L∞, then action = -L∞·(mean − goal). Both gains are data-independent, both are computed at construction, and together they are LQG (see RESEARCH.md).

The load-bearing claim (ADR-003) is that LQR is active inference here, not a substitute for it. For a fixed linear-Gaussian sensor the covariance recursion is control-independent, so Expected Free Energy's epistemic term is identical for every action and drops out of the argmin; EFE-minimising selection reduces to its pragmatic term, and the pragmatic term under a Gaussian preference is a quadratic cost whose optimum is exactly LQR. The epistemic term only re-enters once sensing depends on the state or action — out of scope for v0.1.

The two cost matrices are named for the preference they encode, not by LQR's traditional Q/R — those letters already mean the noise covariances on the model (dynamics_noise/sensor_noise), the exact collision ADR-003 warns about. The names are the same across the whole library: an Agent hands these straight through to its controller.

Parameters:

Name	Type	Description	Default
`model`	`LinearGaussianModel`	The linear-Gaussian model to act in. Must carry a `control` matrix — there is nothing to act with otherwise.	required
`goal_precision`	`ArrayLike`	How sharply the agent prefers the goal, an `(n, n)` matrix. It is exactly the precision of the Gaussian preference centred at the goal — `exp(−½(state−goal)ᵀ·goal_precision·(state−goal))` — so heavier `goal_precision` buys a more aggressive controller. (LQR's `Q`.)	required
`effort_penalty`	`ArrayLike`	How much action costs, a `(p, p)` matrix. Heavier `effort_penalty` buys a gentler controller. (LQR's `R`.)	required
`tol`	`float`	Absolute tolerance on successive cost-to-go iterates; convergence is declared when they stop moving by more than this.	`1e-12`
`max_iter`	`int`	Iteration cap before the Riccati recursion is declared to have failed to converge.	`1000`

Raises:

Type	Description
`ValueError`	If the model has no `control` matrix, or a cost matrix does not match the state/action dimensions, is not symmetric, or fails its definiteness requirement (`goal_precision` PSD, `effort_penalty` PD).
`RuntimeError`	If the control Riccati does not converge within `max_iter` — typically because `(dynamics, control)` is not stabilisable.

Source code in src/cpomdp/control.py

def __init__(
    self,
    model: LinearGaussianModel,
    *,
    goal_precision: ArrayLike,
    effort_penalty: ArrayLike,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> None:
    if model.control is None:
        raise ValueError(
            "LQR needs an action channel: the model has no control matrix, "
            "so there is nothing to act with."
        )
    self.model = model
    self._goal_precision = jnp.asarray(goal_precision, dtype=float)
    self._effort_penalty = jnp.asarray(effort_penalty, dtype=float)

    n, p = model.n_states, model.n_controls
    if self._goal_precision.shape != (n, n):
        raise ValueError(
            f"goal_precision must be {n}x{n} to match the {n}-D state, "
            f"got shape {self._goal_precision.shape}"
        )
    if self._effort_penalty.shape != (p, p):
        raise ValueError(
            f"effort_penalty must be {p}x{p} to match the {p}-D action, "
            f"got shape {self._effort_penalty.shape}"
        )
    _validate_cost(self._goal_precision, "goal_precision", require_definite=False)
    _validate_cost(self._effort_penalty, "effort_penalty", require_definite=True)

    self._gain = self._converge_to_steady_state(tol, max_iter)

gain `property` ¶

gain: Float64[Array, 'p n']

The steady-state feedback gain L∞, shape (p, n).

action ¶

action(
    mean: ArrayLike, goal: ArrayLike
) -> Float64[Array, p]

The action that drives the estimated state toward goal.

One matrix-vector product, -L∞·(mean − goal) — all the work was front-loaded into L∞ at construction, so there is no optimisation in the loop. The mean − goal shift turns the regulator (which drives its state to zero) into a controller that drives the state to goal.