Skip to content

Control

Steady-state LQR action selection: the action-side dual of the Kalman filter. The Agent builds one of these for you when you give it a goal; you rarely touch it directly.

Internal — not part of the public API

LQRController is not exported from cpomdp and carries no stability promise; the Agent constructs it for you. It's documented here for the architecture it illustrates — LQR as the fixed-sensor reduction of active inference (ADR-003). Build agents with StateGoal, not this directly.

LQRController

LQRController(
    model: LinearGaussianModel,
    *,
    goal_precision: ArrayLike,
    effort_penalty: ArrayLike,
    tol: float = 1e-12,
    max_iter: int = 1000,
)

Steady-state LQR action selection — the action-side dual of the filter.

Where the Kalman filter front-loads perception (solve the estimation Riccati once for the steady-state gain K∞, then mean += K∞·prediction_error), this front-loads action: solve the dual control Riccati once for L∞, then action = -L∞·(mean − goal). Both gains are data-independent, both are computed at construction, and together they are LQG (see RESEARCH.md).

The load-bearing claim (ADR-003) is that LQR is active inference here, not a substitute for it. For a fixed linear-Gaussian sensor the covariance recursion is control-independent, so Expected Free Energy's epistemic term is identical for every action and drops out of the argmin; EFE-minimising selection reduces to its pragmatic term, and the pragmatic term under a Gaussian preference is a quadratic cost whose optimum is exactly LQR. The epistemic term only re-enters once sensing depends on the state or action — out of scope for v0.1.

The two cost matrices are named for the preference they encode, not by LQR's traditional Q/R — those letters already mean the noise covariances on the model (dynamics_noise/sensor_noise), the exact collision ADR-003 warns about. The names are the same across the whole library: an Agent hands these straight through to its controller.

Parameters:

Name Type Description Default
model LinearGaussianModel

The linear-Gaussian model to act in. Must carry a control matrix — there is nothing to act with otherwise.

required
goal_precision ArrayLike

How sharply the agent prefers the goal, an (n, n) matrix. It is exactly the precision of the Gaussian preference centred at the goal — exp(−½(state−goal)ᵀ·goal_precision·(state−goal)) — so heavier goal_precision buys a more aggressive controller. (LQR's Q.)

required
effort_penalty ArrayLike

How much action costs, a (p, p) matrix. Heavier effort_penalty buys a gentler controller. (LQR's R.)

required
tol float

Absolute tolerance on successive cost-to-go iterates; convergence is declared when they stop moving by more than this.

1e-12
max_iter int

Iteration cap before the Riccati recursion is declared to have failed to converge.

1000

Raises:

Type Description
ValueError

If the model has no control matrix, or a cost matrix does not match the state/action dimensions, is not symmetric, or fails its definiteness requirement (goal_precision PSD, effort_penalty PD).

RuntimeError

If the control Riccati does not converge within max_iter — typically because (dynamics, control) is not stabilisable.

Source code in src/cpomdp/control.py
def __init__(
    self,
    model: LinearGaussianModel,
    *,
    goal_precision: ArrayLike,
    effort_penalty: ArrayLike,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> None:
    if model.control is None:
        raise ValueError(
            "LQR needs an action channel: the model has no control matrix, "
            "so there is nothing to act with."
        )
    self.model = model
    self._goal_precision = jnp.asarray(goal_precision, dtype=float)
    self._effort_penalty = jnp.asarray(effort_penalty, dtype=float)

    n, p = model.n_states, model.n_controls
    if self._goal_precision.shape != (n, n):
        raise ValueError(
            f"goal_precision must be {n}x{n} to match the {n}-D state, "
            f"got shape {self._goal_precision.shape}"
        )
    if self._effort_penalty.shape != (p, p):
        raise ValueError(
            f"effort_penalty must be {p}x{p} to match the {p}-D action, "
            f"got shape {self._effort_penalty.shape}"
        )
    _validate_cost(self._goal_precision, "goal_precision", require_definite=False)
    _validate_cost(self._effort_penalty, "effort_penalty", require_definite=True)

    self._gain = self._converge_to_steady_state(tol, max_iter)

gain property

gain: Float64[Array, 'p n']

The steady-state feedback gain L∞, shape (p, n).

action

action(
    mean: ArrayLike, goal: ArrayLike
) -> Float64[Array, p]

The action that drives the estimated state toward goal.

One matrix-vector product, -L∞·(mean − goal) — all the work was front-loaded into L∞ at construction, so there is no optimisation in the loop. The mean − goal shift turns the regulator (which drives its state to zero) into a controller that drives the state to goal.

Parameters:

Name Type Description Default
mean ArrayLike

The current belief mean — the best estimate of the state, shape (n,).

required
goal ArrayLike

The state to steer toward, shape (n,). It must be an equilibrium the dynamics can hold at zero action; aim at a non-equilibrium and a steady-state offset is left behind.

required

Returns:

Type Description
Float64[Array, p]

The action, shape (p,).

Raises:

Type Description
ValueError

If goal is not a 1-D vector of length n.

Source code in src/cpomdp/control.py
def action(self, mean: ArrayLike, goal: ArrayLike) -> Float64[Array, "p"]:
    """The action that drives the estimated state toward ``goal``.

    One matrix-vector product, ``-L∞·(mean − goal)`` — all the work was
    front-loaded into ``L∞`` at construction, so there is no optimisation in
    the loop. The ``mean − goal`` shift turns the regulator (which drives its
    state to zero) into a controller that drives the state to ``goal``.

    Args:
        mean: The current belief mean — the best estimate of the state,
            shape ``(n,)``.
        goal: The state to steer toward, shape ``(n,)``. It must be an
            equilibrium the dynamics can hold at zero action; aim at a
            non-equilibrium and a steady-state offset is left behind.

    Returns:
        The action, shape ``(p,)``.

    Raises:
        ValueError: If ``goal`` is not a 1-D vector of length ``n``.
    """
    # self._gain : (p, n) L∞;  mean, goal : (n,);  returns (p,)
    mean = jnp.asarray(mean, dtype=float)
    goal = jnp.asarray(goal, dtype=float)
    if goal.shape != (self.model.n_states,):
        raise ValueError(
            f"goal must be a 1-D vector of length {self.model.n_states} "
            f"(the state dimension), got shape {goal.shape}"
        )
    return -self._gain @ (mean - goal)