Action selection & goals
What the agent wants, and how it picks an action. StateGoal and ObservationGoal are the goals you hand the Agent (the continuous-state answer to pymdp's C); Preference and EFESelector are the machinery underneath expected-free-energy selection.
StateGoal
dataclass
A state-space objective: reach a target state (the LQR / fixed-sensor regime).
The complete spec for the state-tracking path: the target plus the LQR cost
weights it implies. precision is LQR's state weight Q; effort is its
action weight R. effort is left as None here because the action dimension p
isn't known until the Agent pairs this goal with a model, at which point the
Agent fills in the p-by-p identity. The Agent dispatches a StateGoal to an
LQRSelector. A StateGoal is not a pytree: it is used at construction time only,
and the Agent extracts a Preference from it for the selector.
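The deferred effort default described above can be pictured with a small sketch. This is not the library's code: `StateGoalSketch`, `resolve_effort`, and `quadratic_cost` are hypothetical names, and treating precision as a scalar is an assumption (the text does not say whether Q is a scalar or a matrix). Plain Python is used so the example runs without dependencies.

```python
from dataclasses import dataclass
from typing import Optional

Matrix = list[list[float]]


@dataclass(frozen=True)
class StateGoalSketch:
    """Hypothetical stand-in for StateGoal; field names and shapes are assumptions."""
    target: tuple[float, ...]
    precision: float = 1.0           # plays the role of LQR's state weight Q
    effort: Optional[Matrix] = None  # action weight R; size unknown until paired with a model


def resolve_effort(goal: StateGoalSketch, action_dim: int) -> Matrix:
    """Return R, filling in the p-by-p identity when the goal left it as None."""
    if goal.effort is not None:
        return goal.effort
    return [[1.0 if i == j else 0.0 for j in range(action_dim)]
            for i in range(action_dim)]


def quadratic_cost(goal: StateGoalSketch, state, action, effort: Matrix) -> float:
    """One-step LQR-style cost: Q * |x - target|^2 + u^T R u."""
    state_term = goal.precision * sum((x - t) ** 2 for x, t in zip(state, goal.target))
    n = len(action)
    action_term = sum(action[i] * effort[i][j] * action[j]
                      for i in range(n) for j in range(n))
    return state_term + action_term


goal = StateGoalSketch(target=(1.0, 0.0), precision=2.0)
R = resolve_effort(goal, action_dim=2)  # the action dimension arrives with the model
cost = quadratic_cost(goal, state=(0.0, 0.0), action=(1.0, 1.0), effort=R)
```

Leaving effort as None until the action dimension is known keeps the goal independent of any particular model, which fits the "construction-time only" role described above.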
Source code in src/cpomdp/selection.py
ObservationGoal
dataclass
An observation-space objective: prefer to observe a target (the EFE regime).
The complete spec for the information-seeking path: the preferred observation,
how sharply it is preferred (precision), and the action-search config that the
EFESelector front-loads. action_bounds is the action box, n_candidates
its resolution, and horizon its lookahead depth. The Agent dispatches an
ObservationGoal to an EFESelector. An ObservationGoal is not a pytree: it is
used at construction time only, and the Agent extracts a Preference from it.
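As a rough illustration of how action_bounds and n_candidates could define a front-loaded candidate grid, consider the sketch below. It assumes a one-dimensional action and an evenly spaced grid that includes both endpoints; the source does not state the spacing, and `candidate_grid` is a hypothetical helper, not part of the library.

```python
def candidate_grid(action_bounds: tuple[float, float], n_candidates: int) -> list[float]:
    """Evenly spaced actions across the box, endpoints included (assumed layout)."""
    lo, hi = action_bounds
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    if n_candidates == 1:
        return [(lo + hi) / 2]  # a single candidate sits at the centre of the box
    step = (hi - lo) / (n_candidates - 1)
    return [lo + i * step for i in range(n_candidates)]


grid = candidate_grid((-1.0, 1.0), n_candidates=5)
```

Because the grid depends only on the goal's configuration, it can be built once at construction time and reused on every cycle.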
Source code in src/cpomdp/selection.py
Preference
dataclass
What the agent wants: a goal and how sharply it is preferred.
Single-mode for v0.3 — one Gaussian preference. The disjunctive mixture
case (visit one of several goals) is RFC-002, deferred; this type is the seam
that a mixture Preference plugs into.
precision is unused by LQRSelector (it is baked into the controller's
Riccati solve at construction); it is carried here for the EFE pragmatic term
added in Phase 1A.
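For reference, a single Gaussian preference over an observation has the standard quadratic log-density shown below. The symbols are assumptions for illustration: $o^{*}$ stands for the goal and $\Lambda$ for the precision, which may be a scalar or a matrix (the source does not specify).

```latex
\ln \tilde{p}(o) = -\tfrac{1}{2}\,(o - o^{*})^{\top} \Lambda \,(o - o^{*}) + \text{const}
```

A larger $\Lambda$ penalises deviations from the goal more sharply. How exactly this term enters the EFE pragmatic term is not specified here.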
Source code in src/cpomdp/selection.py
EFESelector
EFESelector(
    model: LinearGaussianModel,
    *,
    n_candidates: int,
    action_bounds: tuple[float, float],
    horizon: int = 1,
)
EFE action selection over a front-loaded candidate grid, horizon-aware.
At horizon = 1 (default) it minimises one-step G over the grid. At
horizon > 1 it scores constant-action policies (each grid action held for H
steps) via policy_efe and returns the first action of the best one
(receding-horizon). Per-cycle cost is a single attributable number,
cost_per_cycle = n_candidates * horizon.
Caveat: horizon selects the best constant action, not the best action
sequence. A genuinely sequential epistemic policy (move to sense, then exploit)
needs a varying sequence that the constant-action family cannot express, so even
at H > 1 the selector can still behave myopically on such tasks. True
varying-sequence search is deferred to the v0.4 GradientEFESelector seam.
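The gap between the constant-action family and full sequence search can be made concrete by counting policies. The sketch below is illustrative only: the grid values are made up, and `constant_policies` and `all_sequences` are hypothetical helpers rather than library functions.

```python
from itertools import product


def constant_policies(grid, horizon):
    """Each grid action held for `horizon` steps: len(grid) policies."""
    return [(a,) * horizon for a in grid]


def all_sequences(grid, horizon):
    """Every length-`horizon` sequence over the grid: len(grid) ** horizon policies."""
    return list(product(grid, repeat=horizon))


grid = [-1.0, 0.0, 1.0]
horizon = 3

constant = constant_policies(grid, horizon)
full = all_sequences(grid, horizon)

# A "move, then hold still" plan exists in the full family only.
move_then_hold = (1.0, 0.0, 0.0)
in_constant = move_then_hold in constant
in_full = move_then_hold in full
```

The constant family grows linearly with the grid size while the full family grows exponentially with the horizon, which is the trade-off behind the caveat above.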
Source code in src/cpomdp/selection.py
n_candidates
property
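The number of candidate actions scored per cycle, reported as attributable work
(RFC-001). At horizon = 1 this equals the per-cycle EFE-evaluation count; in
general cost_per_cycle = n_candidates * horizon.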
select
Returns the grid action minimising G over the horizon (the per-cycle work).
At horizon = 1 this is one vmap of the one-step kernel followed by an argmin.
At horizon > 1 it is one vmap of policy_efe over the constant-action policies
followed by an argmin, returning the first (= constant) action of the best policy.
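The selection logic can be sketched in plain Python under loud assumptions: a toy scalar model x' = a*x + b*u observed directly, and a squared distance to the goal standing in for the one-step G (no epistemic term). `rollout_score` and `select_sketch` are hypothetical names, and a list comprehension takes the place of the vmap.

```python
def rollout_score(x0, action, horizon, goal, a=1.0, b=1.0):
    """Summed per-step score when `action` is held for `horizon` steps."""
    x, total = x0, 0.0
    for _ in range(horizon):
        x = a * x + b * action    # toy scalar dynamics (assumption)
        total += (x - goal) ** 2  # stand-in for the one-step G
    return total


def select_sketch(x0, grid, horizon, goal):
    """Score every constant-action policy, then return the best policy's action."""
    scores = [rollout_score(x0, u, horizon, goal) for u in grid]
    best = min(range(len(grid)), key=scores.__getitem__)  # argmin over the grid
    return grid[best]


grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
one_step = select_sketch(0.0, grid, horizon=1, goal=1.0)
three_step = select_sketch(0.0, grid, horizon=3, goal=1.0)
```

In this toy setting the one-step selector jumps straight to the goal with the largest action, while the three-step selector picks a smaller action because holding the large one for three steps would overshoot.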