Engine state & substrates
What the engine reads from its input, declared as structural Protocols.
Any object satisfying EngineState / SimilarityMatrix works: pandas
adapters for production, dict adapters for synthetic backtests and tests.
engine
Substrate-agnostic engine input — EngineState + SimilarityMatrix.
The recommendation engine in recommender.py used to require a
pandas.DataFrame for its scoring input and another pandas.DataFrame
for protocol_similarity. This module breaks that requirement:
- EngineState(Protocol) declares what the engine needs from its patient-scoped state. Any object satisfying the protocol works.
- SimilarityMatrix(Protocol) does the same for pairwise protocol similarity.
- ProtocolRow is the row shape the engine reads from EngineState. Plain dataclass, no pandas dependency.
- PatientState / DataFrameSimilarity adapt the existing pandas-based pipeline output to the protocols. Used by CDSS and production code.
- DictPatientState / DictSimilarity are pandas-free alternatives, useful for synthetic backtests, unit tests, and ad-hoc replays.
The engine internals (_bootstrap_strategy, _update_strategy,
_top_up_schedule, etc. in recommender.py) now type-hint
EngineState instead of PatientState — they work with either
substrate.
ProtocolRow
dataclass
ProtocolRow(
patient_id: int,
protocol_id: int,
score: float,
days: list[int] = list(),
usage: int = 0,
usage_week: int = 0,
ppf: float | None = None,
delta_dm: float | None = None,
recent_adherence: float | None = None,
weeks_since_start: int = 0,
contrib: list[float] | None = None,
)
Per-(patient, protocol) cell. The minimum the engine needs.
Optional fields default to None / 0 so synthetic callers can omit them. The engine treats missing values as "not relevant" rather than NaN.
as_dict
Render in the column-keyed shape pandas expects.
Keys match the constants used throughout the rest of the
package (PATIENT_ID, PROTOCOL_ID, SCORE, DAYS, …) — so the
result drops into a pd.DataFrame directly.
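A sketch of the column-keyed rendering. The constant values (`"PATIENT_ID"` and so on) and the attribute-to-column mapping are hypothetical stand-ins for the package's own constants, and the class is trimmed to four fields:

```python
from dataclasses import dataclass, field, fields

# Hypothetical column constants; the package defines its own values.
PATIENT_ID = "PATIENT_ID"
PROTOCOL_ID = "PROTOCOL_ID"
SCORE = "SCORE"
DAYS = "DAYS"

# Hypothetical attribute -> column-key mapping for the trimmed field set.
COLUMN_FOR = {"patient_id": PATIENT_ID, "protocol_id": PROTOCOL_ID, "score": SCORE, "days": DAYS}


@dataclass
class Row:
    patient_id: int
    protocol_id: int
    score: float
    days: list[int] = field(default_factory=list)

    def as_dict(self) -> dict:
        # Key every field by its column constant instead of its attribute name.
        return {COLUMN_FOR[f.name]: getattr(self, f.name) for f in fields(self)}


records = [Row(4378, 200, 1.8, [0, 2, 4]).as_dict(), Row(4378, 201, 0.9).as_dict()]
```

A list of such dicts is the records form that `pd.DataFrame(records)` accepts.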
Source code in src\ai_cdss\engine.py
from_dict
classmethod
Inverse of as_dict — build a ProtocolRow from a dict-shaped
row (typically a pandas to_dict("records") element).
EngineState
Bases: Protocol
What the recommendation engine reads from its input state.
Implementations: PatientState, DictPatientState.
Adding a new substrate (polars, xarray) means writing a new implementation of
this protocol; the engine code stays unchanged.
SimilarityMatrix
Bases: Protocol
Pairwise protocol similarity queries.
Implementations: DataFrameSimilarity, DictSimilarity.
PatientState
EngineState backed by a pandas DataFrame.
The scoring DataFrame has one row per (patient, protocol) for every patient in the cohort. This adapter slices to one patient and exposes the engine-shaped read methods.
Hot-path optimization (phase F3): every property is cached on first
access (@cached_property). The "has non-empty DAYS" mask is
computed once and reused by prescribed_rows, is_week_skipped,
lowest_scoring_prescribed, and prescriptions. Per-protocol
score-row lookups go through a dict index built lazily — avoids the
O(N) boolean-mask scan on each score_row(pid) call.
Read-only — the underlying frame must not be mutated after this state is constructed (the caches assume immutability).
DictPatientState
DictPatientState(
patient_id: int,
rows: Mapping[int, ProtocolRow],
*,
scoring_attrs: dict[str, Any] | None = None,
)
EngineState backed by an in-memory dict of ProtocolRow.
Construct directly:
    state = DictPatientState(
        patient_id=4378,
        rows={
            200: ProtocolRow(patient_id=4378, protocol_id=200,
                             score=1.8, days=[0, 2, 4]),
            201: ProtocolRow(...),
        },
    )
Or from a list:
    state = DictPatientState.from_rows(patient_id, [row1, row2, ...])
prescribed_rows
cached
property
prescribed_rows: list[ProtocolRow]
Rows with non-empty DAYS. Cached; this state is immutable
post-construction (see with_prescribed_set for a copy-on-change
builder).
from_rows
classmethod
from_rows(
patient_id: int,
rows: Iterable[ProtocolRow],
*,
scoring_attrs: dict[str, Any] | None = None,
) -> "DictPatientState"
Build from an iterable of ProtocolRow objects.
with_prescribed_set
Return a new state with per-protocol DAYS overrides applied. In the synthetic chained-mode backtest, this turns a prescription update into a one-line call.
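A sketch of the copy-on-change builder using `dataclasses.replace`. The argument shape (a mapping from protocol id to new DAYS), the rule that protocols missing from the mapping become unprescribed, and the `Row` / `DictState` / `prescribed_ids` names are all assumptions:

```python
from collections.abc import Mapping
from dataclasses import dataclass, field, replace


@dataclass
class Row:
    protocol_id: int
    score: float
    days: list[int] = field(default_factory=list)


class DictState:
    """Hypothetical trimmed stand-in for DictPatientState."""

    def __init__(self, rows: Mapping[int, Row]) -> None:
        self._rows = dict(rows)

    @property
    def prescribed_ids(self) -> list[int]:
        return [pid for pid, r in self._rows.items() if r.days]

    def with_prescribed_set(self, days_by_protocol: Mapping[int, list[int]]) -> "DictState":
        # Copy-on-change: build fresh rows via dataclasses.replace and leave
        # self untouched. ASSUMPTION: protocols absent from the mapping get empty DAYS.
        return DictState({
            pid: replace(row, days=list(days_by_protocol.get(pid, [])))
            for pid, row in self._rows.items()
        })


week0 = DictState({200: Row(200, 1.8, [0, 2]), 201: Row(201, 0.9)})
week1 = week0.with_prescribed_set({201: [1, 3]})  # the one-line update
```

The old state is never mutated, so any cached properties it holds remain correct.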
DataFrameSimilarity
SimilarityMatrix backed by the long-form similarity DataFrame
(PROTOCOL_A, PROTOCOL_B, SIMILARITY).
Phase F3 optimization: instead of re-scanning the full DataFrame on
every similarities_for(...) call, the constructor builds a _by_a
dict-of-pairs index once. The DataFrame is touched only there.
Mirrors DictSimilarity: both implementations now share the same
query path.
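A sketch of the one-time `_by_a` index, assuming it maps each protocol A to a `{protocol_b: similarity}` dict and that `similarities_for` takes a single protocol id (both assumptions; `IndexedSimilarity` and `build_by_a` are hypothetical names). Long-form rows are fed in as `(a, b, similarity)` tuples:

```python
from collections import defaultdict
from collections.abc import Iterable


def build_by_a(triples: Iterable[tuple[int, int, float]]) -> dict[int, dict[int, float]]:
    """Group long-form (protocol_a, protocol_b, similarity) rows by protocol_a."""
    by_a: dict[int, dict[int, float]] = defaultdict(dict)
    for a, b, sim in triples:
        by_a[a][b] = float(sim)
    return dict(by_a)


class IndexedSimilarity:
    def __init__(self, triples: Iterable[tuple[int, int, float]]) -> None:
        # The long-form data is scanned once, here; queries never touch it again.
        self._by_a = build_by_a(triples)

    def similarities_for(self, protocol_id: int) -> dict[int, float]:
        # Every b with a recorded (protocol_id, b) similarity; a copy, so callers
        # cannot mutate the index.
        return dict(self._by_a.get(protocol_id, {}))


sims = IndexedSimilarity([(200, 201, 0.83), (200, 202, 0.71), (201, 200, 0.5)])
```

For a DataFrame, `df[[a_col, b_col, sim_col]].itertuples(index=False, name=None)` yields exactly such plain tuples. Because the index is keyed by A, `(200, 201)` and `(201, 200)` are stored independently, which preserves asymmetric similarities.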
DictSimilarity
SimilarityMatrix backed by a dict of (a, b) → similarity.
Asymmetric: sim[(a, b)] and sim[(b, a)] may differ.
Construct: DictSimilarity({(200, 201): 0.83, (200, 202): 0.71, ...})
coerce_engine_state
coerce_engine_state(
state: Any, patient_id: int | None = None
) -> EngineState
Adapt an input to EngineState.
- EngineState instance → returned as-is.
- pd.DataFrame → wrapped in PatientState (patient_id required).
coerce_similarity
coerce_similarity(sim: Any) -> SimilarityMatrix
Adapt an input to SimilarityMatrix.
- SimilarityMatrix instance → returned as-is.
- pd.DataFrame → wrapped in DataFrameSimilarity.
- dict[(a, b), float] → wrapped in DictSimilarity.
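A sketch of the dispatch order, with hypothetical names throughout (`SimilarityLike`, `DictSim`, `coerce_sim`, and the `PROTOCOL_A` / `PROTOCOL_B` / `SIMILARITY` column strings). For brevity the DataFrame branch converts to a dict instead of wrapping in a DataFrame-backed class, and raising `TypeError` for anything else is an assumption about how unsupported inputs are handled:

```python
from collections.abc import Mapping
from typing import Any, Protocol, runtime_checkable

try:  # pandas is optional in this sketch
    import pandas as pd
except ImportError:
    pd = None


@runtime_checkable
class SimilarityLike(Protocol):
    def similarities_for(self, protocol_id: int) -> dict[int, float]: ...


class DictSim:
    def __init__(self, pairs: Mapping[tuple[int, int], float]) -> None:
        # Same per-A index as the long-form path, built from (a, b) keys.
        self._by_a: dict[int, dict[int, float]] = {}
        for (a, b), s in pairs.items():
            self._by_a.setdefault(a, {})[b] = float(s)

    def similarities_for(self, protocol_id: int) -> dict[int, float]:
        return dict(self._by_a.get(protocol_id, {}))


def coerce_sim(sim: Any) -> SimilarityLike:
    if isinstance(sim, SimilarityLike):
        return sim  # already satisfies the protocol: returned as-is
    if pd is not None and isinstance(sim, pd.DataFrame):
        # Simplification: long-form frame -> {(a, b): similarity}; column names are hypothetical.
        pairs = dict(zip(zip(sim["PROTOCOL_A"], sim["PROTOCOL_B"]), sim["SIMILARITY"]))
        return DictSim(pairs)
    if isinstance(sim, Mapping):
        return DictSim(sim)
    raise TypeError(f"cannot adapt {type(sim).__name__} to SimilarityMatrix")


matrix = coerce_sim({(200, 201): 0.83, (200, 202): 0.71})
```

The protocol check runs first, so an object that already answers similarity queries is never re-wrapped.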