# Recovery diagnostics
After fitting, we want to know how well the IO-HMM recovers the
DGP — both at the state level (does the inferred state path match
truth?) and at the parameter level (do the fitted intercepts and slopes
match the DGP's?). The
[`iohmm_evac.diagnostics`](../src/iohmm_evac/diagnostics) subpackage
provides the machinery; the CLI wraps it in
``iohmm-evac diagnose recovery`` and several
``iohmm-evac report …`` subcommands.
## Decoding
[`decoding.py`](../src/iohmm_evac/diagnostics/decoding.py) implements the
two standard MAP decoders:
- **Viterbi** — joint MAP path through the trellis, computed in log-space
  via the standard ``δ`` / ``ψ`` recursion. Returns an ``(N, T+1)`` int64 array.
- **Posterior mode** — per-step argmax of ``γ_{i,t}(k)``. Cheaper but
generally less coherent than Viterbi (it can switch states arbitrarily
between adjacent ``t``).
The fit bundle on disk stores the Viterbi path under
``posterior_states.parquet``.
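The module's actual decoders operate on per-unit ``(N, T+1)`` trellises with input-dependent transition matrices; the two recursions themselves can be sketched for a plain HMM with a fixed transition matrix. The function names and array shapes below are illustrative, not the module's API:

```python
import numpy as np

def viterbi_path(log_pi, log_A, log_emit):
    """Joint MAP state path for one sequence, in log-space.

    log_pi   : (K,)   log initial-state probabilities
    log_A    : (K, K) log transition matrix (row: from, col: to)
    log_emit : (T, K) log emission likelihoods per step
    """
    T, K = log_emit.shape
    delta = np.empty((T, K))                 # best log-prob ending in k at t
    psi = np.empty((T, K), dtype=np.int64)   # backpointers
    delta[0] = log_pi + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (from, to)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=np.int64)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):           # backtrace
        path[t] = psi[t + 1, path[t + 1]]
    return path

def posterior_mode_path(gamma):
    """Per-step argmax of the smoothed posteriors gamma, shape (T, K)."""
    return gamma.argmax(axis=1)
```

The contrast is visible in the code: Viterbi's backtrace couples adjacent steps through ``ψ``, while the posterior mode is a per-row argmax with no cross-step constraint.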
## State alignment (label switching)
State indices in a fitted IO-HMM are arbitrary up to permutation: nothing
in the likelihood pins state 0 to "UA" or state 3 to "ER". To compare
fit-paths against truth-paths, we have to align labels first.
[`alignment.py`](../src/iohmm_evac/diagnostics/alignment.py) does this
with the Hungarian algorithm
(`scipy.optimize.linear_sum_assignment`) on the K×K co-occurrence matrix
between true and fit labels. The objective is to maximize the matched
(diagonal) mass; since `linear_sum_assignment` minimizes cost, this is
implemented by passing the negated co-occurrence matrix.
`align_states` returns a length-K permutation ``perm`` such that
``perm[fit_label] == true_label``. The relabeled fit path is
``perm[fit_states]``.
`align_fit_to_truth(fit, perm)` applies the same permutation to a
fitted ``FitParameters``, so per-state quantities (``α``, ``β``, ``π``,
emission parameters) line up with the truth's state ordering.
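A minimal sketch of the alignment step, assuming integer label arrays and a known ``K`` (the signature mirrors `align_states` but is illustrative, not the module's exact API):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_states(true_states, fit_states, K):
    """Permutation perm with perm[fit_label] == matched true label.

    cooc[k, j] counts cells where truth says k and the fit says j;
    Hungarian maximizes matched mass by minimizing -cooc.
    """
    cooc = np.zeros((K, K), dtype=np.int64)
    np.add.at(cooc, (np.ravel(true_states), np.ravel(fit_states)), 1)
    true_idx, fit_idx = linear_sum_assignment(-cooc)
    perm = np.empty(K, dtype=np.int64)
    perm[fit_idx] = true_idx
    return perm
```

Applying ``perm[fit_states]`` then relabels the fit path into the truth's state ordering.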
## State recovery
[`recovery.py`](../src/iohmm_evac/diagnostics/recovery.py) defines two
state-level metrics:
- **`state_recovery_accuracy`** — share of (i, t) pairs where the aligned
fit label matches truth. A single number in ``[0, 1]``.
- **`state_recovery_confusion`** — row-normalized K×K confusion matrix.
``confusion[k, j]`` is the share of true-state-``k`` observations that
the fit (after alignment) labeled as ``j``. Diagonal mass close to 1
means good per-state recovery; off-diagonal mass shows where the fit
conflates states.
These are visualized by
[`plot_state_recovery_confusion`](../src/iohmm_evac/report/recovery_plots.py).
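Both metrics reduce to a few lines once the fit path has been aligned; a sketch under the assumption that states arrive as flat integer arrays (names mirror the module's functions but the signatures are illustrative):

```python
import numpy as np

def state_recovery_accuracy(true_states, aligned_fit_states):
    """Share of (i, t) cells where the aligned fit label matches truth."""
    return float(np.mean(true_states == aligned_fit_states))

def state_recovery_confusion(true_states, aligned_fit_states, K):
    """Row-normalized K x K confusion: rows index truth, cols the fit."""
    counts = np.zeros((K, K))
    np.add.at(counts, (np.ravel(true_states), np.ravel(aligned_fit_states)), 1)
    return counts / counts.sum(axis=1, keepdims=True)
```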
## Parameter recovery
`parameter_recovery(truth, fit_aligned)` returns a
``ParameterRecoveryReport`` with side-by-side arrays of true and fitted
values for each parameter group, plus per-group RMSEs:
- transition ``α`` (intercepts) — RMSE over allowed, non-self cells only;
forbidden cells are excluded since the model never learns them.
- transition ``β`` (slopes) — RMSE over the same mask, expanded over
feature dimensions.
- emission ``p`` (Bernoulli rates), ``μ`` (Gaussian means), ``σ`` (Gaussian
std devs), ``λ`` (Poisson rates) — straight RMSE across the K states.
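The masked-RMSE convention for the transition groups can be sketched as follows (a hypothetical helper, not the module's API; the emission RMSEs are the same computation with an all-True mask):

```python
import numpy as np

def masked_rmse(true_vals, fit_vals, mask):
    """RMSE over only the cells where mask is True — e.g. the allowed,
    non-self transition cells; forbidden cells are never learned and
    would otherwise pollute the error."""
    diff = np.asarray(true_vals)[mask] - np.asarray(fit_vals)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```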
The report is plotted by
[`plot_parameter_recovery`](../src/iohmm_evac/report/recovery_plots.py)
as a true-vs-estimated scatter with a 45° identity line; clusters above
the line mean the fit overestimates, clusters below mean it
underestimates.
## What to expect on the production DGP
Recovery on the production DGP will be **imperfect**. The IO-HMM does not
include the DGP's endogenous-feedback inputs (``π_t``, ``c_t``,
``tir_{i,t}``); see [`docs/inference.md`](inference.md) for why. Some
residual bias on the transition coefficients is therefore expected — the
fit is doing its best to absorb the omitted inputs into the intercepts
and the slopes of the features it does see.
That bias is part of the chapter narrative ("robustness to model
mis-specification"). The chapter author chooses how to frame it; the
codebase does not "fix" it by re-tuning parameters or by extending the
IO-HMM's input vector.
For sharp recovery numbers, run the same diagnostics on data generated
by the test-only clean DGP in
[`tests/_clean_dgp.py`](../tests/_clean_dgp.py). That clean DGP matches
the IO-HMM exactly (same state space, same input features, no
endogenous feedback) and is what the project's recovery test suite uses.
## CLI plumbing
```
iohmm-evac diagnose recovery --fit ./output/fit/ --truth ./output/baseline.parquet
```
Reads the fit bundle and the original simulation bundle, aligns the fit's
states to truth via the Hungarian algorithm, and writes ``recovery.toml``
containing:
- `state_recovery.accuracy`
- `state_recovery.confusion` (row-normalized, ``K × K``)
- `state_recovery.permutation` (the recovered fit→true label map)
- `transition_rmse.{alpha, beta}`
- `emission_rmse.{p_departure, mu_displacement, sigma_displacement, lambda_comm}`
The same numbers are visible as plots via:
```
iohmm-evac report recovery-confusion --fit ... --truth ... --output ...
iohmm-evac report parameter-recovery --fit ... --truth ... --output ...
iohmm-evac report ll-trace --fit ... --output ...
iohmm-evac report fit-summary --fit ...
```
## Future work
- **Identifiability checks.** A separate diagnostic that flags when two
states are statistically indistinguishable from the data would be
useful for chapter §3.4. Out of scope for Build 2.
- **Coverage / interval recovery.** With Build 3's bootstrap we can ask
whether 95% intervals cover the truth at the right rate.