# Development
## Setup
```bash
# install uv if you don't have it (see https://docs.astral.sh/uv/)
uv sync
```
`uv sync` reads `pyproject.toml`, creates a `.venv/`, and installs both
runtime and dev dependencies (the latter declared in the `dev` group under
`[dependency-groups]`).
## Tasks
```bash
uv run ruff check src tests # lint
uv run ruff format --check src tests # format check
uv run ruff format src tests # reformat in place
uv run mypy src tests # strict type-check
uv run pytest # tests
uv run pytest --cov=src/iohmm_evac --cov-report=term-missing # with coverage
```
The acceptance gate is:
```bash
uv run ruff check src tests && uv run ruff format --check src tests && \
uv run mypy src tests && uv run pytest --cov=src/iohmm_evac --cov-fail-under=85
```
## Project layout
```
src/iohmm_evac/
├── __init__.py
├── cli.py # argparse-driven entry point
├── io.py # parquet + TOML sidecar writers
├── params.py # frozen dataclasses for every parameter
├── scenarios.py # named scenarios → SimulationConfig
├── types.py # FloatArray / IntArray / BoolArray, State, Zone, EvacPath
├── dgp/ # the simulator
│ ├── population.py
│ ├── timeline.py
│ ├── feedback.py
│ ├── emissions.py
│ ├── transitions.py
│ └── simulator.py # the per-step loop
├── inference/ # IO-HMM fitting (Build 2)
│ ├── data.py # SimulationBundle → IO-HMM input arrays
│ ├── log_space.py # logsumexp / log_softmax helpers
│ ├── fit_params.py # FitParameters and DGP→fit projection
│ ├── forward_backward.py
│ ├── m_step.py # closed-form + L-BFGS-B
│ ├── em.py # EM loop with monotonicity tracking
│ ├── initialization.py # random / kmeans / truth init strategies
│ ├── fit.py # multi-restart driver
│ ├── io.py # fit-bundle Parquet/TOML serialization
│ └── cli.py # `iohmm-evac fit` subcommand
├── diagnostics/ # recovery metrics (Build 2)
│ ├── decoding.py # Viterbi + posterior mode
│ ├── alignment.py # Hungarian state alignment
│ ├── recovery.py # state and parameter recovery
│ └── cli.py # `iohmm-evac diagnose recovery`
├── report/ # diagnostic plots (Builds 1.5–1.7)
│ ├── plots.py
│ ├── recovery_plots.py # recovery scatters / confusion / LL trace (Build 2)
│ ├── recovery_cli.py # report subcommands for fit-side diagnostics
│ ├── summary.py
│ ├── loader.py
│ ├── constants.py
│ └── cli.py
└── bootstrap/ # parametric bootstrap + shift sweep (Build 4)
├── resample.py # household-level resampling with replacement
├── runner.py # parallel B-fit driver (joblib + loky)
├── shift_sweep.py # apply each θ̂ to each warning-shift δ
└── aggregate.py # collect per-(b, δ) metrics into the band grid
```
Bootstrap source files stay at or under 300 lines each, per Build 4's
acceptance gate.
## Adding a new scenario
1. Add a builder function to `src/iohmm_evac/scenarios.py`. Pattern: start
from `SimulationConfig()`, then `dataclasses.replace` the bundle(s) you
want to change.
2. Register it in the `SCENARIO_BUILDERS` dict.
3. Add a row to [`scenarios.md`](scenarios.md) describing what changes and
why.
4. Add a test in `tests/test_scenarios.py` that asserts the post-condition
you care about (e.g., a parameter landed where you expected, or the
resulting simulation is shifted in the expected direction).
The CLI will pick up the new name automatically — the `--scenario` choices
are derived from `list_scenarios()`.
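A minimal sketch of the builder pattern in steps 1–2, using stand-in
dataclasses with made-up field names (the real frozen bundles live in
`params.py`):

```python
import dataclasses

# Stand-ins for the real frozen parameter bundles in params.py;
# the actual class and field names will differ.
@dataclasses.dataclass(frozen=True)
class WarningParams:
    onset: int = 24

@dataclasses.dataclass(frozen=True)
class SimulationConfig:
    warning: WarningParams = WarningParams()

def build_early_warning() -> SimulationConfig:
    """Start from the defaults, then replace only the bundle you change."""
    base = SimulationConfig()
    warning = dataclasses.replace(base.warning, onset=base.warning.onset - 12)
    return dataclasses.replace(base, warning=warning)

# Step 2: register the builder so the CLI can discover the name.
SCENARIO_BUILDERS = {"early_warning": build_early_warning}
```

Because the bundles are frozen, `dataclasses.replace` returns a new config
and the defaults are never mutated in place.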
## Reproducibility hygiene
* Never call `np.random.<x>` directly. All randomness flows through a
`numpy.random.Generator` derived from the user-supplied seed.
* When sampling inside helper functions, accept `rng: Generator` as a
parameter rather than constructing one internally. This keeps the
randomness chain auditable.
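A sketch of the convention, with a hypothetical helper name:

```python
import numpy as np
from numpy.random import Generator

def sample_household_sizes(n: int, rng: Generator) -> np.ndarray:
    """Hypothetical helper: accepts the Generator as a parameter,
    never constructs one internally."""
    return rng.integers(1, 7, size=n)

def run(seed: int) -> np.ndarray:
    # One Generator per run, derived from the user-supplied seed;
    # every downstream draw flows through it.
    rng = np.random.default_rng(seed)
    return sample_household_sizes(10, rng)

# Same seed, same draws: the randomness chain is auditable end to end.
assert np.array_equal(run(123), run(123))
```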
## Performance
The full DGP at `N=10_000, T=120` runs in well under a second on a typical
laptop. If you make a change and notice the runtime jump, the most likely
culprits (in decreasing order of probability) are:
1. A Python-level loop replaced what used to be a NumPy reduction. Profile
with `uv run python -m cProfile -s cumulative -m iohmm_evac.cli simulate ...`.
2. Allocating a new `(N, T+1)` array per step instead of reusing one row.
3. Using `np.append` / `np.concatenate` inside the per-step loop.
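To illustrate items 2 and 3, a toy contrast between preallocating the
history array once and growing it per step (the transition rule here is
illustrative, not the real per-step loop):

```python
import numpy as np

N, T = 10_000, 120
rng = np.random.default_rng(0)

# Fast: allocate the (N, T+1) state history once, write one column per step.
states = np.zeros((N, T + 1), dtype=np.int64)
for t in range(T):
    # Toy transition rule, stand-in for the real per-step update.
    states[:, t + 1] = (states[:, t] + rng.integers(0, 2, size=N)) % 4

# Slow anti-pattern: growing the history with np.append / np.concatenate
# inside the loop reallocates and copies the entire array every step --
# O(T) copies of O(N*T) data instead of one allocation up front.
```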
## Joblib parallelism and BLAS thread pinning
The bootstrap runner uses joblib's **loky** backend
(`Parallel(n_jobs=..., backend="loky")`). Loky spawns subprocesses (rather
than threads or forks), which lets each worker run an independent EM fit
without GIL contention.
To stop NumPy/SciPy's BLAS from over-subscribing the box when N workers
are already running in parallel, `iohmm_evac.bootstrap.runner` calls
`os.environ.setdefault(...)` for `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`,
`MKL_NUM_THREADS`, `BLIS_NUM_THREADS`, `VECLIB_MAXIMUM_THREADS`, and
`NUMEXPR_NUM_THREADS` *at module import time, before NumPy is imported*.
Loky workers re-import the module fresh, so the worker-side BLAS sees
single-thread settings.
If you need to override (e.g. on a 64-core box you want 8 outer × 8 inner
threads), set the env vars explicitly before launching `iohmm-evac`:
```bash
OMP_NUM_THREADS=8 uv run iohmm-evac bootstrap fit --jobs 8 ...
```
`setdefault` will respect the explicit value and skip the override.
## The test-only clean DGP
[`tests/_clean_dgp.py`](../tests/_clean_dgp.py) is a deliberately simpler
DGP used by the inference test suite. It matches the IO-HMM exactly: no
endogenous feedback (`pi`, `c`, `tir`), no `evac_path` bookkeeping — just
state-conditional emissions over a fixed exogenous input sequence.
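A toy version of what "matches the IO-HMM exactly" means, with made-up
parameter shapes (see `tests/_clean_dgp.py` for the real one): transitions
depend only on the exogenous input, emissions only on the current state.

```python
import numpy as np

def simulate_clean(pi0, trans, means, sigma, inputs, rng):
    """Toy clean DGP (not the real _clean_dgp.py): trans[u] is the
    K x K transition matrix under exogenous input u; emissions are
    Gaussian given the current state; no feedback anywhere."""
    T, K = len(inputs), len(pi0)
    states = np.empty(T, dtype=np.int64)
    states[0] = rng.choice(K, p=pi0)
    for t in range(1, T):
        states[t] = rng.choice(K, p=trans[inputs[t]][states[t - 1]])
    obs = rng.normal(means[states], sigma)  # state-conditional emissions
    return states, obs

rng = np.random.default_rng(0)
trans = np.stack([np.full((2, 2), 0.5),
                  np.array([[0.9, 0.1], [0.1, 0.9]])])
states, obs = simulate_clean(np.array([0.5, 0.5]), trans,
                             np.array([0.0, 5.0]), 1.0,
                             np.array([0, 1] * 10), rng)
```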
It exists *separately from* the production DGP in `iohmm_evac.dgp` because
the production DGP is intentionally mis-specified relative to the IO-HMM
(the chapter's narrative point about real-world model fit). Recovery tests
need a setting where the IO-HMM is the *correct* model — that's the role
of the clean DGP.
The clean DGP is test-only and is not part of the public package: the
filename `_clean_dgp.py` falls outside pytest's default `test_*.py`
collection pattern, and the leading underscore marks it as a private
helper.