| Formatter | Extension | Description |
|---|---|---|
| nbstripout | .ipynb | ensures Jupyter notebook output cells aren't committed |
| nbqa | .ipynb | Jupyter notebook linter and formatter |
| ruff | .py | general purpose python linter and formatter |
| ty | .py | static type checker |
| biome | .js, .css, .html | formats client-side code |
Ruff is excellent, but it is too opinionated for Jupyter notebooks. nbqa has relaxed alignment rules that cannot be configured in Ruff, specifically aligning code blocks on equal signs.
Warning
This section is a work in progress and may be out of date.
config.py owns config loading and shared path resolution. activity.py and samples.py are CLIs. ml.py is a library used by api.py. notebooks/classify.ipynb reads labels and evaluates OCR/CLIP workflows.
flowchart TB
CFG["config.py<br/>load + validate config<br/>shared path resolution"]
A["activity.py<br/>timestamp extraction<br/>plot orchestration"]
P["plots.py<br/>histograms + curves<br/>heatmaps"]
S["samples.py<br/>generate samples"]
N["notebooks/classify.ipynb<br/>OCR + CLIP analysis"]
API["api.py<br/>manual labeling routes"]
L["labels.jsonl<br/>manual labels"]
ML["ml.py<br/>embeddings + clustering<br/>state writes"]
b1[" "]:::ghost
b2[" "]:::ghost
c1[" "]:::ghost
CFG --> A
A --> P
CFG --> S
CFG --> c1 --> N
CFG --> b1 --> b2 --> API
S --> API
API --> L
S --> N
N --> API
API --> ML
classDef ghost fill:transparent,stroke:transparent,color:transparent;
- Typed dataclasses for structured data. `dict[str, Any]` only for genuinely open-ended config blobs.
- String-keyed dispatch tables for routing by type or kind.
- File-backed data (JSONL, labels) is cached by fingerprint or mtime. Avoid adding per-request reads.
- Progress/status writes in loops are throttled by count and time. Avoid writing on every iteration.
- Config validation is strict and happens at load time. Unknown references raise immediately.
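
The caching and throttling conventions above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `load_jsonl_cached`, `ThrottledWriter`, and their parameters are hypothetical names, and the fingerprint here is assumed to be the file's modification time plus its size.

```python
import json
import os
import time
from pathlib import Path

# Hypothetical module-level cache: path -> (fingerprint, parsed rows).
_CACHE: dict[Path, tuple[tuple[int, int], list[dict]]] = {}


def load_jsonl_cached(path: Path) -> list[dict]:
    """Parse a JSONL file, re-reading only when its fingerprint changes."""
    stat = os.stat(path)
    # Assumed fingerprint: (mtime in ns, size in bytes).
    fingerprint = (stat.st_mtime_ns, stat.st_size)
    cached = _CACHE.get(path)
    if cached is not None and cached[0] == fingerprint:
        return cached[1]  # cache hit: no file read, no parsing
    with path.open(encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    _CACHE[path] = (fingerprint, rows)
    return rows


class ThrottledWriter:
    """Write status after `every_n` updates or `every_s` seconds, whichever comes first."""

    def __init__(self, path: Path, every_n: int = 100, every_s: float = 2.0):
        self.path = path
        self.every_n = every_n
        self.every_s = every_s
        self._pending = 0
        self._last_write = time.monotonic()
        self.writes = 0  # number of writes actually performed

    def update(self, status: dict, force: bool = False) -> bool:
        """Record one loop iteration; return True if the status file was written."""
        self._pending += 1
        now = time.monotonic()
        due = (
            self._pending >= self.every_n
            or (now - self._last_write) >= self.every_s
        )
        if not (force or due):
            return False
        self.path.write_text(json.dumps(status), encoding="utf-8")
        self._pending = 0
        self._last_write = now
        self.writes += 1
        return True
```

A request handler would call `load_jsonl_cached` on every request and pay for a parse only after the file changes. A long loop would call `update` on every iteration and pass `force=True` once at the end so the final state is always flushed.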
- pydantic is the config contract. All YAML input is validated through `ConfigModel` at load time. Do not re-validate or re-parse config downstream.
- pandas is the data layer for anything involving image records, timestamps, or aggregations in the notebook, and sometimes in the app code. Manipulate JSON structures through and around the Flask API code only.
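
The "validate once at load time, then trust the result" rule, together with typed dataclasses and string-keyed dispatch, might look roughly like the sketch below. To stay dependency-free it uses plain dataclasses and a manual check instead of pydantic, so it is not the project's `ConfigModel`. `SourceSpec`, `LOADERS`, `parse_sources`, and the `"folder"`/`"archive"` kinds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical typed record for one validated config entry."""

    name: str
    kind: str
    # Open-ended blob: the one place dict[str, Any] is appropriate.
    options: dict[str, Any]


def _load_folder(spec: SourceSpec) -> str:
    return f"folder:{spec.name}"


def _load_archive(spec: SourceSpec) -> str:
    return f"archive:{spec.name}"


# String-keyed dispatch table: route by kind without if/elif chains.
LOADERS: dict[str, Callable[[SourceSpec], str]] = {
    "folder": _load_folder,
    "archive": _load_archive,
}


def parse_sources(raw: list[dict[str, Any]]) -> list[SourceSpec]:
    """Validate once at load time; unknown kinds raise immediately."""
    specs = []
    for entry in raw:
        kind = entry["kind"]
        if kind not in LOADERS:
            raise ValueError(
                f"unknown source kind {kind!r}; expected one of {sorted(LOADERS)}"
            )
        specs.append(
            SourceSpec(
                name=entry["name"],
                kind=kind,
                options=dict(entry.get("options", {})),
            )
        )
    return specs


def run(spec: SourceSpec) -> str:
    # Downstream code trusts the validated spec and only dispatches.
    return LOADERS[spec.kind](spec)
```

Because `parse_sources` rejects unknown kinds up front, `run` can index the table directly without re-checking. Adding a new kind means adding one function and one table entry.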
uv sync --all-extras --all-groups