contributing.md

Contributing

Linters / Formatters

Tool	Extension	Description
nbstripout	`.ipynb`	strips Jupyter notebook output cells so they aren't committed
nbqa	`.ipynb`	Jupyter notebook linter and formatter
ruff	`.py`	general-purpose Python linter and formatter
ty	`.py`	static type checker
biome	`.js`, `.css`, `.html`	formats client-side code

ruff is excellent, but it's too opinionated for Jupyter notebooks. nbqa has relaxed alignment rules that cannot be configured in ruff; specifically, it allows aligning code blocks on equal signs.
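As an illustration of the alignment style in question, here is a small snippet with assignments aligned on the equal sign. The variable names are made up for the example; the point is only the layout, which ruff's formatter would collapse to a single space on each side of `=`.

```python
# Assignments aligned on "=", a layout some notebook authors prefer for
# readability. The names and values here are purely illustrative.
width      = 640
height     = 480
n_channels = 3

# The alignment is cosmetic only; the values behave as usual.
n_values = width * height * n_channels
```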

Architecture

Warning

This section is a work in progress and may be out of date.

config.py owns config loading and shared path resolution. activity.py and samples.py are CLIs. ml.py is a library used by api.py. notebooks/classify.ipynb reads labels and evaluates OCR/CLIP workflows.

flowchart TB
    CFG["config.py<br/>load + validate config<br/>shared path resolution"]
    A["activity.py<br/>timestamp extraction<br/>plot orchestration"]
    P["plots.py<br/>histograms + curves<br/>heatmaps"]
    S["samples.py<br/>generate samples"]
    N["notebooks/classify.ipynb<br/>OCR + CLIP analysis"]
    API["api.py<br/>manual labeling routes"]
    L["labels.jsonl<br/>manual labels"]
    ML["ml.py<br/>embeddings + clustering<br/>state writes"]

    b1[" "]:::ghost
    b2[" "]:::ghost
    c1[" "]:::ghost

    CFG --> A
    A --> P
    CFG --> S
    CFG --> c1 --> N
    CFG --> b1 --> b2 --> API
    S --> API
    API --> L
    S --> N
    N --> API
    API --> ML

    classDef ghost fill:transparent,stroke:transparent,color:transparent;

Code conventions

Typed dataclasses for structured data. dict[str, Any] only for genuinely open-ended config blobs.
String-keyed dispatch tables for routing by type or kind.
File-backed data (JSONL, labels) is cached by fingerprint or mtime. Avoid adding per-request reads.
Progress/status writes in loops are throttled by count and time. Avoid writing on every iteration.
Config validation is strict and happens at load time. Unknown references raise immediately.
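The conventions above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `LabelRecord`, `HANDLERS`, `dispatch`, `load_labels`, and `ProgressWriter` are hypothetical names, the record fields are invented, and the throttle thresholds are arbitrary.

```python
import json
import os
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LabelRecord:
    """Typed record instead of a loose dict (hypothetical fields)."""
    image: str
    label: str


# String-keyed dispatch table: route by kind without an if/elif chain.
HANDLERS: dict[str, Callable[[str], str]] = {
    "ocr": lambda image: f"ocr:{image}",
    "clip": lambda image: f"clip:{image}",
}


def dispatch(kind: str, image: str) -> str:
    try:
        handler = HANDLERS[kind]
    except KeyError:
        # Unknown kinds fail loudly rather than being silently ignored.
        raise ValueError(f"unknown kind: {kind!r}") from None
    return handler(image)


# Cache keyed by path; the file is re-read only when its fingerprint
# (mtime + size) changes, so repeated calls avoid per-request reads.
_cache: dict[str, tuple[tuple[int, int], list[LabelRecord]]] = {}


def load_labels(path: str) -> list[LabelRecord]:
    stat = os.stat(path)
    fingerprint = (stat.st_mtime_ns, stat.st_size)
    cached = _cache.get(path)
    if cached is not None and cached[0] == fingerprint:
        return cached[1]
    with open(path, encoding="utf-8") as fh:
        records = [LabelRecord(**json.loads(line)) for line in fh if line.strip()]
    _cache[path] = (fingerprint, records)
    return records


class ProgressWriter:
    """Calls `write` at most once per `every_n` items or `every_s` seconds."""

    def __init__(self, write: Callable[[int], None],
                 every_n: int = 100, every_s: float = 1.0) -> None:
        self._write = write
        self._every_n = every_n
        self._every_s = every_s
        self._last_count = 0
        self._last_time = time.monotonic()

    def update(self, count: int) -> None:
        now = time.monotonic()
        enough_items = count - self._last_count >= self._every_n
        enough_time = now - self._last_time >= self._every_s
        if enough_items or enough_time:
            self._write(count)
            self._last_count = count
            self._last_time = now
```

Calling `update` on every loop iteration is cheap here, because the actual write only happens when one of the two thresholds has been crossed.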

Tools

pydantic is the config contract. All YAML input is validated through ConfigModel at load time. Do not re-validate or re-parse config downstream.
pandas is the data layer for anything involving image records, timestamps, or aggregations in the notebook, and sometimes in the app code. Manipulate raw JSON structures only in and around the Flask API code.
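A minimal sketch of what "validated through ConfigModel at load time" could look like with pydantic v2. Only the name `ConfigModel` comes from this document; `SourceModel`, the field names, and the `default_source` reference check are assumptions made for illustration, and a plain dict stands in for the parsed YAML.

```python
from pydantic import BaseModel, ConfigDict, ValidationError, model_validator


class SourceModel(BaseModel):
    # Hypothetical sub-model; extra="forbid" rejects unknown keys.
    model_config = ConfigDict(extra="forbid")
    name: str
    path: str


class ConfigModel(BaseModel):
    # Field names here are illustrative, not the project's real schema.
    model_config = ConfigDict(extra="forbid")
    sources: list[SourceModel]
    default_source: str

    @model_validator(mode="after")
    def check_references(self) -> "ConfigModel":
        # Cross-field check: a reference to an unknown source fails at load.
        known = {source.name for source in self.sources}
        if self.default_source not in known:
            raise ValueError(f"unknown source reference: {self.default_source!r}")
        return self


def load_config(raw: dict) -> ConfigModel:
    """Validate once at load time; callers receive a typed object."""
    return ConfigModel.model_validate(raw)


# Stand-in for the result of parsing a YAML file.
raw_config = {
    "sources": [{"name": "cam1", "path": "data/cam1"}],
    "default_source": "cam1",
}
config = load_config(raw_config)
```

Downstream code would then take `config` as a `ConfigModel` and read attributes such as `config.default_source` directly, rather than re-parsing or re-checking the raw mapping.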

Install all dependencies

uv sync --all-extras --all-groups
