Contributing

Linters / Formatters

| Formatter | Extension | Description |
| --- | --- | --- |
| nbstripout | .ipynb | ensures Jupyter notebook output cells aren't committed |
| nbqa | .ipynb | Jupyter notebook linter and formatter |
| ruff | .py | general purpose python linter and formatter |
| ty | .py | static type checker |
| biome | .js, .css, .html | formats client-side code |

ruff is excellent, but it's too opinionated for Jupyter notebooks. nbqa allows relaxed alignment rules that cannot be configured in ruff, specifically aligning code blocks on equal signs.
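As an illustration of the alignment style mentioned above, a notebook cell might look like the sketch below. The variable names and values are hypothetical and not taken from the project; the point is only the extra spaces before each `=`, which a whitespace-normalizing formatter would collapse to a single space.

```python
# Hypothetical notebook cell: assignments aligned on the equal sign.
# The names and values are illustrative only.
threshold  = 0.35
n_clusters = 12
model_name = "example-model"
```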

Architecture

Warning

Work-in-progress and may be out-of-date.

config.py owns config loading and shared path resolution. activity.py and samples.py are CLIs. ml.py is a library used by api.py. notebooks/classify.ipynb reads labels and evaluates OCR/CLIP workflows.

flowchart TB
    CFG["config.py<br/>load + validate config<br/>shared path resolution"]
    A["activity.py<br/>timestamp extraction<br/>plot orchestration"]
    P["plots.py<br/>histograms + curves<br/>heatmaps"]
    S["samples.py<br/>generate samples"]
    N["notebooks/classify.ipynb<br/>OCR + CLIP analysis"]
    API["api.py<br/>manual labeling routes"]
    L["labels.jsonl<br/>manual labels"]
    ML["ml.py<br/>embeddings + clustering<br/>state writes"]

    b1[" "]:::ghost
    b2[" "]:::ghost
    c1[" "]:::ghost

    CFG --> A
    A --> P
    CFG --> S
    CFG --> c1 --> N
    CFG --> b1 --> b2 --> API
    S --> API
    API --> L
    S --> N
    N --> API
    API --> ML

    classDef ghost fill:transparent,stroke:transparent,color:transparent;

Code conventions

  • Typed dataclasses for structured data. dict[str, Any] only for genuinely open-ended config blobs.
  • String-keyed dispatch tables for routing by type or kind.
  • File-backed data (JSONL, labels) is cached by fingerprint or mtime. Avoid adding per-request reads.
  • Progress/status writes in loops are throttled by count and time. Avoid writing on every iteration.
  • Config validation is strict and happens at load time. Unknown references raise immediately.
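The conventions above could be sketched roughly as follows. This is a minimal illustration, not the project's code: `Label`, `HANDLERS`, `JsonlCache`, and `ThrottledWriter` are hypothetical names, the label fields are invented, and "throttled by count and time" is assumed here to mean "write when either limit is reached".

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class Label:
    """Typed record for one manual label (field names are illustrative)."""
    image: str
    kind: str


# String-keyed dispatch table: route by `kind` instead of an if/elif chain.
HANDLERS: dict[str, Callable[[Label], str]] = {
    "ocr": lambda label: f"ocr:{label.image}",
    "clip": lambda label: f"clip:{label.image}",
}


def describe(label: Label) -> str:
    try:
        handler = HANDLERS[label.kind]
    except KeyError:
        raise ValueError(f"unknown kind: {label.kind!r}") from None
    return handler(label)


class JsonlCache:
    """Re-read a JSONL file only when its (mtime, size) fingerprint changes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._fingerprint: tuple[int, int] | None = None
        self._labels: list[Label] = []
        self.reads = 0  # number of real disk reads, kept for demonstration

    def load(self) -> list[Label]:
        stat = self.path.stat()
        fingerprint = (stat.st_mtime_ns, stat.st_size)
        if fingerprint != self._fingerprint:
            with self.path.open(encoding="utf-8") as fh:
                self._labels = [Label(**json.loads(line)) for line in fh if line.strip()]
            self._fingerprint = fingerprint
            self.reads += 1
        return self._labels


class ThrottledWriter:
    """Call `write` only after `every_n` updates or `every_s` seconds."""

    def __init__(self, write: Callable[[object], None], every_n: int = 100, every_s: float = 1.0) -> None:
        self.write = write
        self.every_n = every_n
        self.every_s = every_s
        self._count = 0
        self._last = time.monotonic()

    def update(self, state: object) -> None:
        self._count += 1
        now = time.monotonic()
        if self._count >= self.every_n or now - self._last >= self.every_s:
            self.flush(state)

    def flush(self, state: object) -> None:
        # Unconditional write; call once after the loop so the final state lands.
        self.write(state)
        self._count = 0
        self._last = time.monotonic()
```

In a loop, `update` would be called on every iteration and `flush` once at the end, so the number of writes stays bounded regardless of how many items are processed.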

Tools

  • pydantic is the config contract. All YAML input is validated through ConfigModel at load time. Do not re-validate or re-parse config downstream.

  • pandas is the data layer for anything involving image records, timestamps, or aggregations in the notebook, and occasionally in the app code. Manipulate raw JSON structures only in and around the Flask API code.
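The pydantic bullet above, combined with the strict load-time validation convention, might look something like this sketch. It assumes pydantic v2; `ConfigModel` is the name used in this document, but its fields (`sources`, `default_source`), the `SourceModel` class, and `load_config` are hypothetical stand-ins, and a plain dict stands in for the parsed YAML.

```python
from pydantic import BaseModel, ConfigDict, ValidationError, model_validator


class SourceModel(BaseModel):
    # Hypothetical config section; the real fields are not shown in this document.
    model_config = ConfigDict(extra="forbid")
    name: str
    path: str


class ConfigModel(BaseModel):
    # extra="forbid" rejects unknown keys instead of silently ignoring them.
    model_config = ConfigDict(extra="forbid")
    sources: list[SourceModel]
    default_source: str

    @model_validator(mode="after")
    def _check_references(self) -> "ConfigModel":
        # Unknown references raise immediately, at load time.
        known = {source.name for source in self.sources}
        if self.default_source not in known:
            raise ValueError(f"default_source {self.default_source!r} is not a defined source")
        return self


def load_config(raw: dict) -> ConfigModel:
    """Validate once; downstream code receives a typed object and never re-parses."""
    return ConfigModel.model_validate(raw)
```

Downstream code would then accept a `ConfigModel` instance as an argument and read attributes from it directly, rather than re-validating the raw dict.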

Install all dependencies

uv sync --all-extras --all-groups