soma-evals

Schema-ablation evals for SOMA, measuring how progressively richer LinkML schema context improves LLM-based structured extraction from scientific literature.

Prerequisites

  • uv (Python package manager)
  • just (task runner)
  • Python 3.12+

Setup

git clone https://github.com/EHS-Data-Standards/soma-evals.git
cd soma-evals
just setup

API keys

Set keys via the llm key store or environment variables. Use whichever method you prefer — you only need keys for the providers whose models you plan to run.

Option A — key store (recommended):

uv run llm keys set openai       # paste your OpenAI key
uv run llm keys set anthropic    # paste your Anthropic key
uv run llm keys set gemini       # paste your Gemini key
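
To check which keys are already stored, the llm CLI provides a keys list subcommand:

uv run llm keys list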

Option B — .env file:

cp .env.example .env

Then edit .env:

OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GEMINI_API_KEY=AIza-your-key-here

CBORG users (LBNL staff): Models prefixed with cborg/ route through the CBORG proxy and are free for lab staff. Authentication is handled by CBORG — no extra API key is needed beyond your CBORG access.
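
As a quick smoke test of the proxy route (a sketch, assuming the project registers CBORG models with the llm CLI; the model ID below is hypothetical, so substitute one from just list-models):

uv run llm -m cborg/openai/gpt-4o "Reply with OK"

If the model replies, routing through the CBORG proxy is working.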

Running evals

just list-models        # show available models & tiers
just run-all            # run all four ablation levels (standard tier)
just run-baseline       # run a single level

Run a specific tier or override the default paper:

just run-all cheap
EVAL_PDF=my-paper.pdf EVAL_SLUG=my-slug just run-all
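
The tier argument and the env-var overrides compose (a sketch, assuming the justfile passes both through), so you can run every level on the cheap tier against your own paper in one command:

EVAL_PDF=my-paper.pdf EVAL_SLUG=my-slug just run-all cheap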

Ablation levels

Level          Schema context provided
baseline       None — LLM relies on training knowledge only
class_names    Class names, descriptions, and mappings
full_classes   + slot definitions with ranges & cardinality
with_enums     + enumeration values and ontology meanings
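
To make the levels concrete, here is a minimal LinkML fragment; the class, slot, and enum names and the ontology IDs are invented for illustration, not SOMA's actual schema. baseline sends none of it; class_names sends the class name, description, and mappings; full_classes adds the slot definitions with their ranges and cardinality; with_enums adds the enum values and their ontology meanings:

classes:
  ChemicalExposure:
    description: A single measured exposure to a chemical agent
    exact_mappings:
      - ExO:0000002            # hypothetical ontology mapping
    slots:
      - agent
      - route

slots:
  agent:
    description: The chemical agent involved
    range: string
  route:
    description: How the exposure occurred
    range: ExposureRouteEnum
    required: true             # cardinality: exactly one route per exposure

enums:
  ExposureRouteEnum:
    permissible_values:
      inhalation:
        meaning: NCIT:C38300   # hypothetical ontology meaning
      dermal:
        meaning: NCIT:C38305   # hypothetical ontology meaning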

Results are written to results/<level>/<model>/<paper>.yaml.
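
For example, after just run-all with two models (the model names and paper slug below are illustrative), the results tree looks like:

results/
├── baseline/
│   ├── claude-sonnet-4/
│   │   └── my-slug.yaml
│   └── gpt-4o/
│       └── my-slug.yaml
├── class_names/
├── full_classes/
└── with_enums/

The other three levels follow the same model/paper layout as baseline.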

Tests & QC

just test       # run tests (no API calls)
just coverage   # tests with coverage report
just fix        # auto-fix lint/format (ruff)

License

MIT
