PXAudit

Audit ProteomeXchange (PRIDE) study metadata from the command line.

PXAudit fetches a PRIDE dataset's project metadata and file list, classifies every file with a deterministic FileTypeClassifier, then assigns two orthogonal quality scores:

Tier — a 7-level FAIR ladder from None through Diamond, based on metadata completeness and open-data practices.
Quant Tier — a secondary axis for quantification readiness, from No Quant through Quant-Complete.

Results are written to a local SQLite database so you can query, compare, and track scores over time.

Installation

Requires Python ≥ 3.12. uv is the recommended runner.

# Clone and install in editable mode
git clone https://github.com/LangeLab/PXAudit.git
cd PXAudit
uv sync

The pxaudit console script is installed automatically. Verify with:

uv run pxaudit --help

Quick Start

uv run pxaudit check PXD000001

On first run, PXAudit fetches project metadata and file lists from the PRIDE REST API and caches both responses under ~/.pxaudit_cache/. Subsequent runs for the same accession read from the cache and skip the network entirely, unless --no-cache is given. Audit results are written to pxaudit_results.db in the current directory (override with --db).
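The cache-first behaviour described above can be sketched roughly as follows. This is an illustration only, not the contents of PXAudit's cache.py: the helper name `load_or_fetch`, the one-JSON-file-per-key layout, and the `fetch` callback are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_fetch(
    cache_dir: Path,
    key: str,
    fetch: Callable[[], Any],
    use_cache: bool = True,
) -> Any:
    """Return the cached JSON payload for `key`, or call `fetch` and store it.

    Hypothetical helper: the file naming scheme is an assumption.
    """
    path = cache_dir / f"{key}.json"
    if use_cache and path.exists():
        # Cache hit: the fetch callback (i.e. the network) is never touched.
        return json.loads(path.read_text(encoding="utf-8"))
    payload = fetch()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Always refresh the cached copy, even when cache reads were skipped.
    path.write_text(json.dumps(payload), encoding="utf-8")
    return payload


# Demo with a stand-in for the real API call.
calls = []


def fake_fetch() -> dict:
    calls.append(1)
    return {"accession": "PXD000001"}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "pxaudit_cache"
    first = load_or_fetch(cache, "PXD000001_project", fake_fetch)   # miss: fetches
    second = load_or_fetch(cache, "PXD000001_project", fake_fetch)  # hit: no fetch
    third = load_or_fetch(cache, "PXD000001_project", fake_fetch, use_cache=False)
```

In this sketch, `use_cache=False` mirrors the documented meaning of --no-cache: cache reads are skipped, but the fresh response is still written back.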

Usage

`pxaudit check`

Usage: pxaudit check [OPTIONS] ACCESSION

  Audit a single Proteomics Exchange accession.

Options:
  --no-cache        Skip cache reads and always fetch fresh data from PRIDE.
  --db PATH         SQLite output path.  [default: pxaudit_results.db]
  --help            Show this message and exit.

Examples

# Audit a single dataset
uv run pxaudit check PXD004683

# Force a fresh fetch (bypasses local cache)
uv run pxaudit check PXD004683 --no-cache

# Write results to a specific database file
uv run pxaudit check PXD004683 --db ~/audits/lab.db

Non-PRIDE accessions (e.g. MSV, JPST, IPX) are accepted without error and assigned the Unverifiable tier — PXAudit only has access to the PRIDE REST API.

Example Output

Accession : PXD000001
Tier      : Silver
Quant Tier: Partial
------------------------------------------------
Metadata
  ✔ Title         TMT proteomics of human cell lines
  ✔ Organism      Homo sapiens (9606)
  ✔ Instrument    LTQ Orbitrap Velos
  ✘ Organism part annotated
  ✔ Publication   linked
  ✘ Quant metadata (CV methods)
------------------------------------------------
Files (142 total)
  ✔ Result/Search files present
  ✔ PSI-standard results (mzIdentML / mzTab-ID)
  ✔ Open spectra (mzML / MGF)
  ✘ SDRF file present
  ✔ mzTab summary present
  ✘ Tabular quant table (proteinGroups / evidence)
------------------------------------------------

Tier System

PXAudit scores each dataset on a 7-tier FAIR ladder. Each tier adds one or more FAIR requirements to the previous one, and a dataset is assigned the highest tier whose criteria, together with those of every tier below it, are all satisfied.

Tier	Requirements
None	Missing a mandatory metadata field (title, organism, or instrument).
Raw	Mandatory metadata present; no processed result files found.
Bronze	Result/search files present, but none are PSI-standard (mzIdentML / mzTab).
Silver	PSI-standard results present; no SDRF experimental-design file.
Gold	SDRF present; open spectra (mzML / MGF) or organism-part annotation missing.
Platinum	Open spectra + organism-part annotation present; no linked PubMed publication.
Diamond	All FAIR criteria met: PSI results, SDRF, open spectra, organism part, and a publication.

Tier logic is version-stamped (tier_logic_version = "v2.0") and stored in the database so that re-scoring after a logic update can be detected.
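Read top to bottom, the table above behaves like a chain of early returns: the first unmet requirement fixes the tier. A minimal sketch of that reading follows. The `Flags` dataclass and its field names are hypothetical (the README states that 13 `has_*` flags exist but does not list them), and this is not the code in tier_engine.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    """Hypothetical subset of the audit flags; names are assumptions."""

    has_title: bool = False
    has_organism: bool = False
    has_instrument: bool = False
    has_result_files: bool = False
    has_psi_results: bool = False
    has_sdrf: bool = False
    has_open_spectra: bool = False
    has_organism_part: bool = False
    has_publication: bool = False


def assign_tier(f: Flags) -> str:
    """Walk the ladder from the bottom; the first unmet requirement decides the tier."""
    if not (f.has_title and f.has_organism and f.has_instrument):
        return "None"
    if not f.has_result_files:
        return "Raw"
    if not f.has_psi_results:
        return "Bronze"
    if not f.has_sdrf:
        return "Silver"
    if not (f.has_open_spectra and f.has_organism_part):
        return "Gold"
    if not f.has_publication:
        return "Platinum"
    return "Diamond"


# Flags matching the PXD000001 example output: PSI results present, no SDRF.
example = Flags(
    has_title=True,
    has_organism=True,
    has_instrument=True,
    has_result_files=True,
    has_psi_results=True,
    has_open_spectra=True,
    has_publication=True,
)
print(assign_tier(example))
```

Note that open spectra and a linked publication do not lift the example above Silver, because the SDRF requirement comes earlier on the ladder.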

Quant Tier (secondary axis)

The quant tier is independent of the FAIR tier and indicates quantification readiness.

Quant Tier	Meaning
Unverifiable	Non-PRIDE accession; cannot be evaluated.
No Quant	No PSI-standard results and no tabular quant files.
Partial	Either PSI-standard IDs or a quant table — but not both.
Quant-Ready	PSI IDs + tabular quant table present; CV-term quantification metadata missing.
Quant-Complete	PSI IDs + tabular quant table + CV-term method metadata — fully described.
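The quant-tier table can be read as a small decision function over three observations. The sketch below is one such reading; the function and parameter names are assumptions for illustration, not PXAudit's internal API.

```python
def assign_quant_tier(
    is_pride: bool,
    has_psi_results: bool,
    has_quant_table: bool,
    has_quant_metadata: bool,
) -> str:
    """Map the three quantification observations onto the quant tier table."""
    if not is_pride:
        return "Unverifiable"
    if not has_psi_results and not has_quant_table:
        return "No Quant"
    if not (has_psi_results and has_quant_table):
        # Exactly one of the two is present.
        return "Partial"
    if not has_quant_metadata:
        return "Quant-Ready"
    return "Quant-Complete"


# PXD000001 in the example output: PSI results, no quant table, no CV quant metadata.
print(assign_quant_tier(True, True, False, False))
```

Because none of the inputs depend on SDRF, open spectra, or publication status, a dataset can sit high on the FAIR ladder and still be only Partial on this axis.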

Validated Results

The following scores were produced by running pxaudit check against the live PRIDE REST API on 2026-03-21 and are included in the integration test suite.

Accession	Tier	Quant Tier
PXD057701	Raw	No Quant
PXD002244	Bronze	No Quant
PXD000001	Silver	Partial
PXD073444	Platinum	Partial
PXD075811	Platinum	Partial
PXD004683	Diamond	Partial

Output Database

Every check run upserts rows into three tables in the SQLite database:

Table	Description
`study`	One row per accession — title, organism, instrument, submission year and type, keywords.
`study_files`	One row per file — name, PRIDE category, extension, FTP URL, size in bytes.
`audit`	One row per accession — computed tier, quant tier, 13 `has_*` quality flags, `files_fetch_failed`, `is_unverifiable`, and `tier_logic_version`.
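To illustrate what "upsert" means here, the sketch below uses SQLite's `INSERT ... ON CONFLICT DO UPDATE` so that re-auditing an accession overwrites its row instead of adding a second one. The column names are taken from the example queries in this README; the reduced schema, the primary key, and the `upsert_audit` helper are assumptions, and the real db.py may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audit (
        accession          TEXT PRIMARY KEY,  -- assumed key: one row per accession
        tier               TEXT NOT NULL,
        quant_tier         TEXT NOT NULL,
        tier_logic_version TEXT NOT NULL
    )
    """
)

UPSERT_SQL = """
INSERT INTO audit (accession, tier, quant_tier, tier_logic_version)
VALUES (:accession, :tier, :quant_tier, :tier_logic_version)
ON CONFLICT(accession) DO UPDATE SET
    tier               = excluded.tier,
    quant_tier         = excluded.quant_tier,
    tier_logic_version = excluded.tier_logic_version
"""


def upsert_audit(connection: sqlite3.Connection, row: dict) -> None:
    """Insert the row, or overwrite the existing row for the same accession."""
    with connection:  # commits on success, rolls back on error
        connection.execute(UPSERT_SQL, row)


# First write uses made-up placeholder values; the second write replaces them.
upsert_audit(conn, {"accession": "PXD000001", "tier": "Bronze",
                    "quant_tier": "No Quant", "tier_logic_version": "v1.0"})
upsert_audit(conn, {"accession": "PXD000001", "tier": "Silver",
                    "quant_tier": "Partial", "tier_logic_version": "v2.0"})

row_count = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
stored = conn.execute("SELECT * FROM audit").fetchone()
```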

Example queries

-- Tier distribution across all audited datasets
SELECT tier, COUNT(*) AS n FROM audit GROUP BY tier ORDER BY n DESC;

-- All Diamond datasets
SELECT accession, quant_tier FROM audit WHERE tier = 'Diamond';

-- Datasets scored with a different tier logic version (candidates for re-scoring)
SELECT accession FROM audit WHERE tier_logic_version != 'v2.0';

-- File-type breakdown for a single accession
SELECT file_category, COUNT(*) AS n
FROM study_files
WHERE accession = 'PXD004683'
GROUP BY file_category;
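The same queries can be run from Python with the standard-library sqlite3 module. The sketch below builds a throwaway in-memory `audit` table from the rows listed under Validated Results, using a reduced, assumed schema, and then runs two of the queries above. Against a real database you would connect to pxaudit_results.db instead.

```python
import sqlite3

# Rows copied from the Validated Results table in this README.
VALIDATED = [
    ("PXD057701", "Raw", "No Quant"),
    ("PXD002244", "Bronze", "No Quant"),
    ("PXD000001", "Silver", "Partial"),
    ("PXD073444", "Platinum", "Partial"),
    ("PXD075811", "Platinum", "Partial"),
    ("PXD004683", "Diamond", "Partial"),
]

conn = sqlite3.connect(":memory:")
# Reduced stand-in schema; the real audit table has more columns.
conn.execute(
    "CREATE TABLE audit (accession TEXT PRIMARY KEY, tier TEXT, quant_tier TEXT)"
)
conn.executemany("INSERT INTO audit VALUES (?, ?, ?)", VALIDATED)

# Tier distribution across all audited datasets.
rows = conn.execute(
    "SELECT tier, COUNT(*) AS n FROM audit GROUP BY tier ORDER BY n DESC"
).fetchall()
distribution = dict(rows)

# All Diamond datasets.
diamonds = conn.execute(
    "SELECT accession, quant_tier FROM audit WHERE tier = 'Diamond'"
).fetchall()

for tier, n in rows:
    print(f"{tier:<10}{n}")
```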

Development Setup

uv sync
uv run pre-commit install

Pre-commit runs ruff (lint + format, line-length 100) on every commit.

Project Layout

src/pxaudit/
├── cli.py              # click entry points (pxaudit check)
├── tier_engine.py      # 7-tier FAIR ladder + quant tier logic
├── file_classifier.py  # deterministic FileClass assignment for every file type
├── pride_client.py     # PRIDE REST API v3 client with pagination + retry/backoff
├── db.py               # SQLite schema + upsert helpers
└── cache.py            # local JSON response cache (~/.pxaudit_cache/)
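As a rough picture of what "deterministic" classification can look like, the sketch below assigns a class from the file name alone, using an ordered rule table so the same name always yields the same class. The `FileClass` members, the suffix rules, and the handling of compressed files are assumptions for illustration and do not reproduce file_classifier.py.

```python
from enum import Enum
from pathlib import PurePosixPath


class FileClass(Enum):
    """Hypothetical classes; the real FileClass members may differ."""

    PSI_RESULT = "psi_result"
    OPEN_SPECTRA = "open_spectra"
    SDRF = "sdrf"
    QUANT_TABLE = "quant_table"
    OTHER = "other"


# Ordered rules: the first matching suffix wins.
_SUFFIX_RULES = (
    (".sdrf.tsv", FileClass.SDRF),
    (".mzid", FileClass.PSI_RESULT),
    (".mztab", FileClass.PSI_RESULT),
    (".mzml", FileClass.OPEN_SPECTRA),
    (".mgf", FileClass.OPEN_SPECTRA),
)
_QUANT_TABLE_NAMES = {"proteingroups.txt", "evidence.txt"}
_COMPRESSION_SUFFIXES = (".gz", ".zip")


def classify(filename: str) -> FileClass:
    """Classify a file by name only; no file contents are read."""
    name = PurePosixPath(filename).name.lower()
    # Look through one layer of compression, e.g. "results.mzid.gz".
    for ext in _COMPRESSION_SUFFIXES:
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    if name in _QUANT_TABLE_NAMES:
        return FileClass.QUANT_TABLE
    for suffix, file_class in _SUFFIX_RULES:
        if name.endswith(suffix):
            return file_class
    return FileClass.OTHER


print(classify("results.mzid.gz"))
```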

Testing

# Unit tests (default — no network required)
uv run pytest

# With coverage report
uv run pytest --cov=pxaudit --cov-report=term-missing

# Live integration tests against the real PRIDE API (requires network)
uv run pytest -m integration -v --no-cov

The default run excludes integration tests (-m 'not integration' is set in pyproject.toml). The test suite has 384 unit tests with 100% branch coverage across all modules, plus 10 live integration tests covering six real PRIDE accessions.

Non-PRIDE Accessions

PXAudit only has access to the PRIDE REST API. Accessions from other repositories (MassIVE MSV, jPOST JPST, iProX IPX) are accepted as valid input but receive:

tier = "Unverifiable"
quant_tier = "Unverifiable"
All Boolean flags set to False

This is by design — no assertion of quality is made for data that cannot be inspected.
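A minimal sketch of this short-circuit is shown below. The prefix list comes from the repositories named in this section; the helper names, the treatment of `is_unverifiable` as True, and the example flag names and accession are assumptions.

```python
from typing import Iterable

# Accession prefixes named in this section: MassIVE, jPOST, iProX.
NON_PRIDE_PREFIXES = ("MSV", "JPST", "IPX")


def is_non_pride(accession: str) -> bool:
    """True when the accession prefix belongs to a repository PXAudit cannot query."""
    return accession.strip().upper().startswith(NON_PRIDE_PREFIXES)


def unverifiable_audit(accession: str, flag_names: Iterable[str]) -> dict:
    """Build the audit record described above without inspecting any data."""
    record = {
        "accession": accession,
        "tier": "Unverifiable",
        "quant_tier": "Unverifiable",
        "is_unverifiable": True,  # assumed to be the one marker that is set
    }
    # Every quality flag is False: nothing was inspected, so nothing is asserted.
    record.update({name: False for name in flag_names})
    return record


# Illustrative accession and flag names only.
result = unverifiable_audit("MSV000000001", ["has_sdrf", "has_open_spectra"])
```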

Roadmap

v0.1.0 covers single-study auditing. Planned work beyond this release:

Bulk audit: pxaudit bulk-audit --input accessions.txt with rate-limited batch processing and TSV/JSON export alongside the SQLite database.
File manifest: extend study_files with richer provenance. Download URLs and file sizes are already captured; per-file hashes and last-seen timestamps are planned.
Reporting: pxaudit report --db results.db generating tier distributions, SDRF adoption trends, metadata completeness over time, and an exemplar shortlist as a Quarto-rendered HTML report.
Multi-repository: plugin adapters for MassIVE, jPOST, and iProX so non-PRIDE accessions are audited rather than marked Unverifiable.

Contributions and issue reports are welcome.

Citation

If you use PXAudit in your research, please cite it as:

@software{ergin_pxaudit_2026,
  author   = {Ergin, Enes Kemal},
  title    = {{PXAudit}: A command-line tool for auditing {Proteomics Exchange} study metadata},
  year     = {2026},
  version  = {0.1.0},
  url      = {https://github.com/LangeLab/PXAudit},
  license  = {MIT},
}

A CITATION.cff file is included in the repository root for tools that parse it automatically (e.g. GitHub's Cite this repository button, Zenodo).

License

MIT
