Evaluatr is an AI-powered system that automates the mapping of evaluation
reports against structured frameworks while maintaining interpretability
and human oversight. Initially developed for IOM (International
Organization for Migration) evaluation reports and the Strategic Results
Framework (SRF), it transforms a traditionally manual, time-intensive
process into an efficient, transparent workflow.
The system maps evaluation reports against hierarchical frameworks like the SRF (objectives, enablers, cross-cutting priorities, outcomes, outputs, indicators) and connects to broader frameworks like the Sustainable Development Goals (SDGs) for interoperability.
Beyond automation, Evaluatr prioritizes interpretability and
human-AI collaboration: evaluators can understand the mapping
process, audit AI decisions, perform error analysis, and build training
datasets over time, ensuring the system aligns with organizational needs
through an actionable, transparent, and auditable methodology.
IOM evaluators possess deep expertise in mapping evaluation reports against frameworks like the Strategic Results Framework (SRF), but face significant operational challenges when processing reports that often exceed 150 pages of diverse content across multiple projects and contexts.
The core challenges are:
- Time-intensive process: Hundreds of staff-hours required per comprehensive mapping exercise
- Individual consistency: Even expert evaluators may categorize the same content differently across sessions
- Cross-evaluator consistency: Different evaluators may interpret and map identical content to different framework outputs
- Scale vs. thoroughness: Growing volume of evaluation reports creates pressure to choose between speed and comprehensive analysis
UN evaluation work encompasses several interconnected domains:
- Quality Check: Assessing evidence quality and methodological rigor in evaluation reports
- Mapping/Tagging: Identifying which standardized framework themes are central to each report
- Impact Evaluation: Measuring program effectiveness using RCTs, quasi-experimental designs, etc.
- Synthesis: Aggregating findings across reports on specific themes/regions to generate insights
Mapping/tagging is a foundational step that identifies which themes from established evaluation frameworks (like IOM’s Strategic Results Framework or the UN Global Compact for Migration) are central to each report. These frameworks provide agreed-upon nomenclature covering all relevant themes, ensuring common terminology across stakeholders and enabling interoperability for UN-wide aggregation and communication.
Rather than extracting evidence for specific themes, mapping creates a curated index that lets evaluators retrieve the most relevant reports for subsequent synthesis work, maximizing both recall (finding all relevant reports) and precision (avoiding irrelevant ones).
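The curated-index idea can be sketched in a few lines of Python. The record fields and tag names below are purely illustrative, not Evaluatr's actual data model:

```python
# Hypothetical mapping records; field and tag names are illustrative only.
index = [
    {"report_id": "r1", "tags": {"SRF-Enabler-1", "GCM-Objective-2"}},
    {"report_id": "r2", "tags": {"SRF-Output-3.1"}},
    {"report_id": "r3", "tags": {"GCM-Objective-2", "SRF-Output-3.1"}},
]

def reports_for(theme, index):
    """Retrieve every report whose curated tags include the given theme."""
    return [r["report_id"] for r in index if theme in r["tags"]]

print(reports_for("GCM-Objective-2", index))  # → ['r1', 'r3']
```

Synthesis work then starts from this short list of matching reports instead of re-reading the whole repository.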
Note
Throughout this documentation, we use “mapping” and “tagging” interchangeably.
- Repository Processing: Read and preprocess IOM evaluation report repositories with standardized outputs
- Automated Downloads: Batch download of evaluation documents from diverse sources
- OCR Processing: Convert scanned PDFs to searchable text using Optical Character Recognition (OCR) technology
- Content Enrichment: Fix OCR-corrupted headings and enrich documents with AI-generated image descriptions for high-quality input data
- Multi-Stage Pipeline: Three-stage mapping process that progressively narrows from broad themes (SRF Enablers, Cross-cutting Priorities, GCM Objectives) to specific SRF outputs. Each stage enriches context for the next; for example, knowing a report is cross-cutting in nature helps accurately map specific SRF outputs
- Cost Optimization: Leverages LLM prompt caching to minimize token usage and API costs during repeated analysis
- Command-line Interface: Streamlined pipeline execution through easy-to-use CLI tools (`evl_ocr`, `evl_md_plus`, `evl_tag`)
- Transparent Tracing: Complete audit trails of AI decisions stored for human review and evaluation
- Knowledge Cards: Generate structured summaries for downstream AI tasks like proposal writing and synthesis
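The progressive-narrowing idea behind the multi-stage pipeline can be sketched abstractly. The stage functions and context keys below are hypothetical, not Evaluatr's API:

```python
# Illustrative three-stage narrowing: each stage reads a shared context
# and adds its own findings, enriching the input for the next stage.
def stage1(ctx):  # broad themes: SRF Enablers & Cross-cutting Priorities
    return {"enablers": ["Data and Evidence"]}

def stage2(ctx):  # GCM Objectives
    return {"gcm_objectives": ["Objective 1"]}

def stage3(ctx):  # specific SRF outputs, informed by earlier stages
    narrowed = "1.2.3" if ctx.get("enablers") else None
    return {"srf_outputs": [narrowed] if narrowed else []}

def run_pipeline(report_text, stages):
    context = {"report": report_text}
    for stage in stages:
        context.update(stage(context))  # each stage enriches the context
    return context

result = run_pipeline("…report text…", [stage1, stage2, stage3])
```

The point of the pattern is that stage 3 can condition on what stages 1 and 2 found, rather than classifying in isolation.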
Tip
We recommend using isolated Python environments. uv provides fast, reliable dependency management for Python projects.
```
pip install evaluatr
```

Or install the latest version from GitHub:

```
pip install git+https://github.com/franckalbinet/evaluatr.git
```

For development:

```
# Clone the repository
git clone https://github.com/franckalbinet/evaluatr.git
cd evaluatr

# Install in development mode
pip install -e .

# Make changes in nbs/ directory, then compile:
nbdev_prepare
```

Note
This project uses nbdev for literate programming - see the Development section for more details.
Create a .env file in your project root with your API keys:

```
MISTRAL_API_KEY="your_mistral_api_key"
GEMINI_API_KEY="your_gemini_api_key"
ANTHROPIC_API_KEY="your_anthropic_api_key"
```

Note: Evaluatr uses lisette, LiteLLM and DSPy for LLM interactions, giving you flexibility to use any compatible language model provider beyond the examples above.
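As a quick sanity check (our helper, not part of Evaluatr's API), you can verify the keys are visible to Python before running the pipeline:

```python
import os

REQUIRED_KEYS = ["MISTRAL_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY"]

def missing_keys(env=None):
    """Return the required API keys that are absent or empty."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]

if missing_keys():
    print("Missing keys:", ", ".join(missing_keys()))
```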
For IOM evaluators working with the official evaluation repository,
download the most recent evaluations from
evaluation.iom.int/evaluation-search-pdf
as .csv file, then preprocess/standardize it:
```python
from evaluatr.readers import IOMRepoReader

fname = 'files/test/evaluation-search-export-11_13_2025--18_09_44.csv'
reader = IOMRepoReader(fname)
evals = reader()
evals[0]
```

```
Year: 2025 | Organization: IOM | Countries: Worldwide
Documents: 2 available
ID: 9992310969aa2f428bc8aba29f865cf3
```
To find a particular evaluation by title or url:

```python
from evaluatr.readers import find_eval

title = 'Evaluation of IOM Accountability to Affected Populations'
find_eval(evals, title, by='title')
```

```
Year: 2025 | Organization: IOM | Countries: Worldwide
Documents: 4 available
ID: 6c3c2cf3fa479112967612b0baddab72
```

```python
url = "https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/AAP%20Evaluation%20Report_final_.pdf"
find_eval(evals, url, by='url')
```

```
Year: 2025 | Organization: IOM | Countries: Worldwide
Documents: 4 available
ID: 6c3c2cf3fa479112967612b0baddab72
```
```python
from evaluatr.downloaders import download_evals

download_evals(evals)
```

Process any evaluation report from PDF to tagged outputs using three streamlined commands.

Example: Given a report at `example-report-dir/example-report-file.pdf`

Step 1: OCR Processing

```
evl_ocr example-report --pdf-dir . --output-dir md_library
```

Step 2: Document Enrichment

```
evl_md_plus example-report --md-dir md_library
```

Step 3: Framework Tagging

```
evl_tag example-report --md-dir md_library
```

Convert PDF evaluation reports to structured markdown with extracted images.
Usage:

```
evl_ocr <eval-id> [OPTIONS]
```

Options:

- `--pdf-dir`: Directory containing PDF folders (default: `../data/pdf_library`)
- `--output-dir`: Output directory for markdown (default: `../data/md_library`)
- `--overwrite`: Reprocess if output already exists

Examples:

```
# Basic usage
evl_ocr example-report

# Custom paths
evl_ocr example-report --pdf-dir ./reports --output-dir ./markdown

# Force reprocess
evl_ocr example-report --overwrite
```

Output Structure:
```
md_library/
└── example-report/
    └── example-report-file/
        ├── page_1.md
        ├── page_2.md
        └── img/
            ├── img-0.jpeg
            └── img-1.jpeg
```
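Since OCR output is split into per-page files, a small helper (ours, not part of Evaluatr) can stitch a report back into a single document:

```python
from pathlib import Path

def combine_pages(report_dir):
    """Concatenate page_*.md files in numeric page order into one string."""
    pages = sorted(Path(report_dir).glob("page_*.md"),
                   key=lambda p: int(p.stem.split("_")[1]))
    return "\n\n".join(p.read_text(encoding="utf-8") for p in pages)
```

Note the numeric sort key: a plain lexicographic sort would place `page_10.md` before `page_2.md`.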
Fix the markdown heading hierarchy, append page numbers to each heading, and replace figures with AI-generated descriptions.
Usage:

```
evl_md_plus <eval-id> [OPTIONS]
```

Options:

- `--md-dir`: Directory containing markdown folders (default: `../data/md_library`)
- `--overwrite`: Reprocess if enhanced/enriched output already exists

Examples:

```
# Basic usage
evl_md_plus example-report

# Force reprocess
evl_md_plus example-report --overwrite
```

Output: Creates enhanced/ and enriched/ directories with corrected headings and image descriptions.
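To illustrate the heading enrichment, appending a page marker to markdown headings might look like the sketch below. The regex is our illustration, not `evl_md_plus`'s actual implementation:

```python
import re

def tag_headings_with_page(md, page):
    """Append a page marker to every markdown heading in a page's text."""
    return re.sub(r"^(#{1,6} .+?)\s*$", rf"\1 (p. {page})", md, flags=re.M)

print(tag_headings_with_page("## Findings\nSome text.", 12))
# → ## Findings (p. 12)
#   Some text.
```

Carrying the page number inside each heading keeps provenance visible after pages are merged or chunked for downstream LLM calls.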
Map evaluation reports against established frameworks (SRF, GCM) using AI-assisted analysis.
Usage:

```
evl_tag <eval-id> [OPTIONS]
```

Options:

- `--md-dir`: Directory containing markdown folders (default: `../data/md_library`)
- `--stages`: Comma-separated stages to run (default: `1,2,3`)
  - Stage 1: SRF Enablers & Cross-cutting Priorities
  - Stage 2: GCM Objectives
  - Stage 3: SRF Outputs
- `--force-refresh`: Force refresh specific stages (comma-separated: `sections,stage1,stage2,stage3`)

Examples:

```
# Run all stages
evl_tag example-report

# Run specific stages only
evl_tag example-report --stages 1,2

# Force refresh certain stages
evl_tag example-report --force-refresh stage1,stage3

# Combined options
evl_tag example-report --stages 2,3 --force-refresh sections
```

Output: Results are stored in ~/.evaluatr/traces/ with complete audit trails of AI decisions.
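To gather trace files for human review, something like the helper below works. It is our sketch, and it assumes traces are stored one file per record under the traces directory; the on-disk format itself is not documented here:

```python
from pathlib import Path

def list_traces(trace_dir=None):
    """List trace files under ~/.evaluatr/traces/ (or a given dir), newest first."""
    trace_dir = Path(trace_dir or Path.home() / ".evaluatr" / "traces")
    if not trace_dir.exists():
        return []
    files = [p for p in trace_dir.rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```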
- Full Documentation: GitHub Pages
- Module Notebooks: literate programming with nbdev
- Examples: See the `nbs/` directory for Jupyter notebooks
Evaluatr is built using nbdev, enabling documentation-driven development where code, docs, and tests live together in notebooks.
We use fastcore.script to create CLI tools. See the nbdev console scripts tutorial for setup details.
We welcome contributions! Here’s how you can help:
- Fork the repository
- Install development dependencies: `pip install -e .`
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes in the `nbs/` directory
- Compile with `nbdev_prepare`
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
See settings.ini for the complete list of
dependencies. Key packages include:
- fastcore & pandas - Core data processing
- lisette, litellm & dspy - AI/LLM integration
- mistralai - OCR processing
- Issues: GitHub Issues
- Discussions: GitHub Discussions

