Late-Interaction Reranker

A search evaluation system that compares different search providers using a high-performance Rust reranker with token pruning. The system runs ablation studies to measure how late-interaction scoring and token pruning affect search quality.

What it does

This tool lets you:

Compare search results from different providers (DuckDuckGo, Wikipedia, etc.)
Rerank results using late-interaction MaxSim scoring instead of simple cosine similarity
Test how token pruning affects both speed and quality
Generate detailed reports with performance metrics

Demo Mode

The system includes demo components for testing:

Baseline Provider: Generates 15 dummy search results with titles like "Baseline Result 1 for 'query'"
Mock Embedder: Creates random normalized embeddings when sentence-transformers isn't available
These allow you to test the reranking pipeline without requiring real search APIs or embedding models

Quick Start

Prerequisites

Rust 1.70+ (install here)
Python 3.8+
Git

Setup

# Clone and navigate to the service directory
git clone <repository-url>
cd LateInteractionReranker/service

# Set up Python environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Run the system

Terminal 1 - Start the reranker service:

cd ranker-rs
cargo run --release

Terminal 2 - Run an evaluation:

cd orchestrator
python run.py --q "test query" --providers "baseline" --topk 5

The system will automatically run ablation studies comparing different configurations and generate reports.

Example output

Query: test query
Protocol: both (pairwise N=5 trials, pointwise rubric)
Providers: DDG, Wikipedia   Late: on/off   Prune: none/16-64/8-32

Ablation (DDG)
Late   Prune    rel@5    ent_cov  rerank_total_ms per_doc_p95_µs  total_ms  
--------------------------------------------------------------------------------
off    —        0.829    9        0.3             120             26846     
on     none     0.807    10       3.8             980             18144     
on     16/64    0.804    10       2.1             540             21529     
on     8/32     0.806    9        1.5             410             24806     

✅ Evaluation complete! Report saved to report.md

Command line options

python run.py --q "your query" \
  --providers "ddg,wikipedia" \
  --topk 5 \
  --judge heuristic \
  --protocol both

Option	Description	Default
`--q`	Search query	Required
`--providers`	Which search providers to use	`"ddg,wikipedia"`
`--topk`	Number of results to return	`5`
`--judge`	Evaluation method (`heuristic` or `llm`)	`"heuristic"`
`--protocol`	Evaluation type (`pointwise`, `pairwise`, `both`)	`"both"`

How it works

Architecture

Query → Search Providers → Embedding → Rust Reranker → Evaluation → Reports

Search: Queries multiple search providers (DuckDuckGo, Wikipedia, etc.)
Embed: Converts text to vectors using sentence-transformers
Rerank: Uses Rust service for fast late-interaction scoring with token pruning
Evaluate: Compares results using heuristic or LLM-based scoring
Report: Generates markdown and JSON reports with performance metrics

Late-interaction scoring

Instead of comparing single vectors, the system:

Breaks queries and documents into token-level embeddings
Uses MaxSim scoring: score = Σᵢ maxⱼ (Q[i] · D[j])
Prunes tokens to keep only the most important ones (16 query, 64 doc tokens)

Token pruning

To keep things fast, the system only keeps the most salient tokens:

salience = idf(token) × ||embedding||₂

Project structure

service/
├── ranker-rs/                 # Rust reranker service
│   ├── src/
│   │   ├── main.rs           # HTTP server
│   │   ├── scoring.rs        # MaxSim + token pruning
│   │   └── lib.rs
│   └── Cargo.toml
├── orchestrator/              # Python orchestrator
│   ├── run.py                # Main entry point
│   ├── providers.py          # Search providers
│   ├── embed.py              # Text embedding
│   ├── judge.py              # Result evaluation
│   ├── report.py             # Report generation
│   └── utils.py              # Utilities
└── requirements.txt

Troubleshooting

Port 8088 already in use:

lsof -i :8088  # Find what's using the port
# Kill the process or change the port in ranker-rs/src/main.rs

Python dependencies fail:

pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir

Out of memory:

Reduce --topk (try 5 instead of 20)
Use --judge heuristic instead of llm

Performance

Typical performance on modern hardware:

Reranking: ~3-6ms for 100 documents
Token pruning: 3-5x speedup vs naive approach
Quality: 10-20% better relevance@5 vs single-vector cosine

Development

Build Rust service:

cd ranker-rs
cargo build --release
cargo test

Run Python tests:

python -m pytest orchestrator/

Add new search provider:

Implement BaseProvider interface in providers.py
Add to get_provider() factory function
Update CLI argument parsing

Configuration

Set these environment variables if you want to use external APIs:

EXA_API_KEY: For Exa search provider
OPENAI_API_KEY: For LLM-based evaluation
ANTHROPIC_API_KEY: For LLM-based evaluation

License

MIT License - see LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.vscode		.vscode
service		service
README.md		README.md
history		history

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Late-Interaction Reranker

What it does

Demo Mode

Quick Start

Prerequisites

Setup

Run the system

Example output

Command line options

How it works

Architecture

Late-interaction scoring

Token pruning

Project structure

Troubleshooting

Performance

Development

Configuration

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Late-Interaction Reranker

What it does

Demo Mode

Quick Start

Prerequisites

Setup

Run the system

Example output

Command line options

How it works

Architecture

Late-interaction scoring

Token pruning

Project structure

Troubleshooting

Performance

Development

Configuration

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages