A toolkit for evaluating Qwen model variants and LoRA fine-tuned models.
## Quick Start

```bash
# Install dependencies
pip3 install -r requirements-eval.txt

# List test suites
python3 qwen-eval-v2.py --list-suites

# Run an evaluation
python3 qwen-eval-v2.py --models qwen2.5:8b qwen-8b-dialog-v1 --verbose
```

## Documentation

- QWEN-EVAL-V2-README.md - Comprehensive guide (v2 framework)
- MIGRATION-V1-TO-V2.md - Migration guide from v1
- QWEN-EVAL-README.md - v1 documentation
## Directory Structure

```
evaluation/
├── README.md                   # This file
├── qwen-eval-v2.py             # Main CLI (v2 - recommended)
├── qwen-eval.py                # Legacy CLI (v1 - simple)
├── qwen_eval/                  # Core package
│   ├── config.py               # Configuration & logging
│   ├── core.py                 # Evaluation engine
│   ├── test_suites.py          # Built-in test suites
│   ├── metrics.py              # Metric registry (15+ metrics)
│   └── reporters.py            # Output formatters
├── eval-config-example.yaml    # Example configuration
├── requirements-eval.txt       # Python dependencies
├── QWEN-EVAL-V2-README.md      # Full documentation
└── MIGRATION-V1-TO-V2.md       # Upgrade guide
```
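YAML configs make runs reproducible. The fragment below is a hypothetical example: the key names (`models`, `suites`, `workers`, `cache`, `reporters`) mirror the CLI flags and the suite/reporter names in this README and are assumptions — see eval-config-example.yaml for the actual schema.

```yaml
# Hypothetical config -- key names are assumptions mirroring the CLI flags;
# eval-config-example.yaml is the authoritative reference.
models:
  - qwen2.5:8b
  - qwen-8b-dialog-v1
suites:
  - pedagogical
  - dialogue
workers: 8
cache: true
reporters:
  - json
  - markdown
```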
## Features

- Parallel execution - run evaluations concurrently across worker threads
- Result caching - avoid redundant inference on repeated prompts
- YAML configs - reproducible evaluations
- 15+ metrics - pedagogical quality, dialogue, complexity
- 4 test suites - pedagogical, dialogue, baseline, stress
- 3 reporters - JSON, Markdown, side-by-side comparison
- Extensible - add custom metrics and test suites
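As a sketch of the extensibility point, here is one way a custom metric could be written and registered. The `register` decorator and `METRICS` dict are illustrative assumptions; the real registry lives in `qwen_eval/metrics.py` and its API may differ.

```python
# Illustrative sketch only -- the register()/METRICS API is an assumption,
# not the actual qwen_eval.metrics interface.
from typing import Callable, Dict

METRICS: Dict[str, Callable[[str], float]] = {}

def register(name: str):
    """Register a metric function under a name (assumed API)."""
    def wrap(fn: Callable[[str], float]):
        METRICS[name] = fn
        return fn
    return wrap

@register("question_density")
def question_density(response: str) -> float:
    """Fraction of sentences that are questions -- a rough proxy
    for dialogic/pedagogical style."""
    # Split on sentence-ending punctuation, keeping the terminator.
    marked = response.replace("?", "?|").replace(".", ".|")
    sentences = [s.strip() for s in marked.split("|") if s.strip()]
    if not sentences:
        return 0.0
    return sum(1 for s in sentences if s.endswith("?")) / len(sentences)

print(METRICS["question_density"]("What is recursion? It calls itself."))  # 0.5
```

A metric defined this way could then be picked up by name in a test suite, keeping suite definitions declarative.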
## Usage Examples

```bash
# Compare a base model against a fine-tuned variant
python3 qwen-eval-v2.py --models base-model fine-tuned-v1

# Run from a YAML configuration file
python3 qwen-eval-v2.py --config eval-config-example.yaml

# Parallel execution with result caching
python3 qwen-eval-v2.py --workers 8 --cache --verbose
```

## Related

- Main project: ../README.md
- Fine-tuning workflows: ../fine-tuning/
- Training logs: ../DEVLOG.md