| 💾 Code | 📄 Paper | 🌐 Website |
|---|---|---|
| 🤗 Dataset | 🤖 Models | 📦 PyPI |
| 📊 Trajectories | | |
Structured Distillation of Web Agent Capabilities Enables Generalization
Xing Han Lù, Siva Reddy
This repository contains the code for the A3 framework, which uses LLMs to systematically generate synthetic web agent training data by decomposing the annotation process into three roles: Task Designer, Annotator, and Supervisor.
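The three-role decomposition can be pictured as a small loop. The sketch below is illustrative only and is not the package's API: `Trajectory`, `run_a3_round`, and the three toy callables are hypothetical names, and it assumes the Supervisor acts as an accept/reject filter on the Annotator's trajectories.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """Hypothetical container: a task plus the actions taken to solve it."""
    task: str
    steps: List[str]


def run_a3_round(
    design_tasks: Callable[[], List[str]],
    annotate: Callable[[str], Trajectory],
    supervise: Callable[[Trajectory], bool],
) -> List[Trajectory]:
    """Run one round: design tasks, annotate each, keep what the supervisor accepts."""
    accepted = []
    for task in design_tasks():          # Task Designer proposes tasks
        trajectory = annotate(task)      # Annotator attempts each task
        if supervise(trajectory):        # Supervisor accepts or rejects (assumed behavior)
            accepted.append(trajectory)
    return accepted


# Toy stand-ins for the three LLM-backed roles.
def toy_designer() -> List[str]:
    return ["find the cheapest laptop", "post a comment"]


def toy_annotator(task: str) -> Trajectory:
    return Trajectory(task=task, steps=["open page", "act on: " + task])


def toy_supervisor(trajectory: Trajectory) -> bool:
    return "laptop" in trajectory.task


accepted = run_a3_round(toy_designer, toy_annotator, toy_supervisor)
print([t.task for t in accepted])
```

In the real framework each role is played by an LLM; the callables here only mark where those calls would sit.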
Install from PyPI:

    pip install agent-as-annotators

Or install from source:

    git clone https://github.com/McGill-NLP/agent-as-annotators.git
    cd agent-as-annotators
    pip install -e .

To evaluate a model, serve it with vLLM and then run the evaluation:

    vllm serve --config configs/vllm/Qwen3.5-9B.yaml
    a3-eval --benchmark webarena_test --model A3-qwen3.5-9b

The A3 pipeline generates synthetic training data in 5 steps:

1. `python scripts/create_personas.py`
2. `a3-explore`, then `python scripts/generate_task_intents.py`
3. `python scripts/create_synth_configs.py`
4. `a3-synth --benchmark a3_synth --model gemini-3-pro`
5. `python scripts/convert_trajectories_to_json.py`, then `python scripts/generate_rft_data.py`

Then fine-tune on the generated data:

    a3-train --config configs/train/qwen3.5-9b.json

Training uses SFT with FSDP for multi-GPU parallelism. See `configs/train/` for hyperparameters and `configs/accelerate/` for FSDP configuration.
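The data-generation commands listed above can be chained from a small driver script. This is a minimal sketch, not something shipped with the repository: `run_pipeline` and `PIPELINE_COMMANDS` are hypothetical names, and it assumes each command runs from the repository root with exactly the arguments shown in this README. It defaults to a dry run so nothing is executed by accident.

```python
import shlex
import subprocess
from typing import List

# Commands copied from this README, in the order they appear.
PIPELINE_COMMANDS = [
    "python scripts/create_personas.py",
    "a3-explore",
    "python scripts/generate_task_intents.py",
    "python scripts/create_synth_configs.py",
    "a3-synth --benchmark a3_synth --model gemini-3-pro",
    "python scripts/convert_trajectories_to_json.py",
    "python scripts/generate_rft_data.py",
]


def run_pipeline(commands: List[str], dry_run: bool = True) -> List[List[str]]:
    """Run each command in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the parsed argument vectors for inspection.
    """
    planned = []
    for command in commands:
        argv = shlex.split(command)
        planned.append(argv)
        if dry_run:
            print("would run:", command)
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, check=True)
    return planned


plan = run_pipeline(PIPELINE_COMMANDS, dry_run=True)
```

Calling `run_pipeline(PIPELINE_COMMANDS, dry_run=False)` would execute the commands for real; any extra flags the individual scripts require are not covered here.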
| Command | Description |
|---|---|
| `a3-eval` | Run evaluation on WebArena, VisualWebArena, WorkArena, MiniWoB |
| `a3-synth` | Run trajectory collection for A3-Synth |
| `a3-explore` | Run environment exploration |
| `a3-train` | Fine-tune a model with SFT |
| `a3-screen-utils` | Screen session management utilities |
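After installation, a quick way to confirm that the commands in the table are available is to look them up on `PATH`. The helper below is a hypothetical convenience, not part of the package; it uses only the standard library's `shutil.which`.

```python
import shutil
from typing import List

# Command names taken from the table in this README.
A3_COMMANDS = ["a3-eval", "a3-synth", "a3-explore", "a3-train", "a3-screen-utils"]


def find_missing_commands(commands: List[str]) -> List[str]:
    """Return the commands that cannot be found on the current PATH."""
    return [name for name in commands if shutil.which(name) is None]


missing = find_missing_commands(A3_COMMANDS)
if missing:
    print("not found on PATH:", ", ".join(missing))
else:
    print("all A3 commands are available")
```

If some commands are reported missing, the package is probably installed in a different environment than the one currently active.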
    agent-as-annotators/
      agent_as_annotators/       # Core package
        cli/                     # CLI entry points (eval, synth, explore, train)
        modeling.py              # Agent model wrapper (vLLM, Gemini, OpenAI)
        prompts/                 # All prompt templates
        judge/                   # Inverted evaluation protocol (Judge module)
        benchmarks/a3_synth/     # A3-Synth benchmark registration
        exploration/             # Exploration task registration
        utils/                   # Utilities
        configs/a3_synth/        # A3-Synth task configurations
      configs/
        model_configs.json       # Model registry
        train/                   # Training hyperparameters
        vllm/                    # vLLM serving configs
        accelerate/              # FSDP configs
      scripts/                   # Data pipeline scripts