XtraGPT is a family of open-source Large Language Models (LLMs) designed specifically for human-AI collaborative academic paper revision. Unlike general-purpose models that often perform only surface-level polishing, XtraGPT is fine-tuned to understand the full context of a research paper and execute specific, criteria-guided revision instructions. XtraGPT also serves as the refiner in the companion project PaperDebugger.
The models were trained on a dataset of 140,000 high-quality instruction-revision pairs derived from top-tier conference papers (ICLR).
XtraGPT is designed to be easily integrated into agent systems, enabling automatic routing of in-context academic revision tasks (e.g., via skills such as xtragpt-paper-revision-skill).
Key Features:
- Context-Aware: Processes the full paper context to ensure revisions maintain consistency with the global narrative.
- Controllable: Follows specific user instructions aligned with 20 academic writing criteria across 6 sections (Abstract, Introduction, etc.).
- Iterative Workflow: Designed to support the "Human-AI Collaborative" (HAC) lifecycle where authors retain creative control.
- Installation
- Model Zoo
- Training
- Evaluation
- Paper Revision Benchmark (PyPI Package)
- Inference with Transformers
- Production Usage (OpenClaw Integration)
- Model License
- Acknowledgements
- Citation
# Clone repository
git clone https://github.com/Xtra-Computing/XtraGPT.git
cd XtraGPT
# Install dependencies
pip install -r requirements.txt
# For training, also install LLaMA-Factory
pip install llamafactory

| Model | Size | HuggingFace |
|---|---|---|
| XtraGPT-1.5B | 1.5B | Link |
| XtraGPT-3B | 3B | Link |
| XtraGPT-7B | 7B | Link |
| XtraGPT-14B | 14B | Link |
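If you want a checkpoint on disk first (for example, to serve it later with vLLM or SGLang), here is a minimal download sketch using the `huggingface_hub` library; the local directory path is an arbitrary choice, not part of the official workflow.

```python
from huggingface_hub import snapshot_download  # pip install huggingface_hub

# Download an XtraGPT checkpoint into a local directory (path is illustrative).
local_dir = snapshot_download(
    repo_id="Xtra-Computing/XtraGPT-7B",
    local_dir="./checkpoints/XtraGPT-7B",
)
print(f"Model files downloaded to: {local_dir}")
```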
We use LLaMA-Factory for fine-tuning.
Copy configs/dataset_info.json to your LLaMA-Factory data directory:
cp configs/dataset_info.json /path/to/LLaMA-Factory/data/

# Set environment variables
export MODEL_PATH="Qwen/Qwen2.5-7B-Instruct" # Base model
export OUTPUT_DIR="./output/xtragpt-7b" # Output directory
# Run training
bash scripts/train.sh

Or use LLaMA-Factory directly:
llamafactory-cli train configs/train_config.yaml

Key hyperparameters (from paper):
| Parameter | Value |
|---|---|
| Learning Rate | 1e-6 |
| Epochs | 4 |
| Batch Size | 1 (per device) |
| Gradient Accumulation | 4 |
| Max Length | 16384 |
| Warmup Ratio | 0.1 |
The component-wise benchmark evaluates revisions for six paper sections: Title, Abstract, Introduction, Background, Evaluation, and Conclusion.
It uses a modified AlpacaEval for pairwise comparison.
# Clone and install AlpacaEval
git clone https://github.com/tatsu-lab/alpaca_eval.git
cd alpaca_eval && pip install -e .
# Copy our modified configs
cp -r ../6_component_evaluation/alpaca_eval_gpt4_turbo_fn/* \
  src/alpaca_eval/evaluators_configs/alpaca_eval_gpt4_turbo_fn/

Important: Replace `glm_winrate.py` in your AlpacaEval installation with our version, which disables the `instruction_difficulty` feature (not applicable to paper revision tasks) and keeps only length bias correction:

cp 6_component_evaluation/glm_winrate.py $(python -c "import alpaca_eval; print(alpaca_eval.__path__[0])")/metrics/glm_winrate.py
python 6_component_evaluation/convert_predictions.py \
--input_dir ./predictions \
--output_dir ./formatted_predictions \
--model_name "XtraGPT-7B"

export OPENAI_API_KEY="your-api-key"
bash 6_component_evaluation/run_eval.sh \
./formatted_predictions/xtragpt \
./formatted_predictions/baseline \
./eval_results

The full-paper evaluation uses AI-Scientist to evaluate entire papers.
git clone https://github.com/SakanaAI/AI-Scientist.git
cd AI-Scientist && pip install -e .

export OPENAI_API_KEY="your-api-key"
python full_paper_evaluation/ai_scientist_eval.py \
--paper_path ./papers/my_paper.pdf \
--output ./review_results.json \
--model "gpt-4o"XtraGPT/
βββ configs/
β βββ train_config.yaml # Training configuration
β βββ dataset_info.json # Dataset configuration for LLaMA-Factory
βββ scripts/
β βββ train.sh # Training script
β βββ predict.sh # Inference script
βββ 6_component_evaluation/ # Component-wise evaluation
β βββ alpaca_eval_gpt4_turbo_fn/
β βββ convert_predictions.py
β βββ run_eval.sh
βββ full_paper_evaluation/ # Full paper evaluation
β βββ ai_scientist_eval.py
β βββ analyze_results.py
β βββ paper_results/
βββ train/
β βββ data/
β βββ README.md
βββ examples/
β βββ inference_example.py
βββ xtragpt-paper-revision-skill/
β βββ skills/skill.xtragpt-paper-revision-skill.yaml
βββ requirements.txt
βββ README.md
We provide a standalone Python package for benchmarking paper revision models. Install it directly from PyPI:
pip install paper-revision-bench

import paper_revision_bench as prb
# Prepare your data
samples = [
{
"instruction": "Improve the clarity of this title",
"input": "A Study of Neural Networks",
"output_1": "Deep Learning for Image Classification", # Model A output
"output_2": "Neural Network Analysis Study", # Model B output
}
]
# Run evaluation with GPT-4-Turbo as judge
results = prb.evaluate(
samples=samples,
judge="openai/gpt-4-turbo",
criteria="clarity"
)
print(f"Model A win rate: {results.win_rate:.1%}")To reproduce the exact evaluation from our paper:
from paper_revision_bench import get_paper_eval_prompt, list_paper_sections
# Available sections: title, abstract, introduction, background, evaluation, conclusion
print(list_paper_sections())
# Get the evaluation prompt for a specific section
prompt = get_paper_eval_prompt("title")

# Evaluate from JSON file
paper-revision-bench evaluate \
--input samples.json \
--judge openai/gpt-4-turbo \
--criteria clarity \
--output results.json
# List available criteria and judges
paper-revision-bench list-criteria
paper-revision-bench list-judges

| Judge | Model ID |
|---|---|
| GPT-4-Turbo | openai/gpt-4-turbo |
| GPT-4o | openai/gpt-4o |
| Claude 3.5 Sonnet | anthropic/claude-3-5-sonnet-20241022 |
| Local Ollama | ollama/llama3 |
| vLLM Server | vllm/model-name |
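As a sketch of switching judges, the snippet below reuses the `evaluate` call from the Python example above with a local Ollama judge from the table; it assumes an Ollama server is already running with the `llama3` model pulled.

```python
import paper_revision_bench as prb

# Same sample format as in the GPT-4-Turbo example above.
samples = [
    {
        "instruction": "Improve the clarity of this title",
        "input": "A Study of Neural Networks",
        "output_1": "Deep Learning for Image Classification",  # Model A output
        "output_2": "Neural Network Analysis Study",           # Model B output
    }
]

# Judge IDs follow the table above; ollama/llama3 talks to a local Ollama server.
results = prb.evaluate(
    samples=samples,
    judge="ollama/llama3",
    criteria="clarity",
)
print(f"Model A win rate (Ollama judge): {results.win_rate:.1%}")
```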
For advanced usage (length-controlled win rate, weighted overall score across sections), see the package README.
To use XtraGPT with the standard Hugging Face `transformers` library, ensure you format your input using the specific tags `<PAPER_CONTENT>`, `<SELECTED_CONTENT>`, and `<QUESTION>`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Select the model size: "XtraGPT-1.5B", "XtraGPT-3B", "XtraGPT-7B", or "XtraGPT-14B"
model_name = "Xtra-Computing/XtraGPT-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16,
device_map="auto"
)
# Define the Prompt Template tailored for XtraGPT
prompt_template = """Act as an expert model for improving articles **PAPER_CONTENT**.
The output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.
Focus on clear, concise, and evidence-based improvements that align with the overall context of the paper.
<PAPER_CONTENT>
{paper_content}
</PAPER_CONTENT>
<SELECTED_CONTENT>
{selected_content}
</SELECTED_CONTENT>
<QUESTION>
{user_question}
</QUESTION>"""
# Example Data (from the "Attention Is All You Need" paper)
paper_content = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train."
selected_content = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration."
user_question = "help me make it more concise."
# Format the input
formatted_prompt = prompt_template.format(
paper_content=paper_content,
selected_content=selected_content,
user_question=user_question
)
messages = [
{"role": "user", "content": formatted_prompt}
]
# Apply chat template
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate
generated_ids = model.generate(
**model_inputs,
max_new_tokens=16384,
temperature=0.1
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

XtraGPT can be used as a specialized academic writing backend in agent systems such as OpenClaw, Cursor, or custom research workflows.
Unlike general-purpose LLMs, XtraGPT is optimized for high-precision, context-aware paper revision, and is best used as a dedicated revision module that is automatically invoked when needed.
We provide an official OpenClaw integration package:
npm install xtragpt-paper-revision-skill
npx xtragpt-paper-revision-skill init

This will scaffold the required provider, skill, and routing rules into your project.
⚠️ Prerequisite: You must first serve a self-hosted OpenAI-compatible XtraGPT endpoint (e.g., via vLLM, SGLang, or Ollama).
After installation, configure your endpoint:
export XTRAGPT_BASE_URL=http://127.0.0.1:8088/v1
export XTRAGPT_API_KEY=dummy

To integrate XtraGPT into your system:
- Serve XtraGPT locally (e.g., via vLLM, SGLang, or Ollama)
- Register it as a model provider (OpenAI-compatible endpoint recommended)
- Add the `xtragpt-paper-revision-skill` skill
- Enable routing rules for academic editing
The system will then automatically dispatch paper revision requests to XtraGPT, with no special prompting required.
For model access and deployment references, see: https://huggingface.co/Xtra-Computing/XtraGPT-7B
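Outside of an agent framework, you can also call the self-hosted endpoint directly. The sketch below uses the official `openai` Python client against the OpenAI-compatible server configured above; the base URL, API key, and served model name are assumptions that must match your own deployment.

```python
import os
from openai import OpenAI

# Point the client at the self-hosted XtraGPT endpoint configured above.
client = OpenAI(
    base_url=os.environ.get("XTRAGPT_BASE_URL", "http://127.0.0.1:8088/v1"),
    api_key=os.environ.get("XTRAGPT_API_KEY", "dummy"),
)

# Use the same tagged prompt format as in the Transformers example above.
prompt = (
    "<PAPER_CONTENT>\n"
    "We propose the Transformer, a network architecture based solely on attention mechanisms...\n"
    "</PAPER_CONTENT>\n\n"
    "<SELECTED_CONTENT>\n"
    "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks.\n"
    "</SELECTED_CONTENT>\n\n"
    "<QUESTION>\n"
    "help me make it more concise.\n"
    "</QUESTION>"
)

# The model name must match the identifier your server registers
# (e.g., the HF repo name when serving with vLLM).
response = client.chat.completions.create(
    model="Xtra-Computing/XtraGPT-7B",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.1,
)
print(response.choices[0].message.content)
```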
In a typical workflow:
User request
  ↓
Agent router detects "revision intent"
  ↓
xtragpt-paper-revision-skill skill
  ↓
XtraGPT
  ↓
Revised academic text
"Rewrite this motivation paragraph so it clearly reflects the research gap stated in the abstract."
"Revise this introduction to better align with our claimed contributions and avoid inconsistency."
"Improve this paragraph by incorporating the experimental findings mentioned later in the paper."
"This section sounds fluent but unconvincing, strengthen the argument using the paperβs overall narrative."
"Make this contribution paragraph more precise and ensure it matches what we actually evaluate in Section 4."
"Reduce overclaim in this paragraph while keeping it aligned with the evidence presented in the paper."
Best suited for instruction-driven, in-context revision tasks, such as:
- aligning a paragraph with the paper's overall narrative (e.g., abstract, contributions, or results)
- revising content to satisfy specific writing criteria (e.g., strengthen motivation, reduce overclaim)
- ensuring consistency across sections (e.g., introduction ↔ evaluation ↔ conclusion)
- refining rebuttals using evidence grounded in the paper
Not intended for:
- open-ended conversation or brainstorming
- coding or debugging
- factual Q&A or retrieval tasks
In practice, XtraGPT should be used alongside a general LLM, not as a replacement.
General-purpose LLMs typically operate at the local text level, which leads to:
- fluent but unconvincing or misaligned revisions
- weak handling of cross-section dependencies
- limited ability to follow structured writing criteria
- tendency to introduce overclaim or generic phrasing
XtraGPT is trained for controllable, context-aware revision, enabling:
- revisions grounded in the full paper context, not just local text
- alignment with explicit user instructions and writing criteria
- consistent argumentation across sections
- more defensible and academically faithful outputs
General LLM → drafting / reasoning
XtraGPT → in-context revision / alignment
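To make this division of labor concrete, here is a toy routing sketch; the keyword-based intent check and both helper stubs are purely hypothetical placeholders for whatever router and model clients your agent stack actually provides.

```python
# Toy sketch: a general LLM handles drafting/reasoning, while revision-intent
# requests are dispatched to XtraGPT. All helpers below are hypothetical stubs.

REVISION_KEYWORDS = ("revise", "rewrite", "align", "overclaim", "more concise")

def looks_like_revision_request(user_message: str) -> bool:
    """Very naive intent check; a real agent router would use its own classifier."""
    message = user_message.lower()
    return any(keyword in message for keyword in REVISION_KEYWORDS)

def revise_with_xtragpt(paper_context: str, instruction: str) -> str:
    """Hypothetical stub: call the self-hosted XtraGPT endpoint (see sketch above)."""
    raise NotImplementedError

def answer_with_general_llm(user_message: str) -> str:
    """Hypothetical stub: call your general-purpose LLM for drafting/brainstorming."""
    raise NotImplementedError

def handle_request(user_message: str, paper_context: str) -> str:
    if looks_like_revision_request(user_message):
        return revise_with_xtragpt(paper_context, user_message)
    return answer_with_general_llm(user_message)
```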
This model is released under the ModelGo Zero License 2.0 (MG0-2.0).
MG0-2.0 is a highly permissive open model license designed to facilitate the widest possible adoption and collaboration. It allows unrestricted use, reproduction, distribution, and the creation of derivative works, including for commercial purposes, without requiring attribution or imposing copyleft restrictions.
For more details on the license terms, please visit ModelGo.li or refer to the LICENSE file in the repository.
@misc{nuo2025xtragpt,
title={XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision},
author={Nuo Chen and Andre Lin HuiKai and Jiaying Wu and Junyi Hou and Zining Zhang and Qian Wang and Xidong Wang and Bingsheng He},
year={2025},
eprint={2505.11336},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.11336},
}