# paper2summary

A lightweight scientific paper summarizer combining LoRA fine-tuning with RAG-based question answering. Runs locally on consumer hardware.

## Overview

Reading scientific papers is time-consuming due to knowledge gaps and high publication volumes. While LLMs like ChatGPT can help, they lack intuitive citations and have limited scope.

paper2summary addresses this by:

- Fine-tuning a lightweight model (1.3GB) for paper summarization
- Providing source references with highlighting for fact-checking
- Supporting local deployment on laptops
- Enabling flexible switching to larger models via APIs
- Keeping the codebase minimal (~300 lines for fine-tuning)

## Demo

RAG-based paper Q&A using Kotaemon with GPT-4o-mini.

## Results

Evaluated on 6,440 test samples with beam search (beam size = 4):

| Model | ROUGE-1 | ROUGE-2 | ROUGE-3 | ROUGE-L |
|---|---|---|---|---|
| Llama-3.2-1B-Instruct (baseline) | 36.69 | 7.47 | 1.95 | 19.36 |
| Llama-PaperSummarization-LoRA | 41.56 | 11.31 | 2.67 | 21.86 |

The LoRA model improves over the baseline by +51% on ROUGE-2 and +37% on ROUGE-3.
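For intuition, ROUGE-N measures n-gram overlap between a generated summary and the reference abstract. A minimal sketch of the metric (the project's `src/utils/eval.py` presumably relies on a full ROUGE implementation; this simplified version skips stemming and proper tokenization):

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 2) -> float:
    """Simplified ROUGE-N F1: n-gram overlap between candidate and reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n("the cat sat on the mat", "the cat sat on a mat", n=2)` shares 3 of 5 bigrams with the reference, giving an F1 of 0.6.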

## Installation

```shell
git clone https://github.com/gabe-zhang/paper2summary.git
cd paper2summary

uv venv && uv sync
uv run python -m spacy download en_core_web_sm
```

## Usage

### Training

```shell
uv run python src/train.py
```

### Testing

```shell
# Quick test (10 samples)
uv run python src/test.py --model_path ./output/lora

# Full benchmark (6,440 samples)
uv run python src/test.py --model_path ./output/lora --num_samples 6440
```
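The benchmark decodes with beam search (beam size 4): at each step, only the highest-scoring partial sequences are kept. A generic sketch of the idea (not the project's actual decoding loop, which the `transformers` generation machinery handles internally):

```python
def beam_search(step_fn, start, beam_size=4, max_len=5):
    """Keep the beam_size best partial sequences by total log-probability.

    step_fn(seq) returns a list of (token, logprob) continuations;
    an empty list marks a finished sequence.
    """
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            continuations = step_fn(seq)
            if not continuations:  # finished: carry forward unchanged
                candidates.append((seq, score))
                continue
            for token, logprob in continuations:
                candidates.append((seq + [token], score + logprob))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0][0]  # highest-scoring completed sequence
```

Unlike greedy decoding, a beam can recover a sequence whose first token is not the single most likely one, which tends to matter for long abstractive summaries.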

## Project Structure

```
paper2summary/
├── src/
│   ├── train.py           # LoRA fine-tuning script
│   ├── test.py            # Model evaluation script
│   ├── paper_dataset.py   # Dataset loading utilities
│   ├── config/
│   │   └── lora_config.py # Training hyperparameters
│   └── utils/
│       ├── eval.py        # ROUGE evaluation metrics
│       └── testing_utils.py
├── output/                # Model checkpoints (generated)
└── pyproject.toml
```

## Training Details

| Parameter | Value |
|---|---|
| Base Model | Llama-3.2-1B-Instruct (1.3GB) |
| LoRA Rank | 8 |
| Target Modules | q_proj, v_proj |
| Trainable Parameters | ~850K (0.07%) |
| Context Length | 10,182 tokens |
| Gradient Accumulation | 4 steps |
| Training Steps | 5,000 |
| Evaluation Interval | Every 20 steps |
| Training Time | ~28 hours on RTX A6000 |
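The trainable-parameter figure can be reproduced from the adapter shapes: with rank r, LoRA adds matrices A (r × d_in) and B (d_out × r) per target projection. Assuming Llama-3.2-1B's published dimensions (hidden size 2048, 16 layers, grouped-query attention so v_proj maps 2048 → 512):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one LoRA adapter: A (r x d_in) + B (d_out x r)."""
    return r * d_in + d_out * r

r, hidden, kv_dim, layers = 8, 2048, 512, 16
per_layer = lora_params(hidden, hidden, r) + lora_params(hidden, kv_dim, r)  # q_proj + v_proj
total = per_layer * layers
print(total)  # ~850K, matching the table; roughly 0.07% of the ~1.24B base parameters
```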

## Dataset

Fine-tuned on 10% of ccdv/arxiv-summarization:

| Split | Samples | Avg. Article Tokens | Avg. Abstract Tokens |
|---|---|---|---|
| Train | ~20,000 | 6,038 | 299 |
| Validation | ~640 | 5,894 | 172 |
| Test | 6,440 | 5,905 | 174 |

## RAG Architecture

The RAG pipeline uses Kotaemon for document Q&A:

| Component | Implementation |
|---|---|
| LLM | GPT-4o-mini (or Llama-3.2-1B-LoRA via Ollama) |
| Embedding | text-embedding-3-small (OpenAI) |
| Reranker | GPT-4o-mini |
| Vector DB | Chroma |
| Document Parser | Docling |
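At query time, the pipeline embeds the question, retrieves the most similar document chunks from the vector store, and only then asks the LLM to answer with those chunks as context. A toy illustration of the retrieval step (Chroma and text-embedding-3-small replace the deterministic hash-bucket "embedding" used here, which exists only to make the example self-contained):

```python
import math

def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: hash words into 16 buckets."""
    vec = [0.0] * 16
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % 16] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (the 'R' in RAG)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

In the real pipeline a reranker (GPT-4o-mini here) then reorders the retrieved chunks before they are passed to the answering model, and the retained source spans are what enables the highlighted references for fact-checking.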

## License
