A lightweight scientific paper summarizer combining LoRA fine-tuning with RAG-based question answering. Run locally on consumer hardware.
Reading scientific papers is time-consuming: background knowledge gaps slow readers down, and publication volume keeps growing. General-purpose LLMs like ChatGPT can help, but they rarely point back to the source text and can only cover what fits in a prompt.
paper2summary addresses this by:
- Fine-tuning a lightweight model (1.3GB) for paper summarization
- Providing source references with highlighting for fact-checking
- Supporting local deployment on laptops
- Enabling flexible switching to larger models via APIs
- Keeping the codebase minimal (~300 lines for fine-tuning)
RAG-based paper Q&A using Kotaemon with GPT-4o-mini:
- Single Document Q&A - Query the "Attention Is All You Need" paper
- Multi-Document Q&A - Compare Transformer and LoRA papers
Evaluated on 6,440 test samples with beam search (beam size = 4):
| Model | ROUGE-1 | ROUGE-2 | ROUGE-3 | ROUGE-L |
|---|---|---|---|---|
| Llama-3.2-1B-Instruct (baseline) | 36.69 | 7.47 | 1.95 | 19.36 |
| Llama-PaperSummarization-LoRA | 41.56 | 11.31 | 2.67 | 21.86 |
The LoRA model improves ROUGE-2 by 51% and ROUGE-3 by 37% relative to the baseline.
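ROUGE-N scores n-gram overlap between a generated summary and the reference abstract. The project's evaluation lives in `src/utils/eval.py` and presumably uses a full library implementation; a minimal self-contained F1 sketch of the idea:

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1: harmonic mean of n-gram precision and recall."""
    def ngrams(text: str, n: int) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Library implementations add stemming and bootstrap aggregation on top, so absolute numbers from this sketch will differ slightly from the table above.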
git clone https://github.com/gabe-zhang/paper2summary.git
cd paper2summary
uv venv && uv sync
uv run python -m spacy download en_core_web_sm
uv run python src/train.py
# Quick test (10 samples)
uv run python src/test.py --model_path ./output/lora
# Full benchmark (6,440 samples)
uv run python src/test.py --model_path ./output/lora --num_samples 6440
paper2summary/
├── src/
│ ├── train.py # LoRA fine-tuning script
│ ├── test.py # Model evaluation script
│ ├── paper_dataset.py # Dataset loading utilities
│ ├── config/
│ │ └── lora_config.py # Training hyperparameters
│ └── utils/
│ ├── eval.py # ROUGE evaluation metrics
│ └── testing_utils.py
├── output/ # Model checkpoints (generated)
└── pyproject.toml
| Parameter | Value |
|---|---|
| Base Model | Llama-3.2-1B-Instruct (1.3GB) |
| LoRA Rank | 8 |
| Target Modules | q_proj, v_proj |
| Trainable Parameters | ~850K (0.07%) |
| Context Length | 10,182 tokens |
| Gradient Accumulation | 4 steps |
| Training Steps | 5,000 |
| Evaluation Interval | Every 20 steps |
| Training Time | ~28 hours on RTX A6000 |
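The "~850K trainable parameters" row can be sanity-checked by hand. Assuming Llama-3.2-1B's published shapes (hidden size 2048, 16 decoder layers, 8 KV heads of dimension 64, so `v_proj` maps 2048 → 512), and that LoRA adds `r * (d_in + d_out)` parameters per adapted matrix:

```python
# Back-of-the-envelope count of LoRA trainable parameters.
HIDDEN, LAYERS, RANK = 2048, 16, 8
KV_DIM = 8 * 64  # num_key_value_heads * head_dim for Llama-3.2-1B

q_proj = RANK * (HIDDEN + HIDDEN)   # adapter on the 2048 -> 2048 projection
v_proj = RANK * (HIDDEN + KV_DIM)   # adapter on the 2048 -> 512 projection
trainable = LAYERS * (q_proj + v_proj)
print(trainable)  # 851968, i.e. ~850K (~0.07% of the ~1.23B total)
```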
Fine-tuned on 10% of ccdv/arxiv-summarization:
| Split | Samples | Avg. Article Tokens | Avg. Abstract Tokens |
|---|---|---|---|
| Train | ~20,000 | 6,038 | 299 |
| Validation | ~640 | 5,894 | 172 |
| Test | 6,440 | 5,905 | 174 |
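Each dataset row pairs a full article with its abstract as the target summary. A hypothetical sketch of how one pair might be turned into a chat-style training example for an Instruct model; the actual template lives in `src/paper_dataset.py` and may differ:

```python
# Hypothetical prompt formatting for one (article, abstract) pair.
def format_example(article: str, abstract: str) -> list[dict]:
    return [
        {"role": "user",
         "content": f"Summarize the following paper:\n\n{article}"},
        {"role": "assistant", "content": abstract},
    ]

msgs = format_example("We propose the Transformer ...", "A new architecture ...")
```

A chat template (e.g. the tokenizer's `apply_chat_template`) would then serialize these messages into the model's prompt format, with loss computed only on the assistant turn.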
The RAG pipeline uses Kotaemon for document Q&A:
| Component | Implementation |
|---|---|
| LLM | GPT-4o-mini (or Llama-3.2-1B-LoRA via Ollama) |
| Embedding | text-embedding-3-small (OpenAI) |
| Reranker | GPT-4o-mini |
| Vector DB | Chroma |
| Document Parser | Docling |
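Kotaemon wires these components together; conceptually the pipeline reduces to parse → embed → retrieve → rerank → generate. A library-free sketch of the retrieval step, with toy bag-of-words vectors standing in for `text-embedding-3-small` (the real pipeline embeds parsed chunks once and stores them in Chroma):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for text-embedding-3-small.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query embedding.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The top-k chunks are then passed to the reranker and finally stuffed into the LLM prompt, which is what lets answers carry source references back to specific passages.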
- LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., ICLR 2022)
- A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents (Cohan et al., NAACL 2018)
- Code: MIT License
- Llama 3.2: Llama 3.2 Community License
- Third-party: THIRD_PARTY_LICENSES.md