A lightweight scientific paper summarizer combining LoRA fine-tuning with RAG-based question answering. Run locally on consumer hardware.
Reading scientific papers is time-consuming: background knowledge gaps slow readers down, and publication volume keeps growing. General-purpose LLMs like ChatGPT can help, but they rarely point back to the source text and can only cover what fits in a prompt.
paper2summary addresses this by:
- Fine-tuning a lightweight model (1.3GB) for paper summarization
- Providing source references with highlighting for fact-checking
- Supporting local deployment on laptops
- Enabling flexible switching to larger models via APIs
- Keeping the codebase minimal (~300 lines for fine-tuning)
RAG-based paper Q&A using Kotaemon with GPT-4o-mini:
- Single Document Q&A - Query the "Attention Is All You Need" paper
- Multi-Document Q&A - Compare Transformer and LoRA papers
Evaluated on 6,440 test samples with beam search (beam size = 4):
| Model | ROUGE-1 | ROUGE-2 | ROUGE-3 | ROUGE-L |
|---|---|---|---|---|
| Llama-3.2-1B-Instruct (baseline) | 36.69 | 7.47 | 1.95 | 19.36 |
| Llama-PaperSummarization-LoRA | 41.56 | 11.31 | 2.67 | 21.86 |
The LoRA model improves ROUGE-2 by 51% and ROUGE-3 by 37% relative to the baseline.
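ROUGE-N scores n-gram overlap between a generated summary and the reference abstract. The project's evaluation lives in `src/utils/eval.py` and presumably uses a full library implementation; a minimal self-contained F1 sketch of the idea:

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1: harmonic mean of n-gram precision and recall."""
    def ngrams(text: str, n: int) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Library implementations add stemming and bootstrap aggregation on top, so absolute numbers from this sketch will differ slightly from the table above.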
git clone https://github.com/gabe-zhang/paper2summary.git
cd paper2summary
uv venv && uv sync
uv run python -m spacy download en_core_web_sm
uv run python src/train.py
# Quick test (10 samples)
uv run python src/test.py --model_path ./output/lora
# Full benchmark (6,440 samples)
uv run python src/test.py --model_path ./output/lora --num_samples 6440
paper2summary/
├── src/
│ ├── train.py # LoRA fine-tuning script
│ ├── test.py # Model evaluation script
│ ├── paper_dataset.py # Dataset loading utilities
│ ├── config/
│ │ └── lora_config.py # Training hyperparameters
│ └── utils/
│ ├── eval.py # ROUGE evaluation metrics
│ └── testing_utils.py
├── output/ # Model checkpoints (generated)
└── pyproject.toml
| Parameter | Value |
|---|---|
| Base Model | Llama-3.2-1B-Instruct (1.3GB) |
| LoRA Rank | 8 |
| Target Modules | q_proj, v_proj |
| Trainable Parameters | ~850K (0.07%) |
| Context Length | 10,182 tokens |
| Gradient Accumulation | 4 steps |
| Training Steps | 5,000 |
| Evaluation Interval | Every 20 steps |
| Training Time | ~28 hours on RTX A6000 |
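The "~850K trainable parameters" row can be sanity-checked by hand. Assuming Llama-3.2-1B's published shapes (hidden size 2048, 16 decoder layers, 8 KV heads of dimension 64, so `v_proj` maps 2048 → 512), and that LoRA adds `r * (d_in + d_out)` parameters per adapted matrix:

```python
# Back-of-the-envelope count of LoRA trainable parameters.
HIDDEN, LAYERS, RANK = 2048, 16, 8
KV_DIM = 8 * 64  # num_key_value_heads * head_dim for Llama-3.2-1B

q_proj = RANK * (HIDDEN + HIDDEN)   # adapter on the 2048 -> 2048 projection
v_proj = RANK * (HIDDEN + KV_DIM)   # adapter on the 2048 -> 512 projection
trainable = LAYERS * (q_proj + v_proj)
print(trainable)  # 851968, i.e. ~850K (~0.07% of the ~1.23B total)
```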
Fine-tuned on 10% of ccdv/arxiv-summarization:
| Split | Samples | Avg. Article Tokens | Avg. Abstract Tokens |
|---|---|---|---|
| Train | ~20,000 | 6,038 | 299 |
| Validation | ~640 | 5,894 | 172 |
| Test | 6,440 | 5,905 | 174 |
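Each dataset row pairs a full article with its abstract as the target summary. A hypothetical sketch of how one pair might be turned into a chat-style training example for an Instruct model; the actual template lives in `src/paper_dataset.py` and may differ:

```python
# Hypothetical prompt formatting for one (article, abstract) pair.
def format_example(article: str, abstract: str) -> list[dict]:
    return [
        {"role": "user",
         "content": f"Summarize the following paper:\n\n{article}"},
        {"role": "assistant", "content": abstract},
    ]

msgs = format_example("We propose the Transformer ...", "A new architecture ...")
```

A chat template (e.g. the tokenizer's `apply_chat_template`) would then serialize these messages into the model's prompt format, with loss computed only on the assistant turn.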
The RAG pipeline uses Kotaemon for document Q&A:
| Component | Implementation |
|---|---|
| LLM | GPT-4o-mini (or Llama-3.2-1B-LoRA via Ollama) |
| Embedding | text-embedding-3-small (OpenAI) |
| Reranker | GPT-4o-mini |
| Vector DB | Chroma |
| Document Parser | Docling |
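Kotaemon wires these components together; conceptually the pipeline reduces to parse → embed → retrieve → rerank → generate. A library-free sketch of the retrieval step, with toy bag-of-words vectors standing in for `text-embedding-3-small` (the real pipeline embeds parsed chunks once and stores them in Chroma):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for text-embedding-3-small.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query embedding.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The top-k chunks are then passed to the reranker and finally stuffed into the LLM prompt, which is what lets answers carry source references back to specific passages.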
- LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., ICLR 2022)
- A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents (Cohan et al., NAACL 2018)
- Code: MIT License
- Llama 3.2: Llama 3.2 Community License
- Third-party: THIRD_PARTY_LICENSES.md