PresentuneAI

Main Homepage
How PresenTuneAI works?
Main App View
Layouts Selections
Final Result

Inspiration

Watching researchers spend hours turning dense papers into slides, then give weak talks despite strong work, showed a clear gap. We set out to automate extraction, cleaning, and structuring so anyone can produce clear, accurate slides quickly.

What it does

PresenTuneAI automatically converts scientific papers into presentation slides with both text and visual content. Upload a research paper PDF, and the AI generates structured slide content with appropriate titles, bullet points, and key takeaways, plus generates relevant images and diagrams to illustrate complex concepts, transforming dense academic text into complete presentation-ready format in seconds.

Dataset Creation & OCR Pipeline

Combined GEM/SciDuet (4.7k public paper-slide pairs) with Doc2PPT dataset (33GB raw data)
Built custom OCR pipeline: extracted text from slide images, then used GPT-OSS-20B via Groq API to intelligently clean OCR errors and artifacts
Created Flask web application for batch processing, quality control, and real-time monitoring

2. Content Alignment System

Developed semantic alignment system to match slide content with corresponding paper sections
Used multiple strategies: keyword matching, semantic similarity embeddings, and structural analysis
Implemented grid search optimization across parameters (similarity thresholds, diversity weights, filtering criteria)

3. Model Training & Fine-Tuning

Fine-tuned GPT-OSS-20B (20B parameters) using LoRA (Low-Rank Adaptation) for efficiency
Trained on aligned paper-slide pairs with custom prompts for presentation structure
Generated three model variants: merged FP16, MXFP4 quantized, and raw LoRA adapters

4. Image Generation Integration

Integrated image generation capabilities for visual content creation
Automated diagram and illustration generation based on paper content
Combined text and visual outputs into complete presentation format

5. Production Deployment

5. Deployed on RunPod Serverless for scalable, cost-effective inference
Built end-to-end processing system from PDF input to presentation output
Published models on Hugging Face for public access
Created Docker containers and vLLM serving infrastructure optimized for serverless deployment

Built With

bash
conda
docker
flask
git
github
google-drive
groq
hugging-face-hub
javascript
json
jupyter-notebooks
opencv
parquet
peft
python
pytorch
rouge
runpod-serverless
safetensors
sentence-transformers
tesseract
transformers
vllm

Updates

William Loe started this project — Sep 11, 2025 06:18 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.