Inspiration
Watching researchers spend hours turning dense papers into slides, then give weak talks despite strong work, showed a clear gap. We set out to automate extraction, cleaning, and structuring so anyone can produce clear, accurate slides quickly.
What it does
PresenTuneAI automatically converts scientific papers into presentation slides with both text and visual content. Upload a research paper PDF, and the AI generates structured slide content with appropriate titles, bullet points, and key takeaways, plus generates relevant images and diagrams to illustrate complex concepts, transforming dense academic text into complete presentation-ready format in seconds.
Dataset Creation & OCR Pipeline
- Combined GEM/SciDuet (4.7k public paper-slide pairs) with Doc2PPT dataset (33GB raw data)
- Built custom OCR pipeline: extracted text from slide images, then used GPT-OSS-20B via Groq API to intelligently clean OCR errors and artifacts
- Created Flask web application for batch processing, quality control, and real-time monitoring
2. Content Alignment System
- Developed semantic alignment system to match slide content with corresponding paper sections
- Used multiple strategies: keyword matching, semantic similarity embeddings, and structural analysis
- Implemented grid search optimization across parameters (similarity thresholds, diversity weights, filtering criteria)
3. Model Training & Fine-Tuning
- Fine-tuned GPT-OSS-20B (20B parameters) using LoRA (Low-Rank Adaptation) for efficiency
- Trained on aligned paper-slide pairs with custom prompts for presentation structure
- Generated three model variants: merged FP16, MXFP4 quantized, and raw LoRA adapters
4. Image Generation Integration
- Integrated image generation capabilities for visual content creation
- Automated diagram and illustration generation based on paper content
- Combined text and visual outputs into complete presentation format
5. Production Deployment
- 5. Deployed on RunPod Serverless for scalable, cost-effective inference
- Built end-to-end processing system from PDF input to presentation output
- Published models on Hugging Face for public access
- Created Docker containers and vLLM serving infrastructure optimized for serverless deployment
Log in or sign up for Devpost to join the conversation.