A multi-agent AI presentation generator and beautifier, targeting the Most Useful Fine-Tune category.
We unify two datasets: (i) the public GEM/SciDuet corpus (~4.7k paper–slide pairs from NLP venues) and (ii) Doc2PPT, which pairs slide images with their corresponding research papers. For every Doc2PPT slide image, we run OCR and immediately pass the output through GPT-OSS-20B (served via Groq) to fix typical OCR errors: layout artifacts, hyphenation, misspellings, and broken tokens. We then align slide bullets to their source paper sections using a mix of similarity signals. A grid search sweeps the key parameters (e.g., top-K candidate pool sizes, MMR/diversity weights) to optimize the matches, and we evaluate weighted blends of FAISS (embedding similarity), BM25, and ROUGE scores to pick the best-scoring combination.
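The weight-blending step can be sketched as a small grid search. The function names and the per-candidate score fields below are illustrative, not our actual pipeline code; the FAISS, BM25, and ROUGE values are assumed to be precomputed and normalized per candidate.

```python
from itertools import product

def blend_scores(cand, weights):
    """Weighted sum of the three retrieval signals for one candidate."""
    w_faiss, w_bm25, w_rouge = weights
    return w_faiss * cand["faiss"] + w_bm25 * cand["bm25"] + w_rouge * cand["rouge"]

def best_blend(dev_set, step=0.1):
    """Grid-search percentage weightings (summing to 1) that maximize
    top-1 bullet-to-section alignment accuracy on a labeled dev set."""
    steps = [round(i * step, 2) for i in range(int(1 / step) + 1)]
    best_w, best_acc = None, -1.0
    for w_faiss, w_bm25 in product(steps, steps):
        w_rouge = round(1.0 - w_faiss - w_bm25, 2)
        if w_rouge < 0:
            continue  # weights must form a valid percentage split
        weights = (w_faiss, w_bm25, w_rouge)
        hits = 0
        for bullet in dev_set:
            # Predict the section whose blended score is highest.
            pred = max(bullet["candidates"], key=lambda c: blend_scores(c, weights))
            hits += pred["section_id"] == bullet["gold_section"]
        acc = hits / len(dev_set)
        if acc > best_acc:
            best_w, best_acc = weights, acc
    return best_w, best_acc
```

The same loop extends naturally to sweeping top-K pool sizes and MMR weights alongside the blend weights.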
Finally, we train a LoRA adapter on this data and publish three model variants on Hugging Face:
- the merged FP16 fine-tuned model,
- an MXFP4-quantized model, and
- the LoRA adapters alone.
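A minimal sketch of the adapter configuration, using the PEFT library; the rank, scaling, and target modules shown are assumptions for illustration, not our exact training settings.

```python
from peft import LoraConfig

# Hypothetical LoRA hyperparameters; actual values are assumptions.
lora_config = LoraConfig(
    r=16,               # adapter rank (assumed)
    lora_alpha=32,      # scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```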
We deploy on RunPod Serverless using Docker containers running vLLM for fast, scalable inference; all model variants are also available on the Hugging Face Hub.
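Since vLLM exposes an OpenAI-compatible API, a client request reduces to a standard chat-completions payload. A sketch of the payload builder follows; the model name is a placeholder, not our actual model ID.

```python
def build_chat_request(prompt, model="gpt-oss-20b-slides-lora", max_tokens=512):
    """Build an OpenAI-compatible /v1/chat/completions payload for the
    vLLM worker. The default model name is a placeholder."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }
```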
The system processes documents through a five-phase workflow: upload and extraction, outline generation, content editing with media-library integration, layout selection via a JSON-driven layout library, and PPTX export with pixel-perfect preview accuracy.
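Structurally, the workflow is a linear pipeline where each phase transforms the shared document state. A minimal sketch, with hypothetical phase functions standing in for the real implementations:

```python
def run_workflow(doc, phases):
    """Thread the document state through each phase in order, recording
    which phases ran (names mirror the five phases above)."""
    state = dict(doc, completed=[])
    for name, fn in phases:
        state = fn(state)           # each phase returns the updated state
        state["completed"].append(name)
    return state
```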
The backend (FastAPI) parses PDFs with GROBID (run as a separate service), submits inference jobs to RunPod and polls for results, and sources images through the Pexels API.
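The RunPod polling step can be sketched as below. The status-fetching function is injected rather than hard-coded, so the loop works without network access; the status strings follow RunPod's job lifecycle, but the function names are illustrative.

```python
import time

def poll_runpod(job_id, fetch_status, interval=2.0, timeout=120.0):
    """Poll a RunPod job until it completes or fails. `fetch_status` is a
    thin wrapper over the RunPod status endpoint, injected for testability."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status["status"] == "COMPLETED":
            return status["output"]
        if status["status"] == "FAILED":
            raise RuntimeError(f"RunPod job {job_id} failed: {status.get('error')}")
        time.sleep(interval)  # job still queued or in progress
    raise TimeoutError(f"RunPod job {job_id} did not finish within {timeout}s")
```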
For detailed technical documentation, see docs/.