
PresenTuneAI

Multi‑agent AI presentation generator & beautifier, targeting the Most Useful Fine‑Tune award.

AI Training and Infrastructure

We unify (i) the public GEM/SciDuet corpus (~4.7k paper–slide pairs from NLP venues) and (ii) Doc2PPT, which pairs slide images with their corresponding research papers. For every Doc2PPT slide image, we run OCR and immediately pass the output through GPT-OSS-20B (served via Groq) to repair typical OCR errors (layout artifacts, hyphenation, misspellings, and broken tokens). We then align slide bullets to their source paper sections using embedding similarity. Grid-search sweeps over key parameters (e.g., top-K pool sizes, MMR/diversity weights) optimize the matches, and we evaluate mixtures of FAISS, BM25, and ROUGE scores at different percentage weightings to choose the best-scoring blend.
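
The weight sweep above can be sketched as follows. This is a minimal illustration, not the project's actual code: the per-candidate FAISS/BM25/ROUGE scores, the tiny labeled set, and the 0.6 decision threshold are all hypothetical placeholders standing in for the real alignment data.

```python
from itertools import product

# Hypothetical pre-computed scores for (bullet, section) candidates,
# normalized to [0, 1]. In the real pipeline these would come from
# FAISS cosine similarity, BM25, and ROUGE respectively; the boolean
# marks whether the pair is a correct alignment.
candidates = [
    # (faiss, bm25, rouge, is_correct_match)
    (0.92, 0.40, 0.55, True),
    (0.35, 0.80, 0.20, False),
    (0.88, 0.60, 0.70, True),
    (0.50, 0.90, 0.30, False),
]

def blend(weights, scores):
    """Weighted mixture of the three retrieval scores."""
    w_f, w_b, w_r = weights
    return w_f * scores[0] + w_b * scores[1] + w_r * scores[2]

def accuracy(weights, threshold=0.6):
    """Fraction of candidates whose thresholded blend matches the label."""
    hits = sum(
        (blend(weights, c[:3]) >= threshold) == c[3] for c in candidates
    )
    return hits / len(candidates)

# Grid-search percentage weightings that sum to 1.0, in steps of 0.1,
# keeping the blend that scores best on the labeled pairs.
best_weights = max(
    (
        (f / 10, b / 10, r / 10)
        for f, b, r in product(range(11), repeat=3)
        if f + b + r == 10
    ),
    key=accuracy,
)
```

In the real pipeline the same sweep would run over held-out paper–slide alignments rather than a toy list, but the shape of the search is the same: enumerate weightings, score each blend, keep the winner.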

Finally, we train a LoRA adapter on this data and publish three model variants on Hugging Face:

  • a merged FP16 fine-tuned model
  • an MXFP4-quantized model
  • the LoRA adapters

We deploy on RunPod Serverless using Docker containers with vLLM for fast, scalable inference. All of the models are also available on the Hugging Face Hub.

Backend & Frontend Architecture

The system processes documents through a five-phase workflow: upload and extraction, outline generation, content editing with media-library integration, layout selection from a JSON-driven layout library, and PPTX export with pixel-perfect preview accuracy.
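
The layout-selection phase can be illustrated with a first-fit lookup against a JSON layout library. The entries and field names below (`max_bullets`, `supports_image`) are invented for this sketch and do not reflect the project's actual schema.

```python
import json

# Hypothetical excerpt of a JSON-driven layout library: each entry
# declares how much content it can hold and whether it shows media.
LAYOUT_LIBRARY = json.loads("""
[
  {"id": "title-bullets",  "max_bullets": 6, "supports_image": false},
  {"id": "split-media",    "max_bullets": 4, "supports_image": true},
  {"id": "full-bleed-img", "max_bullets": 0, "supports_image": true}
]
""")

def select_layout(n_bullets, has_image):
    """Return the id of the first layout that fits the slide's content."""
    for layout in LAYOUT_LIBRARY:
        if n_bullets <= layout["max_bullets"] and (
            not has_image or layout["supports_image"]
        ):
            return layout["id"]
    raise ValueError("no layout fits this slide")
```

Because the library is plain JSON, new layouts can be added without touching the selection code, which is the main appeal of the JSON-driven approach.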

The backend (FastAPI) parses PDFs with GROBID (run as a separate service), submits AI inference requests to RunPod and polls for results, and sources images through Pexels integration.
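
The submit-then-poll pattern against RunPod can be sketched as a generic helper. The `submit` and `get_status` callables stand in for the real HTTP calls to RunPod's asynchronous endpoints, and the status strings mirror the `IN_QUEUE` / `IN_PROGRESS` / `COMPLETED` / `FAILED` lifecycle its job API reports; this is an assumed shape for illustration, not the project's actual client.

```python
import time

def poll_for_result(submit, get_status, interval=2.0, timeout=300.0):
    """Submit a job, then repeatedly query its status until it finishes.

    `submit` returns a job id; `get_status(job_id)` returns a dict like
    {"status": "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED" | "FAILED",
     "output": ...}. Raises on failure or timeout.
    """
    job_id = submit()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status["status"] == "COMPLETED":
            return status["output"]
        if status["status"] == "FAILED":
            raise RuntimeError(f"job {job_id} failed: {status}")
        time.sleep(interval)  # back off before querying again
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

Keeping the polling loop separate from the HTTP layer makes it easy to unit-test with fake callables and to reuse for any asynchronous job endpoint.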

For detailed technical documentation, see docs/.
