Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
UC Santa Barbara
UC Santa Cruz
Uniphore
EvoPresent unifies coherent storytelling, aesthetic-aware slide design, and lifelike talking-head delivery. A dedicated Checker Agent, powered by the PresAesth multi-task reinforcement learning model, critiques every draft and guides iterative self-improvement, enabling reliable and engaging academic presentations from only raw paper materials.
Overview
Promoting academic papers has become an important way to increase research visibility. However, existing automated methods suffer from weak storytelling, insufficient aesthetic quality, and limited capacity for self-adjustment. EvoPresent addresses these limitations with a self-improvement agent that unifies coherent narratives, aesthetic-aware designs, and realistic presentation delivery via virtual characters. Central to EvoPresent is PresAesth, a multi-task reinforcement learning aesthetic model that supplies reliable scoring, defect adjustment, and comparative feedback so the agent can iteratively refine its output even with limited training data. To systematically evaluate end-to-end systems, we also introduce the EvoPresent Benchmark, which comprises (i) Presentation Generation Quality, covering 650 top-tier AI conference papers with multimodal resources, and (ii) Aesthetic Awareness, containing 2,000 slide pairs of varying quality to support joint training and evaluation.
High-quality feedback matters
Strong initial capability alone cannot guarantee effective self-correction; the PresAesth critic is essential for meaningful improvements.
Design vs. content trade-off
Automated pipelines often sacrifice layout polish for factual coverage. EvoPresent balances both objectives within a single agent loop.
Multi-task RL generalization
Training PresAesth jointly on scoring, defect adjustment, and pairwise comparison yields better aesthetic awareness than single-task baselines.
💻 EvoPresent Agent Pipeline
Overview of the EvoPresent framework. (a) EvoPresent first performs content extraction and voice generation, then constructs the storyline and script, followed by content enhancement using image generation and knowledge retrieval. Design and rendering are handled next, and the aesthetic checker evaluates the initial slide and provides adjustments. (b) PresAesth is trained on a human-preference aesthetic dataset via multiple tasks (scoring, defect adjustment, and comparison). (c) The PresAesth model guides the agent framework in iterative self-improvement.
The complete agent loop, stage by stage.
✨ Aesthetic Judgement for Self-Improvement
Checker Agent powered by PresAesth.
EvoPresent's high-quality output is driven by an iterative "draft → feedback → refinement" cycle supervised by a dedicated Checker Agent. The checker leverages PresAesth, a multi-task aesthetic model trained with multi-task Group Relative Policy Optimization (GRPO) on human preference data. A single pass through PresAesth yields an absolute aesthetic score, identifies concrete defects (e.g., layout imbalance or typography issues), and compares competing slide candidates. This compound signal enables self-directed improvement without relying on expensive human-in-the-loop reviews.
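The "draft → feedback → refinement" cycle can be illustrated with a minimal sketch. This is not the EvoPresent implementation: the `Feedback` type, the `refine` function, the score threshold, the round limit, and the toy `toy_check` / `toy_revise` stand-ins are all hypothetical names and values chosen for illustration. In a real system the checker would call the aesthetic model and the reviser would re-render the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Slide = Dict[str, object]  # hypothetical slide representation


@dataclass
class Feedback:
    score: float                                   # absolute aesthetic score (scale assumed)
    defects: List[str] = field(default_factory=list)  # concrete defects found


def refine(
    draft: Slide,
    check: Callable[[Slide], Feedback],
    revise: Callable[[Slide, List[str]], Slide],
    threshold: float = 8.0,   # assumed acceptance score
    max_rounds: int = 3,      # assumed budget of refinement rounds
) -> Tuple[Slide, Feedback]:
    """Iteratively revise a slide, keeping a candidate only if it scores higher."""
    best, best_fb = draft, check(draft)
    for _ in range(max_rounds):
        # Stop once the slide is good enough or there is nothing left to fix.
        if best_fb.score >= threshold or not best_fb.defects:
            break
        candidate = revise(best, best_fb.defects)
        cand_fb = check(candidate)
        # Comparative step: prefer the candidate only when it improves the score.
        if cand_fb.score > best_fb.score:
            best, best_fb = candidate, cand_fb
    return best, best_fb


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def toy_check(slide: Slide) -> Feedback:
    defects = list(slide["defects"])
    return Feedback(score=10.0 - 2.0 * len(defects), defects=defects)


def toy_revise(slide: Slide, defects: List[str]) -> Slide:
    # Fix only the first reported defect in each round.
    remaining = [d for d in slide["defects"] if d != defects[0]]
    return {**slide, "defects": remaining}


draft = {"title": "Method", "defects": ["layout imbalance", "small font", "low contrast"]}
final_slide, final_feedback = refine(draft, toy_check, toy_revise)
```

Keeping the previous best whenever a revision fails to improve the score is one simple way to use comparative feedback; other acceptance rules are equally plausible.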
EvoPresent Benchmark
Evaluation protocol spanning generation quality and aesthetic awareness.
The EvoPresent Benchmark offers a comprehensive suite for evaluating both presentation generation and aesthetic models. Its data sources are twofold: curated materials from top-tier AI conferences (slides, videos, and scripts) and a specialized dataset of paired slides with varying aesthetic quality. Correspondingly, its evaluation metrics assess (i) content fidelity and design quality measured against conference materials, and (ii) the model's capabilities in absolute scoring, defect identification, and pairwise comparison using the paired aesthetic slides. This structure enables rigorous and reproducible evaluation for both content generation and aesthetic judgment.
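As a rough illustration of how the aesthetic-awareness tasks could be scored, the sketch below computes accuracy for pairwise comparison and mean absolute error for absolute scoring. The exact metrics used by the benchmark are not stated here, so both function names, the `"A"`/`"B"` label encoding, and the choice of metrics are assumptions.

```python
from typing import Sequence


def pairwise_accuracy(predicted: Sequence[str], preferred: Sequence[str]) -> float:
    """Fraction of slide pairs where the model picks the human-preferred slide."""
    if len(predicted) != len(preferred) or not preferred:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == g for p, g in zip(predicted, preferred))
    return hits / len(preferred)


def mean_absolute_error(model_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Average absolute gap between model and human aesthetic scores."""
    if len(model_scores) != len(human_scores) or not human_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(m - h) for m, h in zip(model_scores, human_scores))
    return total / len(human_scores)


# Hypothetical labels: "A" or "B" marks which slide of each pair is preferred.
acc = pairwise_accuracy(["A", "B", "A", "B"], ["A", "B", "B", "B"])
mae = mean_absolute_error([7.0, 5.0], [8.0, 5.0])
```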
🎨 Aesthetic Comparison
Side-by-side comparisons highlight PresAesth's guidance on layout balance, typography, and iconography.
Interactive Demos
Nine paired demos showcase slide interactivity and synchronized talking-head delivery. Each video is generated directly from the EvoPresent pipeline using the same script as the interactive deck.
BibTeX
@misc{liu2025presentingpaperartselfimprovement,
title = {Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations},
author = {Chengzhi Liu and Yuzhe Yang and Kaiwen Zhou and Zhen Zhang and Yue Fan and Yanan Xie and Peng Qi and Xin Eric Wang},
year = {2025},
eprint = {2510.05571},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2510.05571}
}