A demo-ready project that generates highlights and a semantic summary from any video (sports, lectures, generic).
- Upload a video (MP4/MOV) or provide a YouTube link
- Picks key segments using a multi-signal scoring pipeline:
  - Scene detection (PySceneDetect)
  - Audio energy peaks (librosa)
  - Semantic importance from transcript embeddings (Sentence‑BERT)
- Exports:
  - `highlights.mp4`
  - `timestamps.json` (why each moment was chosen)
  - a readable summary (bullets + paragraph)
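The multi-signal scoring idea can be sketched as a weighted sum of normalized per-segment signals. The weights and the min-max normalization below are illustrative assumptions, not the project's actual values:

```python
def normalize(xs):
    # Min-max normalize raw signal values to [0, 1].
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def score_segments(scene, audio, semantic, weights=(0.3, 0.3, 0.4)):
    # Weighted sum of normalized signals; weights are hypothetical.
    signals = [normalize(s) for s in (scene, audio, semantic)]
    return [sum(w * s[i] for w, s in zip(weights, signals))
            for i in range(len(scene))]

# Example: three candidate segments, one raw value per signal each.
scores = score_segments([0.2, 0.9, 0.5], [0.1, 0.8, 0.3], [0.4, 0.95, 0.2])
best = max(range(len(scores)), key=scores.__getitem__)  # index of top segment
```

Segments with the highest combined score would then be concatenated into `highlights.mp4`.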
- Python 3.10+ recommended
- ffmpeg installed and available in PATH
Check:

```bash
ffmpeg -version
```

```bash
python -m venv .venv
# mac/linux
source .venv/bin/activate
# windows (powershell)
# .venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r requirements.txt
```

First run will download pretrained models (Whisper + Sentence‑BERT). That’s expected.
Run the Streamlit UI:

```bash
streamlit run app.py
```

Or use the CLI:

```bash
python -m pipeline.cli --video "path/to/video.mp4" --mode lecture --target_seconds 75
```

Outputs go to: `outputs/<run_id>/`
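One export is `timestamps.json`, which records why each moment was chosen. The exact schema is not documented here, so the shape below is purely an illustrative assumption:

```python
import json

# Hypothetical shape of a timestamps.json entry (illustrative only;
# field names like "reasons" are assumptions, not the project's schema).
entry = {
    "start": 42.5,   # segment start, seconds
    "end": 49.0,     # segment end, seconds
    "score": 0.87,   # combined multi-signal score
    "reasons": ["audio energy peak", "high semantic importance"],
}
print(json.dumps([entry], indent=2))
```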
- If export fails: ensure `ffmpeg` is installed.
- If transcription is slow: choose the `tiny` model in the UI.
- If the video has no speech: the semantic module falls back to audio/visual cues.
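The no-speech fallback can be sketched as dropping the semantic term when no transcript is available and reweighting the remaining signals. The weights here are assumptions for illustration:

```python
def combine_scores(audio, visual, semantic=None):
    # No transcript (no speech detected): use audio + visual cues only.
    # All weights below are hypothetical, not the project's actual values.
    if semantic is None:
        return [0.5 * a + 0.5 * v for a, v in zip(audio, visual)]
    return [0.3 * a + 0.3 * v + 0.4 * s
            for a, v, s in zip(audio, visual, semantic)]

# With speech absent, only audio/visual signals contribute:
fallback = combine_scores([0.2, 0.8], [0.4, 0.6])
```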