Hyper RVC WebUI

An automated pipeline for creating covers with any RVC v2 trained AI voice, from YouTube videos or a local audio file. It is aimed at developers who want to add singing functionality to an AI assistant, chatbot, or VTuber, and at anyone who wants to hear a favourite character sing a favourite song.

Project Structure

Hyper-RVC/
├── app.py                      # Main WebUI entry point (Gradio)
├── main.py                     # Legacy entry point (redirects to app.py)
├── core.py                     # Backward-compatible shim (re-exports from main)
├── cli.py                      # CLI interface
│
├── main/                       # Core processing modules
│   ├── __init__.py             # Package init + audioop shim for Python 3.13
│   ├── core.py                 # Pipeline orchestrator (full_inference_program)
│   │
│   ├── uvr/                    # Audio separation (vocal, karaoke, dereverb, deecho, denoise)
│   │   ├── __init__.py
│   │   ├── separator.py        # High-level separation functions
│   │   └── models/             # Separation model architectures
│   │       ├── bs_roformer/    # BS-Roformer & Mel-Band-Roformer
│   │       ├── bandit/         # Band-Split RNN v1
│   │       ├── bandit_v2/      # Band-Split RNN v2
│   │       ├── scnet/          # SCNet
│   │       ├── scnet_unofficial/ # SCNet unofficial variant
│   │       ├── demucs4ht.py    # Demucs v4 hybrid transformer
│   │       ├── mdx23c_tfc_tdf_v3.py # MDX23C TFC-TDF
│   │       ├── segm_models.py  # Segmentation models
│   │       ├── torchseg_models.py # Torch segmentation models
│   │       ├── upernet_swin_transformers.py # UperNet Swin
│   │       ├── ensemble.py     # Model ensembling
│   │       ├── inference.py    # Inference engine
│   │       └── utils.py        # Audio processing utilities
│   │
│   ├── rvc/                    # RVC voice conversion
│   │   ├── __init__.py
│   │   ├── converter.py        # High-level RVC conversion wrapper
│   │   └── engine/             # Applio RVC inference engine
│   │       ├── configs/config.py
│   │       ├── infer/infer.py
│   │       ├── infer/pipeline.py
│   │       └── lib/
│   │           ├── algorithm/  # Neural network architectures (11 modules)
│   │           ├── predictors/ # F0 extractors (CREPE, FCPE, RMVPE)
│   │           ├── utils.py
│   │           └── tools/      # Model download, TTS, audio split
│   │
│   ├── tts/                    # Text-to-Speech (Edge TTS + RVC)
│   │   ├── __init__.py
│   │   └── synthesis.py        # TTS generation + optional RVC conversion
│   │
│   ├── whisper/                # Whisper transcription
│   │   ├── __init__.py
│   │   ├── transcriber.py      # High-level transcription wrapper
│   │   └── diarization/        # Speaker diarization engine
│   │       ├── whisper.py      # Whisper model wrapper
│   │       ├── speechbrain.py  # SpeechBrain integration
│   │       ├── ECAPA_TDNN.py   # Speaker embedding model
│   │       ├── encoder.py      # Speaker encoder
│   │       ├── features.py     # Audio feature extraction
│   │       ├── segment.py      # Voice activity segmentation
│   │       ├── embedding.py    # Speaker embeddings
│   │       ├── audio.py        # Audio preprocessing
│   │       └── parameter_transfer.py # Model weight transfer
│   │
│   └── tools/                  # Shared utilities
│       ├── __init__.py
│       ├── variables.py        # Model definitions, FP16 config
│       ├── config.py           # Application configuration management
│       ├── file_utils.py       # File search, model lookup, downloads
│       ├── audio_utils.py      # Audio effects (Pedalboard), merging (pydub)
│       ├── downloader.py       # Model & music download orchestration
│       ├── gdown.py            # Google Drive download handler
│       ├── hf.py               # HuggingFace download handler
│       ├── mediafire.py        # MediaFire download handler
│       └── logger.py           # Logging utilities
│
├── tabs/                       # Gradio UI tabs
│   ├── full_inference.py       # Voice Conversion tab
│   ├── tts_inference.py        # TTS Generation tab
│   ├── whisper_transcription.py # Transcription tab
│   ├── download_music.py       # Download Music tab
│   ├── download_model.py       # Download Model tab
│   └── settings.py             # Settings tab
│
├── assets/                     # Static assets
│   ├── themes/                 # Gradio themes
│   ├── i18n/                   # Internationalization (8 languages)
│   ├── config.json             # User settings
│   ├── logo.ico                # Favicon
│   └── colab.ipynb             # Google Colab notebook
│
├── docs/                       # Documentation
├── tests/                      # Test suite
├── requirements.txt
├── run.sh / run.bat            # Launch scripts
└── update.sh / update.bat      # Update scripts

Quick Start

# Install dependencies
pip install -r requirements.txt

# Start the WebUI
python app.py

# With custom options
python app.py --port 8080 --share --open
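
The options above might be parsed along these lines. This is a minimal sketch, not the project's actual `app.py`: the parser layout, the default port, and the help texts are assumptions; only the flag names `--port`, `--share`, and `--open` come from the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the launch flags shown in Quick Start (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Launch the WebUI (illustrative sketch)")
    # 7860 is Gradio's usual default port; whether app.py uses it is an assumption.
    parser.add_argument("--port", type=int, default=7860, help="port to serve the UI on")
    parser.add_argument("--share", action="store_true", help="request a public share link")
    parser.add_argument("--open", action="store_true", help="open the UI in a browser")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--port", "8080", "--share", "--open"])
print(f"port={args.port} share={args.share} open={args.open}")
```

In a Gradio app, values like these would usually be forwarded to the `launch()` call; how this project wires them up is not shown here.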

CLI Usage

List available models

python cli.py list-models

Download a model

python cli.py download-model --link https://huggingface.co/username/model

Download music from YouTube

python cli.py download-music --link https://youtube.com/watch?v=...

Basic audio conversion

python cli.py convert --model-path /path/to/model.pth --input-audio song.mp3

Full conversion with all options

python cli.py convert --model-path model.pth --index-path index.pth \
  --input-audio song.mp3 --pitch 12 --reverb --denoise \
  --vocal-model "Mel-Roformer by KimberleyJSN" \
  --export-format-final mp3

Add effects

python cli.py add-effects input.wav --room-size 0.8 --wet 0.4 --output-path output.wav

Merge audio files

python cli.py merge \
  --vocals vocals.flac \
  --instrumental instrumental.flac \
  --backing-vocals backing.flac \
  --format mp3

Module Overview

`main/uvr/` — Audio Separation

Handles all audio source separation tasks using deep learning models, including the Mel-Roformer, BS-Roformer, MDX23C, Demucs v4, Band-Split RNN (Bandit), and SCNet architectures:

Vocal/instrumental separation
Karaoke (lead + backing vocal) separation
Dereverb processing
Deecho processing
Denoise processing
Model ensembling for improved separation quality
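
As an illustration of the ensembling idea in the last item, the sketch below averages the estimates of several models sample by sample. It is a minimal sketch under stated assumptions, not the code in `ensemble.py`: the function name `average_ensemble`, the weighting scheme, and the use of plain Python lists instead of audio arrays are all hypothetical.

```python
from typing import Optional, Sequence


def average_ensemble(
    estimates: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> list[float]:
    """Weighted sample-wise average of several estimates of the same stem."""
    if not estimates:
        raise ValueError("need at least one estimate")
    if weights is None:
        weights = [1.0] * len(estimates)
    if len(weights) != len(estimates):
        raise ValueError("one weight per estimate is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Truncate to the shortest estimate so every sample has a value from each model.
    length = min(len(e) for e in estimates)
    return [
        sum(w * e[i] for w, e in zip(weights, estimates)) / total
        for i in range(length)
    ]


# Two hypothetical vocal estimates of the same three samples.
model_a = [0.2, 0.4, -0.1]
model_b = [0.4, 0.0, -0.3]
blended = average_ensemble([model_a, model_b])
```

Averaging tends to cancel errors that the models do not share, which is the usual motivation for combining several separators.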

`main/rvc/` — Voice Conversion

Wraps the Applio RVC inference engine for high-quality voice conversion with support for multiple pitch extractors (CREPE, FCPE, RMVPE), embedder models, and various export formats. The engine includes a full pipeline architecture with attention-based generators, discriminators, and synthesizer modules.
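
The `--pitch` value used in the CLI examples is typically interpreted in semitones in RVC-style tools, where a shift of n semitones scales the extracted F0 contour by 2^(n/12). The sketch below shows that relationship only; the helper name `shift_f0` and the convention that 0 Hz marks an unvoiced frame are assumptions, not the engine's actual code.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def shift_f0(f0_hz: list[float], semitones: float) -> list[float]:
    """Scale every voiced frame of an F0 contour; 0.0 (unvoiced) stays 0.0."""
    ratio = semitone_ratio(semitones)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


contour = [220.0, 0.0, 246.94]
octave_up = shift_f0(contour, 12)  # +12 semitones doubles each voiced frequency
```

Under this reading, `--pitch 12` raises the converted voice by one octave and `--pitch -12` lowers it by one.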

`main/tts/` — Text-to-Speech

Integrates Microsoft Edge TTS (400+ voices across 11 languages) with optional RVC voice conversion on the generated audio, so AI covers can be created from text input alone.

`main/whisper/` — Transcription & Diarization

OpenAI Whisper-based speech-to-text with word-level timestamps, multi-language support, and speaker diarization powered by SpeechBrain and ECAPA-TDNN speaker embeddings. Supports SRT, VTT, and JSON export formats.
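
To make the export step concrete, the sketch below renders timed segments as SRT text. It is illustrative only: the `(start, end, text)` tuple layout and the function names are assumptions, and the project's own exporter may structure its data differently. The timestamp layout `HH:MM:SS,mmm` follows the SRT convention.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        times = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        cues.append(f"{index}\n{times}\n{text.strip()}\n")
    # Each cue already ends with a newline; joining with another one
    # leaves the blank line that separates cues in an SRT file.
    return "\n".join(cues)


print(to_srt([(0.0, 2.5, "Hello there."), (2.5, 4.0, "Second line.")]))
```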

`main/tools/` — Utilities

Shared helpers used across all modules:

variables: Model definitions, FP16 hardware detection
config: Application configuration management
file_utils: File search, model metadata lookup, file downloads
audio_utils: Reverb effects (Pedalboard), audio merging (pydub), FP16 config patching
downloader: RVC model download and YouTube music download orchestration
gdown / hf / mediafire: Platform-specific download handlers
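
The split into platform-specific handlers suggests a dispatch on the link's host name. A minimal sketch of that idea follows; the function `pick_handler`, the returned labels, and the host list are hypothetical and are not taken from `downloader.py`.

```python
from urllib.parse import urlparse

# Host suffix -> handler label; the labels mirror the module names listed above.
HANDLERS = {
    "huggingface.co": "hf",
    "drive.google.com": "gdown",
    "mediafire.com": "mediafire",
}


def pick_handler(link: str) -> str:
    """Return the label of the handler that should fetch this link."""
    host = (urlparse(link).hostname or "").lower()
    for suffix, handler in HANDLERS.items():
        # Match the host itself or any subdomain such as www.mediafire.com.
        if host == suffix or host.endswith("." + suffix):
            return handler
    raise ValueError(f"no download handler for host {host!r}")


print(pick_handler("https://huggingface.co/username/model"))
```

Matching on the parsed host name rather than on a substring of the whole URL avoids false hits when a platform name appears in a path or query string.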

`main/core.py` — Pipeline Orchestrator

The full_inference_program() function coordinates the complete audio processing pipeline by calling into the specialized sub-modules in sequence: vocal separation → karaoke separation → dereverb → deecho → denoise → RVC conversion → backing vocals → reverb → pitch adjust → merge.
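
The ordering above can be pictured as a list of stages applied in sequence, with optional stages skipped. The sketch below is a toy model of that control flow, not `full_inference_program()` itself: the `Stage` type, the `run_pipeline` helper, and the string-tagging stand-ins for audio processing are all hypothetical, and the real pipeline also carries several stems (vocals, instrumental, backing vocals) rather than one value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]  # stand-in: takes and returns an "audio" label
    enabled: bool = True


def tag(step: str) -> Callable[[str], str]:
    """Build a placeholder stage that only records that it ran."""
    return lambda audio: f"{audio}>{step}"


def run_pipeline(audio: str, stages: list[Stage]) -> str:
    """Apply each enabled stage in order, feeding its output to the next."""
    for stage in stages:
        if stage.enabled:
            audio = stage.run(audio)
    return audio


stages = [
    Stage("vocal separation", tag("vocals")),
    Stage("dereverb", tag("dereverb"), enabled=False),  # optional step switched off
    Stage("RVC conversion", tag("rvc")),
    Stage("merge", tag("merge")),
]
print(run_pipeline("song", stages))
```

Keeping the stages in an ordered list makes it easy to see which steps are optional and to toggle them from user settings without touching the loop.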

Cloud Usage

Credits

👑 Project Team

Role	Member	Description
👑 Base Project Owner	ShiromiyaG	Owner of RVC-AI-Cover-Maker-UI which this project is based on
🔧 Base Project Contributor	Eddycrack864	Contributor to RVC-AI-Cover-Maker-UI
🧩 Fork Owner	BF667-IDLE	Hyper RVC fork owner & maintainer
🧪 Colab UI	Nick088	Start UI cells in Colab & Kaggle, local setup guide
🧪 QA Testing	FullmatheusBallZ	Google Colab testing & quality assurance

🏗️ Core Projects & Libraries

Project	Author	Role
	ShiromiyaG	Original UI framework & cover pipeline design (owned by ShiromiyaG)
	IAHispano	RVC inference engine, pitch extraction & model management
	beveradb	Python audio source separation wrapping UVR models
	Anjok07	Gold standard vocal removal with pretrained model weights
	ZFTurbo	BS-Roformer, Mel-Band-Roformer, SCNet, MDX23C, Bandit, Demucs
	SociallyIneptWeeb	AI cover generation pipeline & processing concepts
	PhamHuynhAnh16	Base RVC library code, additional F0 predictors & method fixes

🧠 AI Models & Frameworks

Library	Author	Purpose
	OpenAI	Speech recognition & transcription
	SpeechBrain Team	Speaker diarization & ECAPA-TDNN embeddings
	Meta AI	Deep learning framework for all neural networks
	HuggingFace	Model loading & pretrained model utilities
	NumPy Team	Numerical computing & array operations
	Microsoft	High-performance model inference

🎵 Voice & Pitch Extraction

Library	Author	Purpose
	rany2	400+ voices in 11 languages via Microsoft Edge
	Max Morrison	Neural pitch estimation (F0 extraction)
	OpenVPI	Robust vocal pitch estimation
	SCToolsystem	Fundamental frequency contour extraction
	Meta Research	Voice embedding similarity search & retrieval
	Max Morrison	PyTorch-native CREPE implementation

🎧 Audio Processing

Library	Author	Purpose
	Spotify	Studio-quality reverb, EQ & audio effects
	James Robert	Audio manipulation, format conversion & merging
	librosa Team	Music & audio analysis, feature extraction
	FFmpeg Project	Audio/video encoding, decoding & processing
	Bastian Bechtold	Audio file I/O via libsndfile
	SciPy Team	Signal processing & scientific computing

📥 Download & Network

Library	Author	Purpose
	yt-dlp contributors	YouTube & 1000+ site audio/video downloader
	HuggingFace	Model & dataset hosting for pretrained RVC models
	Kentaro Wada	Google Drive file downloader
	Kenneth Reitz	HTTP library for Python
	Casper da Costa-Luis	Progress bars for downloads & processing

🖼️ UI & Design

Library	Author	Purpose
	HuggingFace	Web UI framework with tabs, sliders & file uploads
	Python Software Foundation	Core language runtime
	Freepik	Cyber-themed cover image for the WebUI

Built with ❤️ by the Hyper-RVC community · Open Source under MIT License
