This repository provides inference scripts for French SSML cascade models that improve synthetic speech quality through intelligent prosody control.
🔗 Demo & Audio Samples: https://horstmann.tech/ssml-prosody-control/
📓 Interactive Colab: https://colab.research.google.com/drive/1bFcbJQY9OuY0_zlscqkf9PIgd3dUrIKs?usp=sharing
The cascade consists of two specialized LoRA models based on Qwen2.5-7B with a programmatic processing step:
- hi-paris/ssml-text2breaks-fr-lora: Predicts natural pause locations in French text
- Empty SSML Creation (programmatic): Converts simple breaks to structured SSML templates
- hi-paris/ssml-breaks2ssml-fr-lora: Fills templates with appropriate prosody values
Plain Text → Text2Breaks → Simple Breaks → Programmatic → Empty SSML → Breaks2SSML → Full SSML
"Bonjour" → "Bonjour<break/>" → "<prosody...>Bonjour</prosody><break time="_ms"/>" → "Bonjour<break time="250ms"/>"
Using uv (recommended):
git clone <repository-url>
cd cascading_model
uv syncUsing pip:
git clone <repository-url>
cd cascading_model
pip install -r requirements.txtfrom breaks2ssml_inference import CascadedInference
# Initialize cascade (memory efficient - single base model)
cascade = CascadedInference()
# Convert plain text to SSML
text = "Bonjour comment allez-vous aujourd'hui ?"
result = cascade.predict(text)
print(result) # Full SSML with prosody and break tags# Step 1: Text to breaks
from text2breaks_inference import Text2BreaksInference
text2breaks = Text2BreaksInference()
breaks_result = text2breaks.predict("Bonjour comment allez-vous aujourd'hui ?")
# Step 2: Breaks to SSML
from breaks2ssml_inference import Breaks2SSMLInference
breaks2ssml = Breaks2SSMLInference()
ssml_result = breaks2ssml.predict(breaks_result)# Run examples
python demo.py
# Interactive mode
python demo.py --interactive
# Test installation
python test_models.py# Full cascade
python breaks2ssml_inference.py --cascade "Bonjour comment allez-vous ?"
# Individual models
python text2breaks_inference.py "Bonjour comment allez-vous ?"
python breaks2ssml_inference.py "Bonjour comment allez-vous ?<break/>"
# Interactive modes
python text2breaks_inference.py --interactive
python breaks2ssml_inference.py --interactive --cascadeInput text:
Bonjour comment allez-vous aujourd'hui ?
Step 1 - Text-to-Breaks:
Bonjour comment allez-vous aujourd'hui ?<break/>
Step 2 - Empty SSML Creation (programmatic):
<prosody pitch="_%" rate="_%" volume="_%">Bonjour comment allez-vous aujourd'hui ?</prosody><break time="_ms"/>
Step 3 - Breaks-to-SSML:
<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous aujourd'hui ?</prosody><break time="300ms"/>
# More deterministic output
result = model.predict(text, temperature=0.1, do_sample=False)
# More varied output
result = model.predict(text, temperature=0.8, do_sample=True)
# Control output length
result = model.predict(text, max_new_tokens=256)# Specific GPU
model = Text2BreaksInference(device="cuda:0")
# CPU only
model = Text2BreaksInference(device="cpu")
# Auto-detect (default)
model = Text2BreaksInference(device="auto")- Base Model: Qwen/Qwen2.5-7B with LoRA adapters
- Memory Efficiency: Shared base model with swappable adapters (50% memory savings)
- Language: French
- License: Apache 2.0
- Use Cases: TTS prosody control, speech synthesis, accessibility tools
If you use these models in your research, please cite:
@inproceedings{ould-ouali2025_improving,
title = {Improving Synthetic Speech Quality via SSML Prosody Control},
author = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)},
year = {2025},
url = {https://huggingface.co/hi-paris}
}This repository is maintained by Hi! Paris. For issues or questions about the models, please open an issue on the respective HuggingFace model pages.
Apache 2.0 License (same as the base Qwen2.5-7B model)