PaddleAudio is the audio toolkit to speed up your audio research and development loop in PaddlePaddle. It currently provides a collection of audio datasets, feature-extraction functions, audio transforms,state-of-the-art pre-trained models in sound tagging/classification and anomaly sound detection. More models and features are on the roadmap.
- Spectrogram and related features are compatible with librosa.
- State-of-the-art models in sound tagging on Audioset, sound classification on esc50, and more to come.
- Ready-to-use audio embedding with a line of code, includes sound embedding and more on the roadmap.
- Data loading supports for common open source audio in multiple languages including English, Mandarin and so on.
git clone https://github.com/PaddlePaddle/models
cd models/PaddleAudio
pip install .
import paddleaudio
audio_file = 'test.flac'
wav, sr = paddleaudio.load(audio_file, sr=16000)
mel_feature = paddleaudio.melspectrogram(wav,
sr=sr,
window_size=320,
hop_length=160,
n_mels=80)import paddleaudio
from paddleaudio.models.wav2vec2 import Wav2Vec2ForCTC, Wav2Vec2Tokenizer
model = Wav2Vec2ForCTC('wav2vec2-base-960h', pretrained=True)
tokenizer = Wav2Vec2Tokenizer()
# Load audio and normalize
wav, _ = paddleaudio.load('your_audio.wav', sr=16000, normal=True, norm_type='gaussian')
with paddle.no_grad():
x = paddle.to_tensor(wav)
logits = model(x.unsqueeze(0))
# Get the token index prediction
idx = paddle.argmax(logits, -1)
# Decode prediction to text
text = tokenizer.decode(idx[0])
print(text)We provide a set of examples to help you get started in using PaddleAudio quickly.
- Wav2vec 2.0 for speech recognition
- PANNs: acoustic scene and events analysis using pre-trained models
- Environmental Sound classification on ESC-50 dataset
- Training a audio-tagging network on Audioset
Please refer to example directory for more details.