Inspiration
There has been an explosion of content about voice synthesizers in the music industry, along with growing security concerns about deepfakes in video and audio. Seeing this, I asked myself: if a cloning model can understand what a person's voice sounds like well enough to construct a clone of it, then surely we can reverse-engineer that process to identify the speaker.
What it does
Takes in audio input and predicts the speaker's identity (~85% accuracy)
How we built it
We built a classifier that converts audio into a mel spectrogram image, then trains an image classifier on those spectrogram images, each labeled with the identity of the person speaking in the audio.
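The pipeline above can be sketched in two stages: turn raw audio into a mel spectrogram "image", then feed those images to any standard image classifier. Below is a minimal, pure-NumPy sketch of the first stage; in practice a library such as librosa or torchaudio would be used, and all parameter values here (sample rate, FFT size, number of mel bands) are illustrative assumptions rather than the project's actual settings.

```python
# Sketch: audio -> log-mel spectrogram matrix, usable as an image for a classifier.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(audio, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Short-time FFT power, projected onto mel bands, then log-compressed.
    window = np.hanning(n_fft)
    frames = [audio[s:s + n_fft] * window
              for s in range(0, len(audio) - n_fft, hop)]
    power = np.abs(np.fft.rfft(np.stack(frames), axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10).T  # shape (n_mels, n_frames): the "image"

# Example: one second of a 440 Hz tone stands in for a voice clip.
sr = 16000
t = np.arange(sr) / sr
img = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(img.shape)
```

The resulting 2-D matrix can be saved as a grayscale image or fed directly into a CNN, with one class per speaker.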
Challenges we ran into
There were plenty of issues in building a model and dataset from scratch, particularly with overfitting and accuracy. We also ran into a lot of infrastructure issues in submitting audio over the web to the API.
Accomplishments that we're proud of
Building an entire model from scratch with 85%+ accuracy, with no prior experience in any of this
What we learned
Model engineering practices: it is so important to build in an iterative manner instead of just throwing all of your data at the model at once.
What's next for VoID
- Speech embeddings: there are so many other applications for voice identity, such as...
- Security
- Dev Tools
- Accessibility
- Audio Discovery
- Language