Inspiration

Biometric authentication on Linux isn't new. There's fprint for fingerprint authentication. Problem: it's not accessible to people without fingerprints, with dirty fingers, carrying things, cooking, etc. There's Howdy for facial recognition. Problem: it's not accessible in low-light conditions, and it is sometimes racially biased. Solution: a voice authentication system. We looked, but despite the multitude of speech recognition services offered both on- and offline, no such system exists. Thus, we decided to create our own PAM-based speaker verification system for Linux authentication.

What it does

  • Enroll: The user is guided through 5 passphrase recordings, which are used to train a per-user Gaussian Mixture Model (GMM).
  • Auth: The user is prompted to speak their passphrase when authenticating (e.g. on login, sudo, lock screen).
  • Match: The user's voice is scored against their personalized model; if it passes a configurable similarity threshold, the login is granted (PAM_SUCCESS). Otherwise, the user gets 2 more attempts before authentication is denied (PAM_AUTH_ERR) and they must fall back to another authentication method (e.g. password, fingerprint).
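The enroll and match steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes frame-level feature vectors (e.g. MFCCs) have already been extracted from the recordings, uses scikit-learn's `GaussianMixture`, and the threshold value is purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative threshold on the average per-frame log-likelihood; the real
# project's threshold and feature pipeline are not shown here.
THRESHOLD = -25.0

def enroll(feature_sets):
    """Fit a per-user GMM on feature frames from the enrollment recordings."""
    gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
    gmm.fit(np.vstack(feature_sets))  # stack frames from all 5 recordings
    return gmm

def verify(gmm, features):
    """Accept if the average log-likelihood of the frames clears the threshold."""
    return gmm.score(features) >= THRESHOLD
```

A genuine speaker's frames should score well above an impostor's under the enrolled model, which is what the configurable threshold separates.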

How we built it

GMM algorithm in Python, PAM module in C, and some interfacing in the middle.
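The write-up doesn't show the interfacing layer, but one common pattern for this split is a small Python entry point that the C PAM module spawns and whose exit status it maps to a PAM result. The sketch below assumes that pattern; the script name, model path, and `run_verification` helper are all hypothetical, and the real audio/GMM pipeline is stubbed out.

```python
import sys

def run_verification(model_path: str) -> bool:
    """Stub for the real pipeline: record the passphrase, extract features,
    load the per-user GMM from model_path, and compare its score against
    the threshold. Returns True on a voice match."""
    return False  # placeholder so the sketch runs end-to-end

def main(argv: list[str]) -> int:
    # The exit status is the contract with the C side: 0 = match, 1 = no match.
    if len(argv) != 2:
        return 2  # usage error: expected exactly one username argument
    user = argv[1]
    model_path = f"/var/lib/voiceprint/{user}.gmm"  # assumed model location
    return 0 if run_verification(model_path) else 1
```

Under this scheme the C module would fork/exec the script with the username, wait on it, and translate exit status 0 into PAM_SUCCESS and anything else into PAM_AUTH_ERR.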

Challenges we ran into

There was no sample code to reference for many of our tasks. Existing GMM integrations were written in Python 2, and we had to figure out how to make one work in Python 3. C code doesn't work very well with Python. (We're never ever mixing the two again!) ML models are hard to train and thresholds are fidgety. Working with PAM modules easily bricks your system 🙂

Accomplishments that we're proud of

It works. The concept of logging in on Linux with only our voice and not a password or fingerprint still amazes us. It works with pretty decent results even with only 5 samples to train from, too. Our background-noise-suppression system works really well (important in a room full of chatty hackers!).

What we learned

Never ever ever mix C and Python together and try and make them play nice ever again.

What's next for Voiceprint

While the Gaussian Mixture Model approach is sufficient for a proof of concept, it falls short in real-world use: background noise, speaking volume, and inconsistencies in delivery greatly degrade its results. Modern machine learning models, however, can produce highly abstract feature embeddings from phrases, which can enable text-independent recognition and more. We plan to investigate different ML models to improve Voiceprint's accuracy, reliability, and usability.
