Music producers can have a big influence on the sound of an album. This model can be used for two main purposes:
- Music Discovery: Services like Pandora and Spotify compete on their ability to find music users will like. Production "sound" is another dimension users may enjoy exploring when searching for new music.
- Music Publishing: Creating and maintaining a database of music production and ownership credits has historically been a difficult task. When streaming services pay royalties to record labels, creative collaborators often go unpaid because of missing documentation. This model is a step toward "fingerprinting" the creators of a song.
- Spotify API - Contains audio files and song metadata.
- Wikipedia - Record producer labeling.
The key to identifying a record producer lies in the timbre of a sound. Timbre can be thought of as the "quality" or "identity" of a sound: it is what lets us tell a flute from a trumpet even when they play the same notes. Much of a sound's timbre lives in its higher-frequency overtones.
Thirty-second MP3 clips from 1,000 songs (10 producers, 100 songs each) were converted to WAV files and run through a highpass filter to accentuate the timbre-carrying frequencies. For each clip, the Mel-Frequency Cepstral Coefficients (MFCCs) were calculated.
MFCCs, very generally, are a set of values that correspond to the timbre of a sound.
More technically, MFCCs are calculated by first taking the Fast Fourier Transform (FFT) of short, overlapping windows of the waveform to convert from amplitude-time space to frequency-time space. The power spectrum of each window is then mapped onto the mel scale with a bank of triangular filters, the log of the filter energies is taken, and the result is decomposed with the Discrete Cosine Transform (DCT). The resulting values are the Mel-Frequency Cepstral Coefficients. The figure below shows an example of the audio processing.
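The filter-then-MFCC pipeline can be sketched with NumPy and SciPy alone. The highpass cutoff, window size, and number of mel filters below are illustrative assumptions, not values from the project:

```python
import numpy as np
from scipy.signal import butter, filtfilt, stft
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):           # rising edge of the triangle
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):           # falling edge of the triangle
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr, n_mfcc=20, n_fft=2048, cutoff_hz=500.0):
    # Highpass filter to accentuate the timbre-carrying overtones
    # (cutoff_hz is an assumed value for illustration).
    b, a = butter(4, cutoff_hz / (sr / 2), btype="high")
    filtered = filtfilt(b, a, signal)
    # Windowed FFT -> power spectrogram (frequency-time space).
    _, _, Z = stft(filtered, fs=sr, nperseg=n_fft)
    power = np.abs(Z) ** 2
    # Mel filterbank, log, then DCT of each frame.
    mel_energies = mel_filterbank(40, n_fft, sr) @ power
    log_mel = np.log(mel_energies + 1e-10)
    return dct(log_mel, type=2, axis=0, norm="ortho")[:n_mfcc]

# Example: one second of a synthetic tone with an overtone.
sr = 22050
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1320 * t)
coeffs = mfcc(sig, sr)
print(coeffs.shape)  # (n_mfcc, number of time frames)
```

Applied to a 30-second clip, the same function yields the roughly 20 x 1,200 coefficient matrix described below.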
After processing, each song has about 24,000 MFCC values (20 coefficients in the frequency dimension by roughly 1,200 frames in the time dimension). Principal Component Analysis (PCA) was used to reduce the dimensionality to 12 components, which can be thought of as "sonic eigenvectors."
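A minimal sketch of the reduction step with scikit-learn, using random stand-in data in place of the real flattened MFCC matrices (100 songs here instead of 1,000, to keep the example quick):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for the real data: each row is one song's flattened
# 20 x 1200 MFCC matrix (24,000 features per song).
X = rng.normal(size=(100, 20 * 1200))

# Project each song onto the 12 leading principal components.
pca = PCA(n_components=12)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (100, 12)
```

Each song is now a single point in a 12-dimensional space, which is what the classifier below operates on.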
A K-Nearest Neighbors (KNN) algorithm was used to identify the most likely producers for any new song. The figure below shows an example of how the KNN algorithm works.
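The classification step might look like the following scikit-learn sketch. The training data are random stand-ins, and `n_neighbors=5` is an assumed parameter, not one reported by the project:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Stand-in training set: 700 songs, each a 12-D PCA vector,
# labeled 0-9 for the ten producers.
X_train = rng.normal(size=(700, 12))
y_train = rng.integers(0, 10, size=700)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# For a new song, rank producers by the share of its
# nearest neighbors belonging to each class.
new_song = rng.normal(size=(1, 12))
probs = knn.predict_proba(new_song)[0]
ranked = np.argsort(probs)[::-1]
print(ranked[:3])  # the three most likely producer labels
```

Because KNN's `predict_proba` reflects neighbor counts, it naturally produces the "most likely producers" ranking rather than a single hard label.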
The model was tested on a 300-song test set. The multiclass accuracy across 10 balanced producer classes was 44%, compared to a random-guess baseline of 10%.
The reduced data were then projected onto a 2D t-SNE plot to show the relative clustering of songs.
An interactive t-SNE plot can be found here.
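A sketch of the projection with scikit-learn's t-SNE, again on random stand-in data; the perplexity value is an assumed default, not a project setting:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for the 300 test songs in 12-D PCA space.
X = rng.normal(size=(300, 12))

# Embed into 2-D while preserving local neighborhood structure,
# so songs with similar production "sound" land near each other.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (300, 2)
```

Unlike PCA, t-SNE is used here purely for visualization: distances in the 2-D plot are meaningful locally but not globally.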
- Deconvolution of Variables:
  - Artist/Album/Instrumentation
  - More accurate labeling
- Scale:
  - More songs/producers
  - Parallelize and deploy on AWS/Spark
- Feature Engineering:
  - More audio processing/reverse engineering
  - Remove music structure by breaking songs into beats
- Modeling:
  - Neural networks with TensorFlow/Keras



