Identifying Record Producers from Audio Data

Table of Contents

  • Motivation
  • Data Understanding
    • Data Sources
    • Audio Processing
  • Modeling
  • Evaluation
  • Future Improvements
  • Built With

Motivation

Music producers can have a significant influence on the sound of an album. This model can be used for two main purposes:

  • Music Discovery: Services like Pandora and Spotify differentiate themselves by their ability to find music users will like. Production "sound" is another dimension that users may enjoy exploring when searching for new music.
  • Music Publishing: Creating and maintaining a database of music production and ownership credits has been a historically difficult task. When streaming services pay royalties to record labels, creative collaborators often go unpaid because of missing documentation. This model is a step toward "fingerprinting" the creators of a song.

Data Understanding

Data Sources

Audio Processing

The key to identifying a record producer lies in the timbre of a sound. Timbre can be thought of as the "quality" or "identity" of a sound: it is what allows us to tell a flute from a trumpet even when they play the same notes. Much of a sound's timbre is carried in its higher-frequency overtones.

Thirty-second MP3 clips from 1,000 songs (10 producers, 100 songs each) were converted to WAV files and run through a high-pass filter to accentuate the timbre-carrying frequencies. For each clip, the Mel-Frequency Cepstral Coefficients (MFCCs) were calculated.

MFCCs, very generally, are a set of values that correspond to the timbre of a sound.

More technically, MFCCs are calculated by first taking the Fast Fourier Transform (FFT) of successive windows of the waveform to convert from the amplitude-time domain to the frequency-time domain. Each window's power spectrum is then mapped onto the mel scale with a filter bank, the logarithm is taken, and the result is decomposed using the Discrete Cosine Transform (DCT). The resulting values are the Mel-Frequency Cepstral Coefficients. The figure below shows an example of the audio processing.
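As a rough sketch of this pipeline, the snippet below loads a clip, applies a Butterworth high-pass filter, and computes MFCCs with librosa and scipy. The cutoff frequency and MFCC parameters are illustrative assumptions, not the repository's actual settings.

```python
import librosa
from scipy.signal import butter, sosfilt

def extract_mfccs(path, cutoff_hz=400, n_mfcc=20):
    """Load a 30-second clip, high-pass filter it, and compute its MFCCs."""
    # librosa decodes MP3/WAV alike and resamples to 22,050 Hz
    y, sr = librosa.load(path, sr=22050, duration=30.0)

    # 4th-order Butterworth high-pass; the 400 Hz cutoff is an assumption,
    # chosen only to illustrate accentuating the timbre-carrying bands
    sos = butter(4, cutoff_hz, btype="highpass", fs=sr, output="sos")
    y = sosfilt(sos, y)

    # MFCCs: windowed FFT -> mel filter bank -> log -> DCT, handled by librosa
    # result has shape (20, ~1292): 20 coefficients per frame over 30 seconds
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
```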

Modeling

After processing, each song is represented by about 24,000 MFCC values (20 coefficients in the frequency dimension × 1,200 frames in the time dimension). Principal Component Analysis (PCA) was used to reduce the dimensionality to 12 "sonic eigenvectors."
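A minimal sketch of this reduction step, assuming scikit-learn and a random stand-in matrix in place of the real flattened MFCC features:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the real design matrix: 1,000 songs, each row the
# flattened 20 x 1,200 MFCC matrix from the processing step above
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 24000))

# Project the ~24,000 MFCC values per song onto 12 principal components
pca = PCA(n_components=12)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (1000, 12)
```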

A K-Nearest Neighbors (KNN) algorithm was used to identify the most likely producers for any new song. The figure below shows an example of how the KNN algorithm works.
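The snippet below sketches this classification step with scikit-learn's KNeighborsClassifier; the choice of k = 5 and the stand-in features are assumptions for illustration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_reduced = rng.normal(size=(1000, 12))   # stand-in for the PCA features
y = np.repeat(np.arange(10), 100)         # 10 producers, 100 songs each

# k = 5 is an illustrative choice; the README does not state the value used
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_reduced, y)

# Rank the most likely producers for a new song by neighbor vote share
new_song = rng.normal(size=(1, 12))
probs = knn.predict_proba(new_song)[0]
ranking = sorted(zip(knn.classes_, probs), key=lambda t: -t[1])
print(ranking[:3])  # top-3 candidate producers with their vote fractions
```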

Evaluation

The model was tested on a held-out set of 300 songs. The multiclass accuracy across 10 balanced producer classes was 44%, compared to a random-guess baseline of 10%.
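A sketch of how such an evaluation could be run, reusing the stand-in X_reduced, y, and knn from the sketches above; the stratified 300-song hold-out mirrors the balanced classes described here.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out 300 of the 1,000 songs, stratified to keep the 10 classes balanced
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=300, stratify=y, random_state=0)

knn.fit(X_train, y_train)
print(accuracy_score(y_test, knn.predict(X_test)))  # README reports 0.44
```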

The data were then projected into two dimensions with t-SNE to show the relative clustering of songs.

An interactive t-SNE plot can be found here.
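A static version of that plot could be produced along these lines, again reusing the stand-in features and labels from above (matplotlib and scikit-learn's default t-SNE settings are assumptions):

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Squeeze the 12-D PCA features into 2-D for visual inspection of clusters
coords = TSNE(n_components=2, random_state=0).fit_transform(X_reduced)

plt.scatter(coords[:, 0], coords[:, 1], c=y, cmap="tab10", s=8)
plt.title("Songs in t-SNE space, colored by producer")
plt.show()
```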

Future Improvements

  • Deconvolution of Variables:
    • Artist/Album/Instrumentation
    • More accurate labeling
  • Scale:
    • More songs/producers
    • Parallelize and deploy on AWS/Spark
  • Feature Engineering:
    • More Audio Processing/Reverse Engineering
    • Remove music structure by breaking songs into beats
  • Modeling:
    • Neural Networks with TensorFlow/Keras

Built With
