ECEN 758 Data Mining and Analysis - Fall 2025
Team Members:
- Banghyon Lee (Joseph)
- Fahimeh Orvati Nia
- Nandhini Valiveti
- Sushma Perati
This project implements comprehensive environmental sound classification using the ESC-50 dataset. We explore traditional machine learning methods, deep learning approaches, and transfer learning techniques to classify 50 different environmental sound categories.
Live Website: https://fahimehorvatinia.github.io/Group20-Data-Mining-Project/
The website includes interactive visualizations, model performance comparisons, and detailed results analysis.
ESC-50 Dataset:
- 2000 audio recordings (5 seconds each)
- 50 environmental sound classes
- 40 examples per class
- Pre-arranged into 5 folds for cross-validation
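Because the folds are fixed, leakage-free splits come straight from the metadata. A minimal loading sketch, assuming the standard `esc50.csv` file shipped in `ESC-50_data/meta/`:

```python
import pandas as pd

# Load the ESC-50 metadata (columns include filename, fold, target, category).
meta = pd.read_csv("ESC-50_data/meta/esc50.csv")

# The dataset ships with a fixed 5-fold assignment; holding out one fold
# at a time gives leakage-free cross-validation splits.
test_fold = 5
train_meta = meta[meta["fold"] != test_fold]
test_meta = meta[meta["fold"] == test_fold]

print(len(train_meta), "train clips,", len(test_meta), "test clips")  # 1600 / 400
```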
The dataset covers diverse environmental sounds grouped into 5 major categories: animals; natural soundscapes and water sounds; human (non-speech) sounds; interior/domestic sounds; and exterior/urban noises.
*Figure: sample audio waveforms from different sound classes.*
*Figure: mel spectrograms showing frequency content over time for different sound classes.*
*Figure: 3D PCA projection of the features showing class separability (explained variance reported on each axis).*
Hand-crafted audio features extracted from each clip (see the extraction sketch after this list):
- MFCC (Mel-frequency cepstral coefficients): 20 features capturing the spectral envelope
- Chroma features: 12 features representing pitch class information
- Mel spectrogram: 128 mel-scaled frequency bands
- Spectral contrast: 7 features capturing spectral peaks and valleys
- Tonnetz: 6 features representing tonal characteristics
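A sketch of per-clip extraction with librosa; mean pooling over time is an assumption here, and the notebook may summarize frames differently:

```python
import numpy as np
import librosa

def extract_features(path, sr=22050):
    """Summarize one clip with the five feature families listed above."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)                # 20 x frames
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)                  # 12 x frames
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)      # 128 x frames
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)          # 7 x frames
    tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)  # 6 x frames
    # Average over time so every clip maps to one fixed-length vector.
    return np.concatenate([f.mean(axis=1)
                           for f in (mfcc, chroma, mel, contrast, tonnetz)])
```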
*Figure: spectrograms showing time-frequency representations of different sound classes.*
Transfer-learning features (see the extraction sketch after this list):
- YAMNet embeddings: 2048 features per audio file (mean + std pooling of the 1024-dim frame embeddings)
- Pre-trained on AudioSet (2 million audio clips)
- Captures high-level audio representations
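A sketch of clip-level embedding extraction; the mean + std pooling follows the description above:

```python
import numpy as np
import tensorflow_hub as hub
import librosa

# YAMNet expects mono 16 kHz float waveforms in [-1, 1].
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def yamnet_embedding(path):
    waveform, _ = librosa.load(path, sr=16000, mono=True)
    # YAMNet returns (scores, embeddings, log_mel_spectrogram);
    # embeddings are 1024-dim per ~0.48 s frame.
    _, embeddings, _ = yamnet(waveform)
    emb = embeddings.numpy()
    # Mean + std pooling over frames -> 2048-dim clip-level vector.
    return np.concatenate([emb.mean(axis=0), emb.std(axis=0)])
```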
Each model is evaluated on three feature variants (see the preprocessing sketch after this list):
- Raw features
- Normalized features (StandardScaler)
- Selected features (Top 100 via Mutual Information)
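A sketch of the normalized and selected variants; `X_train`, `y_train`, and `X_test` are placeholder names for the arrays produced by the extraction step. Fitting the scaler and selector on the training split only avoids leakage:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# X_train, y_train, X_test: raw feature matrices / labels from the step above.
scaler = StandardScaler().fit(X_train)
X_train_norm = scaler.transform(X_train)
X_test_norm = scaler.transform(X_test)

# Keep the 100 features with highest mutual information with the labels.
selector = SelectKBest(mutual_info_classif, k=100).fit(X_train_norm, y_train)
X_train_sel = selector.transform(X_train_norm)
X_test_sel = selector.transform(X_test_norm)
```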
Models implemented (a baseline-training sketch follows the list):
- Naive Bayes Classifier
- 1D Convolutional Neural Network (CNN)
- Random Forest
- Gradient Boosting
- XGBoost
- Transfer Learning with YAMNet + Random Forest
- MLP on YAMNet embeddings
- Self-Supervised Learning (Pseudo-Labeling)
- Support Vector Machine (SVM) - Linear kernel
- Support Vector Machine (SVM) - RBF kernel
- Multi-Layer Perceptron (MLP)
- Logistic Regression
- K-Nearest Neighbors (KNN)
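A compact sketch of how the scikit-learn baselines might be trained and compared; the hyperparameters shown are illustrative, not the notebook's settings:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# X_train/X_test here can be any of the three feature variants above.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "SVM (RBF)": SVC(kernel="rbf"),
    "SVM (linear)": SVC(kernel="linear"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```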
All models are evaluated using the following metrics (a scikit-learn snippet follows the list):
- Accuracy
- Precision (macro-averaged)
- Recall (macro-averaged)
- F1-Score (macro-averaged)
- Confusion Matrices
- Learning Curves
- Feature Importance (where applicable)
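The core metrics computed with scikit-learn; `model`, `X_test`, and `y_test` are placeholders from the sketches above:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_pred = model.predict(X_test)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="macro"))
print("Recall   :", recall_score(y_test, y_pred, average="macro"))
print("F1-Score :", f1_score(y_test, y_pred, average="macro"))
cm = confusion_matrix(y_test, y_pred)  # 50 x 50 matrix for ESC-50
```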
Project structure:
```
.
├── ESC_50_Complete_Project_All_Methods.ipynb   # Main project notebook
├── Project_Report.tex                          # IEEE-format report
├── results.html                                # Results visualization website
├── data_out/                                   # Feature files (CSV)
│   ├── esc50_features_raw.csv
│   ├── esc50_features_normalized_corrected.csv
│   └── esc50_features_selected.csv
├── ESC-50_data/                                # Dataset (audio files excluded from repo)
│   └── meta/                                   # Metadata only
└── README.md                                   # This file
```
Getting started:
- Install dependencies:
  ```
  pip install numpy pandas scikit-learn matplotlib seaborn tensorflow tensorflow-hub librosa xgboost jupyter
  ```
- Clone this repository:
  ```
  git clone https://github.com/fahimehorvatinia/Group20-Data-Mining-Project.git
  cd Group20-Data-Mining-Project
  ```
- Download the ESC-50 dataset from https://github.com/karolpiczak/ESC-50 and extract it to `ESC-50_data/`.
- Open and run `ESC_50_Complete_Project_All_Methods.ipynb` in Jupyter Notebook.
- View the results in `results.html` (generated after running the notebook).
Key features:
- Unified data splitting ensuring all classes appear in the train/val/test splits
- No data leakage between splits
- Comprehensive model comparison
- Transfer learning with YAMNet
- Self-supervised learning exploration
- GPU/CPU compatibility
Results summary (selected models; the full comparison is on the website):
| Model | Feature Set | Test Accuracy | Test F1 |
|---|---|---|---|
| YAMNet+RF (SSL, threshold 0.90) | YAMNet Embeddings | 83.75% | 83.15% |
| YAMNet + Random Forest | YAMNet Embeddings | 83.44% | 82.95% |
| YAMNet+RF (SSL, threshold 0.85) | YAMNet Embeddings | 82.81% | 82.24% |
| YAMNet + MLP | YAMNet Embeddings | 81.25% | 80.55% |
| Random Forest | Normalized | 46.75% | 44.81% |
| XGBoost | Raw | 44.50% | 43.41% |
| CNN | Normalized | 38.25% | 36.50% |
| MLP | Normalized | 33.50% | 32.10% |
- Transfer Learning Dominance: YAMNet embeddings reach 83-84% test accuracy versus 16-47% for traditional features, a gain of more than 35 percentage points over the best traditional model
- Best Traditional Model: Random Forest achieves 46.8% accuracy on normalized features
- Feature Preprocessing: Normalization improves average accuracy from 28% to 31% for traditional classifiers
- Self-Supervised Learning: Pseudo-labeling adds 0.3-0.4 percentage points over the baseline YAMNet+RF (sketched below)
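A minimal pseudo-labeling sketch; `X_lab`, `y_lab`, and `X_unlab` are placeholder names for labeled YAMNet embeddings and embeddings treated as unlabeled, and the 0.90 confidence threshold matches the best SSL row in the table:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Train an initial classifier on the labeled embeddings.
clf = RandomForestClassifier(n_estimators=300, random_state=42)
clf.fit(X_lab, y_lab)

# Keep only unlabeled predictions above the confidence threshold.
proba = clf.predict_proba(X_unlab)
mask = proba.max(axis=1) >= 0.90
pseudo_y = clf.classes_[proba.argmax(axis=1)[mask]]

# Retrain on the union of labeled and confidently pseudo-labeled data.
clf.fit(np.vstack([X_lab, X_unlab[mask]]),
        np.concatenate([y_lab, pseudo_y]))
```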
See the Results Website for detailed performance comparisons across all models and feature sets.
The project report is available in Project_Report.tex (IEEE conference format).
This project is for educational purposes as part of ECEN 758 Data Mining and Analysis course.
Acknowledgments:
- ESC-50 dataset creators
- TensorFlow Hub for YAMNet model
- scikit-learn, TensorFlow, and other open-source libraries