About This Project

Project Overview

MVHST (Multi-View Harmonic Spectrum Transformer) is an advanced AI-powered drone classification system that identifies and categorizes drones by analyzing their unique acoustic signatures. The system can distinguish between 10 different drone types (A-J) from audio recordings alone.


What It Does

  • Upload audio files (WAV, MP3, M4A, etc.)
  • Analyze acoustic features using multi-view deep learning
  • Classify drones into 10 categories (A-J)
  • Display probability scores for all drone types in an interactive radar chart
  • Provide real-time predictions with confidence scores

Key Features

🎯 Multi-View Analysis

  • Extracts three complementary views: Mel Spectrogram, CQT, and Harmonic Features
  • Each view captures different aspects of the acoustic signature
  • Intelligent fusion combines insights from all views

🤖 Deep Learning Architecture

  • Transformer-based neural network
  • 6 layers with 8 attention heads per view
  • Learns complex temporal and frequency patterns
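A per-view encoder with the stated sizes (6 layers, 8 heads, model dimension 256) can be sketched in PyTorch as below; the class name and the input projection layer are assumptions for illustration:

```python
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    """Transformer encoder for one acoustic view (illustrative sketch)."""
    def __init__(self, n_features, d_model=256, n_heads=8, n_layers=6):
        super().__init__()
        # Project per-frame feature bins (e.g. 128 Mel bins) up to d_model
        self.proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                  # x: (batch, time, n_features)
        return self.encoder(self.proj(x))  # (batch, time, d_model)
```

Self-attention lets every time frame attend to every other frame, which is how the model picks up temporal patterns such as rotor-speed modulation across the clip.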

📊 Interactive Visualization

  • Real-time radar chart showing probabilities for all 10 drone types
  • Visual feedback on prediction confidence
  • Audio waveform and spectrogram displays

⚡ Real-Time Processing

  • Fast inference on uploaded audio files
  • Automatic audio preprocessing and standardization
  • Instant classification results
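The standardization step (mono, fixed rate, fixed length) might look like the following; this is a dependency-free sketch using naive linear interpolation, whereas a real pipeline would typically use `librosa.resample`:

```python
import numpy as np

def standardize(y, sr, target_sr=22050, target_len=5.0):
    """Standardize a clip to mono, target_sr Hz, target_len seconds (sketch)."""
    if y.ndim > 1:
        y = y.mean(axis=0)            # stereo -> mono by channel averaging
    if sr != target_sr:               # naive linear resampling, for illustration
        n = int(len(y) * target_sr / sr)
        y = np.interp(np.linspace(0, len(y) - 1, n), np.arange(len(y)), y)
    n_target = int(target_sr * target_len)
    if len(y) < n_target:
        y = np.pad(y, (0, n_target - len(y)))  # zero-pad short clips
    return y[:n_target]               # trim long clips
```

Fixing the length to exactly 5 seconds (110,250 samples at 22,050 Hz) keeps the downstream feature tensors a constant shape, so batching and inference stay simple.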

Technology Stack

Backend

  • Python with Flask framework
  • PyTorch for deep learning
  • Librosa & TorchAudio for audio processing
  • Transformer architecture for pattern recognition
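A minimal sketch of what the Flask upload endpoint could look like; the `/predict` route, the `audio` form field, and the `run_model` stub are hypothetical names, not the project's actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_model(file_storage):
    # Stand-in for MVHST inference; a real server would preprocess the
    # uploaded audio and run the model. Returns a uniform 10-class distribution.
    return [0.1] * 10

# Hypothetical endpoint: accepts a multipart form with an "audio" file part
@app.route("/predict", methods=["POST"])
def predict():
    audio = request.files["audio"]
    return jsonify({"probabilities": run_model(audio)})
```

Returning the full probability vector (rather than only the top class) is what lets the frontend render all 10 values on the radar chart.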

Frontend

  • React with Vite
  • Canvas API for visualizations
  • Modern UI/UX with responsive design

Model Architecture

  • MVHST Model: Custom Multi-View Transformer
  • Feature Extractors: Mel, CQT, Harmonic
  • Fusion Layer: Attention-based view combination
  • Classifier: Neural network with 10 output classes
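One common way to implement attention-based view combination is to score each pooled view embedding, softmax over the views, and take the weighted sum; the sketch below shows that pattern (class and layer names are assumptions, not the project's actual fusion layer):

```python
import torch
import torch.nn as nn

class ViewFusion(nn.Module):
    """Attention-weighted fusion of pooled view embeddings (illustrative)."""
    def __init__(self, d_model=256, n_views=3, n_classes=10):
        super().__init__()
        self.score = nn.Linear(d_model, 1)         # one attention score per view
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, views):                      # views: (batch, n_views, d_model)
        w = torch.softmax(self.score(views), dim=1)  # weights sum to 1 over views
        fused = (w * views).sum(dim=1)             # (batch, d_model)
        return self.classifier(fused)              # (batch, n_classes) logits
```

Because the weights are computed from the embeddings themselves, the fusion can lean on whichever view is most informative for a given recording.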

How It Works

  1. Audio Input: User uploads audio file through web interface
  2. Preprocessing: Audio standardized to 22050 Hz, mono, 5 seconds
  3. Feature Extraction: Three views extracted simultaneously
  4. Deep Processing: Each view processed through transformer encoder
  5. Intelligent Fusion: Views combined using attention mechanism
  6. Classification: Neural network outputs probabilities for all 10 classes
  7. Visualization: Results displayed in interactive radar chart
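Step 6 in miniature: a softmax turns the classifier's raw logits into the 10-class probability distribution shown in the radar chart (the logit values below are made up for illustration):

```python
import torch

# Hypothetical logits for drone types A-J
logits = torch.tensor([2.3, 0.1, -1.0, 0.5, 0.0, -0.5, 1.2, -2.0, 0.3, -0.7])
probs = torch.softmax(logits, dim=0)   # exponentiate and normalize
# probs now sums to 1; the largest logit (type A here) gets the largest share
```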

Use Cases

  • Security & Surveillance: Identify drone types from audio recordings
  • Research & Analysis: Classify drone acoustic signatures
  • Airspace Monitoring: Automated drone detection systems
  • Educational: Learn about audio classification and deep learning

Project Structure

MV-HST/
├── backend.py              # Flask server with MVHST model
├── models/                 # Neural network architectures
│   ├── fusion_model.py     # Main MVHST model
│   └── transformer_encoder.py
├── features/               # Feature extractors
│   ├── mel_extractor.py
│   ├── cqt_extractor.py
│   └── harmonic_features.py
├── utils/                  # Helper functions
├── config.py              # Configuration settings
├── train.py               # Training script
└── checkpoints/           # Trained model weights

Model Performance

  • Architecture: Multi-View Transformer with 3 feature views
  • Parameters: ~2.5M trainable parameters
  • Input: 5-second audio clips at 22050 Hz
  • Output: 10-class probability distribution
  • Training Data: Custom drone dataset

Key Innovations

  1. Multi-View Approach: Combines Mel, CQT, and harmonic features for drone classification
  2. View Attention: Intelligent fusion mechanism that adapts to different audio characteristics
  3. Transformer Architecture: Leverages state-of-the-art attention mechanisms for temporal pattern recognition
  4. End-to-End Learning: Entire pipeline trained jointly for optimal performance

Future Enhancements

  • Real-time streaming audio analysis
  • Support for more drone types
  • Improved noise robustness
  • Mobile app integration
  • Cloud deployment options

Technical Highlights

  • Sample Rate: 22050 Hz (captures frequencies up to the 11.025 kHz Nyquist limit)
  • Duration: 5 seconds per analysis
  • Feature Dimensions:
    • Mel: 128 frequency bins
    • CQT: 84 bins
    • Harmonic: 5 features
  • Model Dimension: 256
  • Attention Heads: 8 per transformer layer
  • Layers: 6 transformer layers per view
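The highlights above can be collected into a single configuration object; this dataclass is illustrative and is not the project's actual config.py:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MVHSTConfig:
    """Hypothetical config mirroring the technical highlights above."""
    sample_rate: int = 22050   # Hz; Nyquist limit of 11.025 kHz
    duration_s: float = 5.0    # seconds per analysis window
    n_mels: int = 128          # Mel spectrogram bins
    n_cqt_bins: int = 84       # constant-Q transform bins
    n_harmonic: int = 5        # harmonic feature dimensions
    d_model: int = 256         # transformer model dimension
    n_heads: int = 8           # attention heads per layer
    n_layers: int = 6          # transformer layers per view
```

Freezing the dataclass keeps preprocessing, training, and inference pinned to the same values, since any mismatch in sample rate or feature dimensions would silently corrupt the model's inputs.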

Why This Matters

Drones are increasingly common in airspace, and identifying them quickly and accurately is crucial for:

  • Security: Distinguishing between different drone types
  • Safety: Understanding drone capabilities from audio alone
  • Research: Advancing acoustic classification techniques
  • Innovation: Pushing boundaries of audio-based AI

Techniques Demonstrated

This project demonstrates advanced techniques in:

  • Audio signal processing
  • Deep learning for classification
  • Multi-view feature fusion
  • Transformer architectures
  • Real-time web applications

Summary

MVHST is a cutting-edge AI system that uses advanced deep learning to classify drones from audio recordings. By combining multiple acoustic perspectives with transformer-based neural networks, it achieves accurate classification across 10 drone types with real-time performance and intuitive visualization.

Tagline: "AI-Powered Drone Detection Through Acoustic Analysis"
