About This Project
Project Overview
MVHST (Multi-View Harmonic Spectrum Transformer) is an advanced AI-powered drone classification system that identifies and categorizes drones by analyzing their unique acoustic signatures. The system can distinguish between 10 different drone types (A-J) from audio recordings alone.
What It Does
- Upload audio files (WAV, MP3, M4A, etc.)
- Analyze acoustic features using multi-view deep learning
- Classify drones into 10 categories (A-J)
- Display probability scores for all drone types in an interactive radar chart
- Provide real-time predictions with confidence scores
Key Features
🎯 Multi-View Analysis
- Extracts three complementary views: Mel Spectrogram, CQT, and Harmonic Features
- Each view captures different aspects of the acoustic signature
- Intelligent fusion combines insights from all views
🤖 Deep Learning Architecture
- Transformer-based neural network
- 6 layers with 8 attention heads per view
- Learns complex temporal and frequency patterns
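With PyTorch's built-in encoder, the stated per-view hyperparameters (6 layers, 8 heads, model dimension 256) could look like the sketch below; the input projection and layer options are assumptions, not the project's exact code.

```python
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    """One transformer encoder per view: 6 layers, 8 attention heads."""

    def __init__(self, input_dim, d_model=256, n_heads=8, n_layers=6):
        super().__init__()
        # Lift the raw view features (e.g. 128 Mel bins) to d_model
        self.proj = nn.Linear(input_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                      # x: (batch, time, input_dim)
        return self.encoder(self.proj(x))      # (batch, time, d_model)
```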
📊 Interactive Visualization
- Real-time radar chart showing probabilities for all 10 drone types
- Visual feedback on prediction confidence
- Audio waveform and spectrogram displays
⚡ Real-Time Processing
- Fast inference on uploaded audio files
- Automatic audio preprocessing and standardization
- Instant classification results
Technology Stack
Backend
- Python with Flask framework
- PyTorch for deep learning
- Librosa & TorchAudio for audio processing
- Transformer architecture for pattern recognition
Frontend
- React with Vite
- Canvas API for visualizations
- Modern UI/UX with responsive design
Model Architecture
- MVHST Model: Custom Multi-View Transformer
- Feature Extractors: Mel, CQT, Harmonic
- Fusion Layer: Attention-based view combination
- Classifier: Neural network with 10 output classes
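One plausible shape for the attention-based fusion layer and 10-class classifier is sketched below; the mean pooling and per-view scoring are assumptions about the design, not the exact MVHST code.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Score each view's pooled embedding, softmax over views,
    take the weighted sum, then classify into 10 drone types."""

    def __init__(self, d_model=256, n_classes=10):
        super().__init__()
        self.score = nn.Linear(d_model, 1)          # per-view attention score
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, views):
        # views: list of (batch, time, d_model) encoder outputs, one per view
        pooled = torch.stack([v.mean(dim=1) for v in views], dim=1)  # (B, V, D)
        weights = torch.softmax(self.score(pooled), dim=1)           # (B, V, 1)
        fused = (weights * pooled).sum(dim=1)                        # (B, D)
        return self.classifier(fused)                                # (B, 10)
```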
How It Works
- Audio Input: User uploads audio file through web interface
- Preprocessing: Audio standardized to 22050 Hz, mono, 5 seconds
- Feature Extraction: Three views extracted simultaneously
- Deep Processing: Each view processed through transformer encoder
- Intelligent Fusion: Views combined using attention mechanism
- Classification: Neural network outputs probabilities for all 10 classes
- Visualization: Results displayed in interactive radar chart
Use Cases
- Security & Surveillance: Identify drone types from audio recordings
- Research & Analysis: Classify drone acoustic signatures
- Airspace Monitoring: Automated drone detection systems
- Educational: Learn about audio classification and deep learning
Project Structure
MV-HST/
├── backend.py                  # Flask server with MVHST model
├── models/                     # Neural network architectures
│   ├── fusion_model.py         # Main MVHST model
│   └── transformer_encoder.py
├── features/                   # Feature extractors
│   ├── mel_extractor.py
│   ├── cqt_extractor.py
│   └── harmonic_features.py
├── utils/                      # Helper functions
├── config.py                   # Configuration settings
├── train.py                    # Training script
└── checkpoints/                # Trained model weights
Model Performance
- Architecture: Multi-View Transformer with 3 feature views
- Parameters: ~2.5M trainable parameters
- Input: 5-second audio clips at 22050 Hz
- Output: 10-class probability distribution
- Training Data: Custom drone acoustic dataset
Key Innovations
- Multi-View Approach: Combines Mel, CQT, and harmonic features in a single model for drone classification
- View Attention: Intelligent fusion mechanism that adapts to different audio characteristics
- Transformer Architecture: Leverages state-of-the-art attention mechanisms for temporal pattern recognition
- End-to-End Learning: Entire pipeline trained jointly for optimal performance
Future Enhancements
- Real-time streaming audio analysis
- Support for more drone types
- Improved noise robustness
- Mobile app integration
- Cloud deployment options
Technical Highlights
- Sample Rate: 22050 Hz (captures frequencies up to 11 kHz)
- Duration: 5 seconds per analysis
- Feature Dimensions:
  - Mel: 128 frequency bins
  - CQT: 84 bins
  - Harmonic: 5 bins
- Model Dimension: 256
- Attention Heads: 8 per transformer layer
- Layers: 6 transformer layers per view
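Collected as a config module, the highlights above might look like this (the constant names are hypothetical; the project's real settings live in config.py):

```python
# Hypothetical configuration mirroring the stated technical highlights.
SAMPLE_RATE = 22050   # Hz; captures frequencies up to the 11025 Hz Nyquist limit
DURATION = 5          # seconds of audio per analysis
N_MELS = 128          # Mel view: frequency bins
N_CQT_BINS = 84       # CQT view: bins
N_HARMONIC = 5        # harmonic view: feature dimensions
D_MODEL = 256         # transformer model dimension
N_HEADS = 8           # attention heads per transformer layer
N_LAYERS = 6          # transformer layers per view
N_CLASSES = 10        # drone types A-J
```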
Why This Matters
Drones are increasingly common in airspace, and identifying them quickly and accurately is crucial for:
- Security: Distinguishing between different drone types
- Safety: Understanding drone capabilities from audio alone
- Research: Advancing acoustic classification techniques
- Innovation: Pushing boundaries of audio-based AI
Contact & Contribution
This project demonstrates advanced techniques in:
- Audio signal processing
- Deep learning for classification
- Multi-view feature fusion
- Transformer architectures
- Real-time web applications
Summary
MVHST is a cutting-edge AI system that uses advanced deep learning to classify drones from audio recordings. By combining multiple acoustic perspectives with transformer-based neural networks, it achieves accurate classification across 10 drone types with real-time performance and intuitive visualization.
Tagline: "AI-Powered Drone Detection Through Acoustic Analysis"