About This Project

Project Overview

MVHST (Multi-View Harmonic Spectrum Transformer) is an advanced AI-powered drone classification system that identifies and categorizes drones by analyzing their unique acoustic signatures. The system can distinguish between 10 different drone types (A-J) from audio recordings alone.


What It Does

  • Upload audio files (WAV, MP3, M4A, etc.)
  • Analyze acoustic features using multi-view deep learning
  • Classify drones into 10 categories (A-J)
  • Display probability scores for all drone types in an interactive radar chart
  • Provide real-time predictions with confidence scores

Key Features

🎯 Multi-View Analysis

  • Extracts three complementary views: Mel Spectrogram, CQT, and Harmonic Features
  • Each view captures different aspects of the acoustic signature
  • Intelligent fusion combines insights from all views

🤖 Deep Learning Architecture

  • Transformer-based neural network
  • 6 layers with 8 attention heads per view
  • Learns complex temporal and frequency patterns
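A per-view encoder with the stated sizes (6 layers, 8 heads, model dimension 256) can be sketched in PyTorch as below; the class name and the input projection layer are assumptions for illustration:

```python
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    """Transformer encoder for one acoustic view (illustrative sketch)."""
    def __init__(self, n_features, d_model=256, n_heads=8, n_layers=6):
        super().__init__()
        # Project per-frame feature bins (e.g. 128 Mel bins) up to d_model
        self.proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                  # x: (batch, time, n_features)
        return self.encoder(self.proj(x))  # (batch, time, d_model)
```

Self-attention lets every time frame attend to every other frame, which is how the model picks up temporal patterns such as rotor-speed modulation across the clip.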

📊 Interactive Visualization

  • Real-time radar chart showing probabilities for all 10 drone types
  • Visual feedback on prediction confidence
  • Audio waveform and spectrogram displays

⚡ Real-Time Processing

  • Fast inference on uploaded audio files
  • Automatic audio preprocessing and standardization
  • Instant classification results
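The standardization step (mono, fixed rate, fixed length) might look like the following; this is a dependency-free sketch using naive linear interpolation, whereas a real pipeline would typically use `librosa.resample`:

```python
import numpy as np

def standardize(y, sr, target_sr=22050, target_len=5.0):
    """Standardize a clip to mono, target_sr Hz, target_len seconds (sketch)."""
    if y.ndim > 1:
        y = y.mean(axis=0)            # stereo -> mono by channel averaging
    if sr != target_sr:               # naive linear resampling, for illustration
        n = int(len(y) * target_sr / sr)
        y = np.interp(np.linspace(0, len(y) - 1, n), np.arange(len(y)), y)
    n_target = int(target_sr * target_len)
    if len(y) < n_target:
        y = np.pad(y, (0, n_target - len(y)))  # zero-pad short clips
    return y[:n_target]               # trim long clips
```

Fixing the length to exactly 5 seconds (110,250 samples at 22,050 Hz) keeps the downstream feature tensors a constant shape, so batching and inference stay simple.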

Technology Stack

Backend

  • Python with Flask framework
  • PyTorch for deep learning
  • Librosa & TorchAudio for audio processing
  • Transformer architecture for pattern recognition
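A minimal sketch of what the Flask upload endpoint could look like; the `/predict` route, the `audio` form field, and the `run_model` stub are hypothetical names, not the project's actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_model(file_storage):
    # Stand-in for MVHST inference; a real server would preprocess the
    # uploaded audio and run the model. Returns a uniform 10-class distribution.
    return [0.1] * 10

# Hypothetical endpoint: accepts a multipart form with an "audio" file part
@app.route("/predict", methods=["POST"])
def predict():
    audio = request.files["audio"]
    return jsonify({"probabilities": run_model(audio)})
```

Returning the full probability vector (rather than only the top class) is what lets the frontend render all 10 values on the radar chart.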

Frontend

  • React with Vite
  • Canvas API for visualizations
  • Modern UI/UX with responsive design

Model Architecture

  • MVHST Model: Custom Multi-View Transformer
  • Feature Extractors: Mel, CQT, Harmonic
  • Fusion Layer: Attention-based view combination
  • Classifier: Neural network with 10 output classes
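One common way to implement attention-based view combination is to score each pooled view embedding, softmax over the views, and take the weighted sum; the sketch below shows that pattern (class and layer names are assumptions, not the project's actual fusion layer):

```python
import torch
import torch.nn as nn

class ViewFusion(nn.Module):
    """Attention-weighted fusion of pooled view embeddings (illustrative)."""
    def __init__(self, d_model=256, n_views=3, n_classes=10):
        super().__init__()
        self.score = nn.Linear(d_model, 1)         # one attention score per view
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, views):                      # views: (batch, n_views, d_model)
        w = torch.softmax(self.score(views), dim=1)  # weights sum to 1 over views
        fused = (w * views).sum(dim=1)             # (batch, d_model)
        return self.classifier(fused)              # (batch, n_classes) logits
```

Because the weights are computed from the embeddings themselves, the fusion can lean on whichever view is most informative for a given recording.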

How It Works

  1. Audio Input: User uploads audio file through web interface
  2. Preprocessing: Audio standardized to 22050 Hz, mono, 5 seconds
  3. Feature Extraction: Three views extracted simultaneously
  4. Deep Processing: Each view processed through transformer encoder
  5. Intelligent Fusion: Views combined using attention mechanism
  6. Classification: Neural network outputs probabilities for all 10 classes
  7. Visualization: Results displayed in interactive radar chart
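Step 6 in miniature: a softmax turns the classifier's raw logits into the 10-class probability distribution shown in the radar chart (the logit values below are made up for illustration):

```python
import torch

# Hypothetical logits for drone types A-J
logits = torch.tensor([2.3, 0.1, -1.0, 0.5, 0.0, -0.5, 1.2, -2.0, 0.3, -0.7])
probs = torch.softmax(logits, dim=0)   # exponentiate and normalize
# probs now sums to 1; the largest logit (type A here) gets the largest share
```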

Use Cases

  • Security & Surveillance: Identify drone types from audio recordings
  • Research & Analysis: Classify drone acoustic signatures
  • Airspace Monitoring: Automated drone detection systems
  • Educational: Learn about audio classification and deep learning

Project Structure

MV-HST/
├── backend.py              # Flask server with MVHST model
├── models/                 # Neural network architectures
│   ├── fusion_model.py     # Main MVHST model
│   └── transformer_encoder.py
├── features/               # Feature extractors
│   ├── mel_extractor.py
│   ├── cqt_extractor.py
│   └── harmonic_features.py
├── utils/                  # Helper functions
├── config.py              # Configuration settings
├── train.py               # Training script
└── checkpoints/           # Trained model weights

Model Performance

  • Architecture: Multi-View Transformer with 3 feature views
  • Parameters: ~2.5M trainable parameters
  • Input: 5-second audio clips at 22050 Hz
  • Output: 10-class probability distribution
  • Training Data: Custom drone dataset

Key Innovations

  1. Multi-View Approach: Combines Mel, CQT, and harmonic features for drone classification
  2. View Attention: Intelligent fusion mechanism that adapts to different audio characteristics
  3. Transformer Architecture: Leverages state-of-the-art attention mechanisms for temporal pattern recognition
  4. End-to-End Learning: Entire pipeline trained jointly for optimal performance

Future Enhancements

  • Real-time streaming audio analysis
  • Support for more drone types
  • Improved noise robustness
  • Mobile app integration
  • Cloud deployment options

Technical Highlights

  • Sample Rate: 22050 Hz (captures frequencies up to the 11.025 kHz Nyquist limit)
  • Duration: 5 seconds per analysis
  • Feature Dimensions:
    • Mel: 128 frequency bins
    • CQT: 84 bins
    • Harmonic: 5 features
  • Model Dimension: 256
  • Attention Heads: 8 per transformer layer
  • Layers: 6 transformer layers per view
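The highlights above can be collected into a single configuration object; this dataclass is illustrative and is not the project's actual config.py:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MVHSTConfig:
    """Hypothetical config mirroring the technical highlights above."""
    sample_rate: int = 22050   # Hz; Nyquist limit of 11.025 kHz
    duration_s: float = 5.0    # seconds per analysis window
    n_mels: int = 128          # Mel spectrogram bins
    n_cqt_bins: int = 84       # constant-Q transform bins
    n_harmonic: int = 5        # harmonic feature dimensions
    d_model: int = 256         # transformer model dimension
    n_heads: int = 8           # attention heads per layer
    n_layers: int = 6          # transformer layers per view
```

Freezing the dataclass keeps preprocessing, training, and inference pinned to the same values, since any mismatch in sample rate or feature dimensions would silently corrupt the model's inputs.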

Why This Matters

Drones are increasingly common in airspace, and identifying them quickly and accurately is crucial for:

  • Security: Distinguishing between different drone types
  • Safety: Understanding drone capabilities from audio alone
  • Research: Advancing acoustic classification techniques
  • Innovation: Pushing boundaries of audio-based AI

Techniques Demonstrated

This project demonstrates advanced techniques in:

  • Audio signal processing
  • Deep learning for classification
  • Multi-view feature fusion
  • Transformer architectures
  • Real-time web applications

Summary

MVHST is a cutting-edge AI system that uses advanced deep learning to classify drones from audio recordings. By combining multiple acoustic perspectives with transformer-based neural networks, it achieves accurate classification across 10 drone types with real-time performance and intuitive visualization.

Tagline: "AI-Powered Drone Detection Through Acoustic Analysis"
