A model, training/inference scripts, and a library for detecting Suno ≤ 5 and Udio ≤ 1.5 generated music.
This model detects AI-generated music by exploiting spectral artifacts inherent to neural vocoders. Most audio generators rely on deconvolution layers to upsample latent representations back to audio sample rates. These layers leave predictable fingerprints in the frequency domain.
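A deconvolution (transposed convolution) with stride $k$ can be decomposed into two operations:
- Zero-upsampling: Insert $k-1$ zeros between each sample
- Convolution: Apply the learned kernel

The zero-upsampling is equivalent to oversampling a discrete signal. For a signal $x[n]$ of length $N$ with discrete Fourier transform $X[f]$, the zero-upsampled signal is

$$y[m] = \begin{cases} x[m/k] & \text{if } k \mid m \\ 0 & \text{otherwise} \end{cases}$$

and its discrete Fourier transform is

$$Y[f] = X[f \bmod N], \quad f = 0, \dots, kN - 1,$$

where $Y$ has length $kN$. This means the spectrum of $x$ is replicated $k$ times across the spectrum of $y$.

For a single deconvolution with stride $k$ and output sample rate $f_s$, the replicas are centered at multiples of $f_s / k$, so energy that $x$ carries near 0 Hz reappears as peaks at $f = i \cdot f_s / k$ for $i = 1, \dots, k-1$. The subsequent convolution reshapes these peaks but generally does not remove them.

For stacked deconvolutions, each layer replicates the spectrum produced by the layers before it, so the overall peak pattern is set by the sequence of strides.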
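
The replication caused by zero-upsampling can be checked numerically. The following is a minimal NumPy sketch, not part of this repository: the helper `zero_upsample`, the random test signal, and the stride value are all illustrative assumptions.

```python
import numpy as np


def zero_upsample(x: np.ndarray, k: int) -> np.ndarray:
    """Insert k-1 zeros after every sample of x (hypothetical helper)."""
    y = np.zeros(len(x) * k, dtype=x.dtype)
    y[::k] = x
    return y


rng = np.random.default_rng(0)
# Random signal with a small DC offset, standing in for a layer's activations.
x = rng.standard_normal(64) + 0.5
k = 4  # assumed stride, chosen only for illustration
y = zero_upsample(x, k)

X = np.fft.fft(x)
Y = np.fft.fft(y)

# The DFT of the zero-upsampled signal is the DFT of x tiled k times.
replicated = np.allclose(Y, np.tile(X, k))

# The DC component of x therefore reappears at bins 0, N, 2N, ..., (k-1)N,
# which correspond to multiples of f_s / k at the upsampled rate.
dc_bins = np.arange(k) * len(x)
dc_copies_match = np.allclose(Y[dc_bins], X[0])

print(replicated, dc_copies_match)
```

Both checks hold for any stride and signal length, because the identity follows directly from the definition of the DFT rather than from the particular signal.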
These artifacts depend only on the stride configuration, not on training data or learned weights. This has two implications:
- High accuracy on known architectures: The spectral fingerprint is deterministic and consistent across all outputs from a given generator
- Requires retraining for new architectures: Different vocoder designs produce different peak patterns, so the model must be updated when generators change their architecture
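
To make the dependence on stride configuration concrete, the sketch below lists where idealized zero-upsampling would place spectral replicas for a given list of strides. The function names and the example strides are hypothetical; the source does not state the strides used by any real generator, and learned kernels may attenuate some of these peaks in practice.

```python
from math import ceil, floor, prod


def layer_peak_spacings(strides, sample_rate):
    """Replica spacing in Hz contributed by each layer.

    strides is a hypothetical list ordered from the latent input to the
    audio output. Layer i upsamples by strides[i]; at the final sample
    rate its replicas sit at multiples of sample_rate / prod(strides[i:]).
    """
    return [sample_rate / prod(strides[i:]) for i in range(len(strides))]


def candidate_peaks(strides, sample_rate, f_min, f_max):
    """Candidate peak frequencies inside [f_min, f_max].

    Every layer's spacing is a multiple of the first layer's spacing,
    so the union of all candidates is the set of multiples of the smallest one.
    """
    spacing = min(layer_peak_spacings(strides, sample_rate))
    first = max(1, ceil(f_min / spacing))
    last = floor(f_max / spacing)
    return [i * spacing for i in range(first, last + 1)]


# Illustrative configuration only: two stride-2 layers at a 16 kHz output rate.
spacings = layer_peak_spacings([2, 2], 16000)
peaks = candidate_peaks([2, 2], 16000, f_min=1000, f_max=8000)
```

Changing a single stride moves the whole candidate grid, which is one way to see why a detector fitted to one architecture may not transfer to another.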
We extract a fakeprint by computing the average spectrum, subtracting its lower envelope (to isolate peaks from melodic content), and analyzing the 1-8 kHz band where artifacts are most prominent.
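
The extraction steps above can be sketched in plain NumPy. This is an illustration under stated assumptions rather than the code in `extract_fakeprints.py`: the frame size, hop, dB scaling, and the sliding-minimum envelope are choices made for this example, and all function names are hypothetical.

```python
import numpy as np


def average_spectrum_db(samples, sample_rate, n_fft=2048, hop=1024):
    """Mean magnitude spectrum in dB over Hann-windowed frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack(
        [samples[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    mean_db = 20 * np.log10(magnitude.mean(axis=0) + 1e-10)
    freqs = np.fft.rfftfreq(n_fft, d=1 / sample_rate)
    return freqs, mean_db


def lower_envelope(curve, width=51):
    """Sliding minimum, an assumed stand-in for the lower envelope."""
    pad = width // 2
    padded = np.pad(curve, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return windows.min(axis=1)


def fakeprint(samples, sample_rate, f_lo=1000.0, f_hi=8000.0):
    """Average spectrum minus its lower envelope, restricted to the 1-8 kHz band."""
    freqs, mean_db = average_spectrum_db(samples, sample_rate)
    residual = mean_db - lower_envelope(mean_db)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[band], residual[band]


# Synthetic check: noise plus a narrow 4 kHz tone standing in for an artifact peak.
sr = 16000
t = np.arange(2 * sr) / sr
rng = np.random.default_rng(0)
audio = 0.1 * rng.standard_normal(len(t)) + np.sin(2 * np.pi * 4000 * t)
band_freqs, fp = fakeprint(audio, sr)
peak_freq = band_freqs[np.argmax(fp)]
```

Subtracting the envelope leaves a non-negative residual in which narrow peaks stand out, while slowly varying spectral shape from the music itself is largely removed.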
using AiMusicDetector;
// Load the detector with your trained model
using var detector = MusicDetector.Load("ai_music_detector.onnx");
// Analyze an audio file
var result = detector.Analyze("song.mp3");
Console.WriteLine($"AI Probability: {result.AiProbability:P1}");
Console.WriteLine($"Classification: {result.Classification}");
Console.WriteLine($"Confidence: {result.Confidence:P0}");

# NuGet (once published)
dotnet add package AiMusicDetector
# Or add project reference
dotnet add reference src/AiMusicDetector/AiMusicDetector.csproj

For the closest match to Python/torchaudio/FFmpeg decoding, the C# pipeline can decode MP3 using in-process FFmpeg (no ffmpeg.exe) via FFmpeg.AutoGen.
- Override library path:
  - Set AIMUSICDETECTOR_FFMPEG_LIBS to the folder containing FFmpeg shared libraries (avcodec, avformat, swresample, etc.)
- Bundled binaries layout (recommended):
  - FFmpeg/bin/win-x64/*.dll
  - FFmpeg/bin/win-arm64/*.dll
  - FFmpeg/bin/linux-x64/*.so
  - FFmpeg/bin/linux-arm64/*.so
  - FFmpeg/bin/osx-x64/*.dylib
  - FFmpeg/bin/osx-arm64/*.dylib

The loader also supports the legacy AutoGen example layout on Windows:
FFmpeg/bin/x64/*.dll
If no in-process FFmpeg libraries are found, the code will fall back to the other decoding path(s).
- Miniconda or Anaconda
- CUDA-capable GPU (optional, speeds up feature extraction)
- ~200 GB disk space for datasets
cd python
# Windows
setup_env.bat
# Linux/macOS
chmod +x setup_env.sh
./setup_env.sh
# Or manually:
conda env create -f environment.yml
conda activate ai-music-detector

# Activate environment
conda activate ai-music-detector
# Download FMA (real music) and SONICS (AI-generated)
python download_data.py --dataset all

Datasets:
- FMA Medium - 25,000 real music tracks (22 GB)
- SONICS - 49,000+ AI-generated songs from Suno/Udio (~150 GB)
# Extract features from real music
python extract_fakeprints.py \
--input ./data/fma/fma_medium \
--output ./output/fma_fakeprints.npy \
--label real
# Extract features from AI-generated music
python extract_fakeprints.py \
--input ./data/sonics/fake_songs \
--output ./output/sonics_fakeprints.npy \
--label fake

python train_model.py \
--real ./output/fma_fakeprints.npy \
--fake ./output/sonics_fakeprints.npy

python export_onnx.py --model ./models

The trained model will be saved to ./models/ai_music_detector.onnx.
For better robustness against pitch shifts, EQ changes, mastering, and transcoding, train the CNN-based model:
# Train CNN with on-the-fly augmentations
python train_cnn.py \
--real ./data/fma/fma_medium \
--fake ./data/sonics/fake_songs
# Export to ONNX
python export_onnx_cnn.py
# Inference
python inference_cnn.py --model ./models/cnn_detector.onnx

The CNN model uses log-frequency CQT spectrograms, which provide pitch-shift invariance, and trains with data augmentation including pitch shifting, EQ, compression, and noise injection.
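
The link between a log-frequency axis and pitch shifts can be shown with a few lines of arithmetic. In the sketch below, the helper names, the starting frequency, and the bin counts are assumptions for illustration and are not taken from `train_cnn.py`. On a CQT axis a pitch shift multiplies every frequency by a constant, which becomes a constant offset in bin index, and convolutional layers handle such translations well.

```python
import numpy as np


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced center frequencies of a CQT (hypothetical helper)."""
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)


def semitones_to_bins(semitones, bins_per_octave):
    """Bin offset on the CQT axis produced by a pitch shift in semitones."""
    return semitones * bins_per_octave / 12


# Assumed layout: 84 bins, 12 per octave, starting near C1 (32.70 Hz).
freqs = cqt_center_frequencies(32.70, n_bins=84, bins_per_octave=12)

# Shifting the pitch up by 2 semitones scales every frequency by 2^(2/12) ...
shifted = freqs * 2.0 ** (2 / 12)

# ... which lands exactly on the grid two bins higher.
is_translation = np.allclose(shifted[:-2], freqs[2:])
offset_24_per_octave = semitones_to_bins(2, bins_per_octave=24)
```

On a linear-frequency spectrogram the same shift would stretch the spacing between harmonics instead of moving them rigidly, which is harder for a fixed-size convolution kernel to absorb.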
// Load model from a file path
using var detector = MusicDetector.Load("model.onnx");
// Alternatively, load from a byte array already in memory:
// using var detector = MusicDetector.Load(modelBytes);
// Analyze files
DetectionResult result = detector.Analyze("song.mp3");
float probability = detector.Predict("song.mp3");
bool isAi = detector.IsAiGenerated("song.mp3");
// Analyze samples directly
float[] samples = LoadYourAudio();
var sampleResult = detector.Analyze(samples, sampleRate: 44100);
// Batch processing
var results = detector.AnalyzeBatch(new[] { "song1.mp3", "song2.mp3" });

public class DetectionResult
{
float AiProbability; // 0.0 (Real) to 1.0 (AI)
bool IsAiGenerated; // true if probability > threshold
float Confidence; // 0.0 (uncertain) to 1.0 (certain)
string Classification; // "Real Music" or "AI-Generated"
double AudioDurationSeconds;
long ProcessingTimeMs;
}

var options = new MusicDetectorOptions
{
SampleRate = 44100,
MaxDurationSeconds = 180,
Threshold = 0.5f,
UseGpu = false
};
using var detector = MusicDetector.Load("model.onnx", options);

# Build
dotnet build src/AiMusicDetector.Console
# Run
dotnet run --project src/AiMusicDetector.Console -- model.onnx song1.mp3 song2.mp3

Evaluated on a held-out test set of 17,866 samples (5,741 real, 12,125 AI-generated).
| Metric | Value |
|---|---|
| Accuracy | 99.88% |
| Precision | 0.9985 |
| Recall | 0.9998 |
| F1 Score | 0.9991 |
| False Positive Rate | 0.31% |
| False Negative Rate | 0.02% |
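
For readers who want to see how the figures in the table relate to one another, the sketch below computes them from a confusion matrix, treating AI-generated as the positive class. The source does not report the confusion counts; the values used here are assumptions, back-calculated so that the formulas reproduce the rounded numbers above, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics, with AI-generated as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Real tracks wrongly flagged as AI-generated.
        "false_positive_rate": fp / (fp + tn),
        # AI-generated tracks wrongly passed as real.
        "false_negative_rate": fn / (fn + tp),
    }


# ASSUMED counts, not reported by the source: 5,741 real tracks split into
# tn + fp, and 12,125 AI-generated tracks split into tp + fn.
metrics = classification_metrics(tp=12122, fp=18, tn=5723, fn=3)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the test set contains roughly twice as many AI-generated tracks as real ones, the false positive rate is the more informative figure for judging how often real music would be misclassified.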
- Sample Rate Dependent: Audio must be resampled to 16000 Hz
- Minimum Duration: Works best with 10+ seconds of audio
- Evolving Generators: Needs retraining on new generations of AI music generators
This implementation is based on the fakeprint detection method proposed by Afchar et al. [1], which demonstrates that neural vocoders in generative music models produce characteristic frequency-domain artifacts due to their deconvolution architecture.
[1] D. Afchar, G. Meseguer-Brocal, K. Akesbi, and R. Hennequin, "A Fourier Explanation of AI-music Artifacts," in Proc. International Society for Music Information Retrieval Conference (ISMIR), 2025. Available: https://arxiv.org/abs/2506.19108
MIT License