Automatically identify tracklists from SoundCloud DJ mixes using audio fingerprinting. This is a hobby project and is not in active development, so it's best not to take it too seriously.
- 🎵 Downloads audio from SoundCloud URLs
- 🔍 Identifies tracks using Shazam's audio fingerprinting
- 📝 Generates timestamped tracklists with metadata
- 🌐 Web interface for easy interaction
- 🔄 Support for proxy rotation to avoid rate limits
- 💾 Saves results locally for quick re-access
- 📂 Browse and reload previously processed mixes
- ⚡ Flexible processing stages (skip download or recognition steps)
- 🎯 Dynamic confidence scoring with customizable thresholds
- yt-dlp - Audio downloading
- ShazamIO - Track identification
- Streamlit - Web interface / frontend
- FFmpeg - Audio segmentation
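To illustrate how FFmpeg-based segmentation typically works, here is a sketch of building the splitting command with FFmpeg's segment muxer. This is illustrative only, not the project's actual `splitter.py`; the file names and helper are hypothetical:

```python
# Sketch: build an ffmpeg command that splits a mix into fixed-length segments.
# build_segment_command, "mix.mp3", and the output pattern are placeholders.
def build_segment_command(input_path: str, segment_seconds: int, out_pattern: str) -> list[str]:
    return [
        "ffmpeg",
        "-i", input_path,                       # source audio
        "-f", "segment",                        # enable the segment muxer
        "-segment_time", str(segment_seconds),  # length of each chunk in seconds
        "-c", "copy",                           # copy the stream, no re-encode
        out_pattern,                            # e.g. "segments/part_%03d.mp3"
    ]

cmd = build_segment_command("mix.mp3", 30, "segments/part_%03d.mp3")
```

Copying the stream (`-c copy`) keeps splitting fast, at the cost of segment boundaries landing on the nearest keyframe/frame boundary rather than the exact second.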
Before installing, make sure you have the following:
- Python 3.12.4 (managed via pyenv)
- pyenv - Python version management (Installation guide)
- pyenv-virtualenv - Python virtual environment plugin (Installation guide)
- Poetry - Python dependency management (Installation guide)
- FFmpeg - Audio processing library
```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg
```
- Clone the repository:

```bash
git clone https://github.com/yourusername/trackid.git
cd trackid
```

- Install the development environment:

```bash
make install_dev
```

This command will:
- Install Python 3.12.4 via pyenv
- Create a virtual environment named `trackid-3.12.4`
- Install all dependencies via Poetry
- Set up the Jupyter kernel for notebooks
Configure the application's default settings by editing `trackid/settings.py`:
```python
SEGMENT_LENGTH = 30  # Duration of each audio segment in seconds
MAX_AUDIO_HOURS = 4  # Maximum audio length to process (in hours)
MAX_AUDIO_LENGTH = 60 * 60 * MAX_AUDIO_HOURS  # Calculated time limit in seconds
```

- `SEGMENT_LENGTH`: The mix is split into segments of this duration for recognition. Shorter segments mean more API calls but better accuracy around transitions.
- `MAX_AUDIO_HOURS`: Limits processing time to avoid excessive API usage on very long mixes.
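The trade-off between these two settings can be made concrete with a quick back-of-the-envelope calculation (a sketch; `estimated_api_calls` is a hypothetical helper, not part of the codebase):

```python
import math

SEGMENT_LENGTH = 30             # seconds per segment (default from settings.py)
MAX_AUDIO_LENGTH = 60 * 60 * 4  # the 4-hour cap, in seconds

def estimated_api_calls(duration_seconds: int) -> int:
    """Rough upper bound on Shazam lookups for a mix of the given length."""
    clipped = min(duration_seconds, MAX_AUDIO_LENGTH)  # MAX_AUDIO_HOURS cap
    return math.ceil(clipped / SEGMENT_LENGTH)

# A 2-hour mix at 30 s segments needs roughly 240 recognition calls.
calls = estimated_api_calls(2 * 60 * 60)
```

Halving `SEGMENT_LENGTH` to 15 would double the call count for the same mix, which is why the batching and proxy settings below matter.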
```python
SEMAPHORE_LIMIT = 100    # Number of concurrent API requests
BATCH_SIZE = 100         # Number of segments per batch
REQUEST_BATCH_DELAY = 3  # Delay (seconds) between batches
```

- `SEMAPHORE_LIMIT`: Controls concurrent requests to the Shazam API. Higher values are faster but may hit rate limits.
- `BATCH_SIZE`: Segments are processed in batches to manage memory and track progress.
- `REQUEST_BATCH_DELAY`: Wait time between batches to avoid rate limiting.
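How these three settings interact can be sketched with `asyncio` (illustrative only; the project's `recognizer.py` may be structured differently, and `recognize_all` is a hypothetical function):

```python
import asyncio

SEMAPHORE_LIMIT = 100
BATCH_SIZE = 100
REQUEST_BATCH_DELAY = 3

async def recognize_all(segments, recognize, delay=REQUEST_BATCH_DELAY):
    """Recognize segments batch by batch, capping in-flight requests."""
    sem = asyncio.Semaphore(SEMAPHORE_LIMIT)

    async def bounded(seg):
        async with sem:  # at most SEMAPHORE_LIMIT requests in flight
            return await recognize(seg)

    results = []
    for i in range(0, len(segments), BATCH_SIZE):
        batch = segments[i : i + BATCH_SIZE]
        results += await asyncio.gather(*(bounded(s) for s in batch))
        if i + BATCH_SIZE < len(segments):
            await asyncio.sleep(delay)  # back off between batches
    return results
```

The semaphore caps concurrency within a batch, while the inter-batch sleep spreads the load over time; `asyncio.gather` keeps results in segment order.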
```python
USE_PROXY = True  # Enable/disable proxy usage
```

**Using Proxies (Recommended for Heavy Usage)**
To avoid rate limiting when processing many mixes, we recommend using rotating proxies. For example, Webshare.io:
- Sign up at Webshare.io
- Get your proxy credentials from the dashboard
- Set environment variables:
```bash
export PROXY_USER_NAME="your-username-rotate"
export PROXY_PASSWORD="your-password"
export PROXY_HOST="p.webshare.io"
export PROXY_PORT="80"
```

Or add them to your `.env` file:

```
PROXY_USER_NAME=your-username-rotate
PROXY_PASSWORD=your-password
PROXY_HOST=p.webshare.io
PROXY_PORT=80
```

The rotating proxy automatically changes IPs for each request, helping avoid detection and rate limits.
```python
PROCESSING_STAGE = "full"  # Options: "full", "recognition", "postprocessing"
```

- `full`: Complete pipeline (download → segment → recognize → postprocess)
- `recognition`: Skip download if audio exists (segment → recognize → postprocess)
- `postprocessing`: Only reprocess existing raw results with new settings
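The stage-to-steps mapping above can be sketched as a small lookup table (illustrative only; `STAGE_STEPS` and `steps_for` are hypothetical names, not the project's `core.py`):

```python
# Which pipeline steps each PROCESSING_STAGE value implies.
STAGE_STEPS = {
    "full":           ["download", "segment", "recognize", "postprocess"],
    "recognition":    ["segment", "recognize", "postprocess"],
    "postprocessing": ["postprocess"],
}

def steps_for(stage: str) -> list[str]:
    """Return the ordered pipeline steps for a given stage setting."""
    try:
        return STAGE_STEPS[stage]
    except KeyError:
        raise ValueError(f"Unknown PROCESSING_STAGE: {stage!r}")
```

This is what makes reprocessing cheap: `"postprocessing"` touches only the saved raw results, so tweaking thresholds never re-downloads audio or re-queries Shazam.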
```python
MIN_DETECTIONS_FOR_PROBABLE = 2    # Minimum detections to mark a track as "probable"
MAX_TIME_GAP_FOR_SAME_TRACK = 600  # Max seconds between detections of the same track
```

Tracks detected multiple times across the mix are merged based on these settings. Adjusting the thresholds helps filter out false positives.
Optional Discogs integration for enhanced metadata:
```bash
export DISCOGS_TOKEN="your-discogs-api-token"
export DISCOGS_USER_AGENT="TrackID/1.0"
```

Process a mix directly:

```bash
# Edit trackid/main.py to set your SoundCloud URL
poetry run python -m trackid.main
```

Launch the Streamlit app:
```bash
make serve_local
```

Then open your browser to http://localhost:8501.
Features:
- 📂 Browse Saved Mixes: View and reload previously processed mixes
- 🎵 Process New Mixes: Enter a SoundCloud URL and click "Process Mix"
- ⚙️ Configurable Settings: Adjust all processing parameters via the sidebar
- 🎯 Confidence Filtering: Toggle probable/uncertain tracks with dynamic thresholds
- ⏰ Time Navigation: Click timestamps to jump to specific tracks in the player
- 🔄 Reprocess: Re-run with different settings without re-downloading
- 🎬 YouTube Preview: Search and play tracks directly in the interface
Processed results are stored in `data/outputs/track_list/`:

```
data/outputs/track_list/
└── Mix Name/
    ├── tracklist_raw.json        # Raw Shazam API responses
    └── tracklist_processed.json  # Cleaned, merged, and sorted tracklist
```
Each tracklist includes:
- Track title and artist
- Album and release year
- Start and end timestamps
- Shazam URL
- Cover artwork URL
- Streaming links (Spotify, Apple Music, etc.)
- Discogs search URL (if configured)
```
trackid/
├── trackid/                  # Core package
│   ├── main.py               # Entry point
│   ├── core.py               # Main processing logic
│   ├── downloader.py         # Audio downloading
│   ├── splitter.py           # Audio segmentation
│   ├── recognizer.py         # Track recognition
│   ├── postprocess.py        # Tracklist processing
│   ├── metadata_manager.py   # Mix metadata CRUD
│   ├── schemas.py            # Data models
│   └── settings.py           # Configuration
├── frontend/                 # Streamlit web interface
│   ├── streamlit_app.py      # Main UI
│   ├── ui_components.py      # Reusable UI components
│   └── settings.py           # Frontend config
├── data/
│   ├── inputs/               # Downloaded audio
│   └── outputs/              # Generated tracklists & metadata
└── Makefile                  # Development commands
```
```
URL Input → Download Audio → Segment → Recognize → Postprocess → Save
                 ↓              ↓          ↓            ↓
             (cached)     (temp files)  (Shazam)  (merge/enrich)
```
Modular Processing Stages: Allows reprocessing with new settings without re-downloading or re-recognizing tracks.
Metadata Management: All processed mixes are tracked in `data/outputs/metadata.json` for browsing and reloading.
Dynamic Confidence Scoring: Track confidence is recalculated in real time based on UI settings, without reprocessing.
Memory Efficiency: The Streamlit app uses caching (`@st.cache_resource`) and lazy loading to minimize its memory footprint.
Error Resilience: Comprehensive error handling with backup/restore for metadata writes.
```bash
make serve_local   # Run Streamlit app
make run           # Run main script
make install_dev   # Set up development environment
make fix_all       # Auto-fix linting issues
make check_ruff    # Check code with Ruff linter
make check_black   # Check code formatting
make nuke_venv     # Delete virtual environment
```

Rate Limiting: Enable proxy usage (`USE_PROXY = True`) or reduce `SEMAPHORE_LIMIT` and `BATCH_SIZE`.
Missing Tracks: Try adjusting `SEGMENT_LENGTH` (shorter segments mean more API calls but better accuracy), or use the `"recognition"` stage to retry with different settings.
Audio Errors: MP3 decoder warnings (`mpa: invalid main_data_begin`) are harmless and automatically suppressed.