Inspiration
We wanted a small, practical toolkit that turns music files and links into structured musical insight: something that helps people quickly understand tempo, key, instrumentation, and structure, and gives a visual preview to aid editing or scoring. Combining lightweight feature extraction with an LLM for interpretation offered a pragmatic trade-off between accuracy and engineering cost.
What it does
- Accepts uploads (mp3, wav, midi, MusicXML) or links (YouTube, Spotify preview workflows).
- Extracts metadata and basic audio features server-side (Python extractor if available, Node fallback using music-metadata).
- Sends structured features to a Gemini-backed analysis service that returns an AnalysisResult JSON used by the UI.
- Shows an interactive waveform for uploaded files and an embedded YouTube player for YouTube links.
- Provides a two-column UI: input + preview on the left, analysis output on the right.
How we built it
- Next.js (App Router) for pages and API routes.
- Server-side analyze endpoint(s): `/api/analyze` (link) and `/api/analyze-file` (file).
- Feature extraction:
  - Primary: Python script (librosa) when Python is available.
  - Fallback: Node extractor (dynamic import of `music-metadata`) for duration/bitrate/title/artist and simple MusicXML/MIDI heuristics.
- Gemini integration wrapped in a server-side service module that lazy-loads the SDK and falls back to a mock when no API key is present.
- Client: React components for Header, InputArea, VisualPreview, AnalysisDisplay; WaveSurfer loaded dynamically for waveform visualization.
- CI-style safeguards: type checks, ESLint fixes, and runtime guards to keep server-only code out of client bundles.
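
The Python-first, Node-fallback extraction described above could be wired roughly as follows. This is a minimal sketch, not the project's actual code: `Features`, `Extractor`, `isMissingBinary`, and `extractWithFallback` are hypothetical names, and falling back only when the error code is `ENOENT` (the interpreter is missing) is an assumption about the desired behavior.

```typescript
// Hypothetical feature shape; the real extractor output is not shown in this write-up.
interface Features {
  source: "python" | "node";
  durationSec?: number;
  tempoBpm?: number;
}

// Both extractors are injected so the fallback logic stays independent of
// how they are implemented (spawned Python script, music-metadata, ...).
type Extractor = (filePath: string) => Promise<Features>;

// Narrow an unknown error to "the executable was not found", which is what
// a failed spawn reports when Python is not installed.
function isMissingBinary(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    "code" in err &&
    (err as { code?: unknown }).code === "ENOENT"
  );
}

// Try the primary (Python/librosa) extractor and use the Node extractor only
// when Python itself is unavailable. Any other failure is re-thrown.
async function extractWithFallback(
  filePath: string,
  primary: Extractor,
  fallback: Extractor
): Promise<Features> {
  try {
    return await primary(filePath);
  } catch (err) {
    if (!isMissingBinary(err)) throw err;
    return fallback(filePath);
  }
}
```

Re-throwing non-`ENOENT` errors keeps genuine extractor bugs visible instead of silently downgrading every failure to metadata-only results.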
Challenges we ran into
- Environment separation: server-only env helpers (zod/env) caused bundling/prerender errors when imported from client pages; fixed by reading `NEXT_PUBLIC_*` variables in client code.
- Missing Python at runtime produced `spawn ENOENT` errors for the extractor; we implemented a Node fallback and clear server messages.
- Upload size limits and platform constraints (serverless body limits); we added a server-side MAX_UPLOAD_BYTES guard and guidance for larger uploads or streaming to object storage.
- LLM hallucinations and inconsistent outputs when prompts were too vague; required prompt tightening, metadata enrichment (oEmbed for links), and structured JSON instructions.
- Type/lint noise from dynamic imports and third-party libs; resolved via `unknown` narrowing, runtime checks, and minimal type assertions.
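
As an illustration of the upload guard mentioned above, a size check along these lines can run before any extraction work starts. It is a sketch under assumptions: the 10 MB value is illustrative (the project's actual MAX_UPLOAD_BYTES is not stated here), and `SizeCheck` and `checkUploadSize` are hypothetical names.

```typescript
// Assumed limit for illustration only; the real value is not given in this write-up.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

type SizeCheck =
  | { ok: true }
  | { ok: false; status: number; message: string };

// Reject oversized uploads early with HTTP 413 and a message the UI can show.
function checkUploadSize(sizeBytes: number, limit: number = MAX_UPLOAD_BYTES): SizeCheck {
  if (sizeBytes > limit) {
    const limitMb = (limit / (1024 * 1024)).toFixed(1);
    return {
      ok: false,
      status: 413,
      message: `File exceeds the ${limitMb} MB upload limit; trim the audio or upload it to object storage instead.`,
    };
  }
  return { ok: true };
}
```

In a route handler, the result would typically be checked against the uploaded file's size and turned into a response with the given status before the extractor is invoked.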
Accomplishments that we're proud of
- Functional end-to-end flow for local uploads and link analysis with graceful fallbacks.
- Componentized UI to match design across pages (SoundAnalyzer / SheetSketcher) and consistent responsive layout.
- Robust server route for file uploads with Python + Node extractor fallback and clear error handling.
- Improved prompts and metadata enrichment to reduce hallucination surface.
- Cleaned up ESLint and TypeScript issues across the project and updated README + tagline.
What we learned
- Keep server-only code out of client bundles; `NEXT_PUBLIC_*` variables are the correct way to expose environment config to the browser.
- Real audio feature extraction (tempo/key/instruments) requires dedicated DSP tooling (librosa or purpose-built models) to be reliable; a Node fallback can provide useful metadata but not deep musical features.
- LLMs need structured inputs and example JSON to be consistent. Enriching prompts with link metadata (oEmbed) or extracted features dramatically improves reliability.
- Serverless platforms impose upload/body limits; large-file workflows require streaming or external storage.
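
To make the point about structured inputs and example JSON concrete, a prompt can be assembled from whatever metadata and features are available plus an example of the expected output shape. This is a hedged sketch: `PromptInput`, `EXAMPLE_RESULT`, `buildAnalysisPrompt`, and all field names are assumptions for illustration, not the project's actual prompt or schema.

```typescript
// Hypothetical inputs; field names are illustrative only.
interface PromptInput {
  title?: string;       // e.g. from oEmbed metadata for links
  durationSec?: number; // from the extractor
  tempoBpm?: number;    // only present when deeper feature extraction ran
}

// Assumed example of the JSON shape the model is asked to return.
const EXAMPLE_RESULT = {
  tempoBpm: 120,
  key: "C major",
  instruments: ["piano"],
  summary: "Short description of the track.",
};

// Build a prompt that states the known data explicitly and shows the model
// exactly which JSON shape to produce.
function buildAnalysisPrompt(input: PromptInput): string {
  return [
    "You are a music analysis assistant.",
    "Use only the metadata and features below; use null for anything you cannot determine.",
    "Known data:",
    JSON.stringify(input, null, 2), // undefined fields are omitted automatically
    "Respond with a single JSON object matching this example shape, with no extra text:",
    JSON.stringify(EXAMPLE_RESULT, null, 2),
  ].join("\n");
}
```

Because `JSON.stringify` drops undefined fields, the model only sees features that were actually extracted, which leaves less room for it to elaborate on values that were never measured.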
What's next for SoundSketcher
- Implement multi-instrument breakdowns for music scoring.
- Improve extraction accuracy:
  - Ship a small worker or container with Python/librosa available (or a dedicated feature-extraction service).
  - Add optional richer MIDI/MusicXML parsing.
- Production-ready uploads:
  - Presigned uploads to object storage (S3/GCS), background analysis jobs, and status polling.
- Prompt & model improvements:
  - Lock the model settings, provide stricter schema examples, and add validation of model output.
- UX polish:
  - Show extractor status/errors explicitly in the UI.
  - Add client-side trimming option (first 30s) to keep uploads predictable.
- Tests & CI:
  - Add unit and integration tests for API routes and main flows to avoid regressions.
- Deploy:
  - Provide a deployment recipe with required runtimes (Python) or run the extractor as a separate service to avoid platform limitations.
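
For the planned validation of model output, one possible starting point is a parser that treats the model's reply as `unknown` and narrows it with runtime checks. This is only a sketch of the idea: the `AnalysisResult` fields shown here are assumed for illustration, and `isRecord` and `parseAnalysisResult` are hypothetical helpers.

```typescript
// Hypothetical AnalysisResult shape; the project's real schema is not shown here.
interface AnalysisResult {
  tempoBpm: number | null;
  key: string | null;
  instruments: string[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse raw model text and return a typed result, or null when the text is
// not valid JSON or does not match the expected shape.
function parseAnalysisResult(raw: string): AnalysisResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data)) return null;

  const { tempoBpm, key, instruments } = data;
  if (tempoBpm !== null && typeof tempoBpm !== "number") return null;
  if (key !== null && typeof key !== "string") return null;
  if (!Array.isArray(instruments)) return null;

  // Every instrument entry must be a string; otherwise reject the whole reply.
  const names = instruments.filter((i): i is string => typeof i === "string");
  if (names.length !== instruments.length) return null;

  return { tempoBpm, key, instruments: names };
}
```

Returning null rather than throwing lets the caller decide whether to retry the model, fall back to a mock, or surface an error in the UI.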
