Inspiration

We wanted a small, practical toolkit that turns music files and links into structured musical insight: something that helps people quickly understand tempo, key, instrumentation, and structure, and that gives a visual preview to aid editing or scoring. Pairing lightweight feature extraction with an LLM for interpretation offered a pragmatic trade-off between accuracy and engineering cost.

What it does

  • Accepts uploads (mp3, wav, midi, MusicXML) or links (YouTube, Spotify preview workflows).
  • Extracts metadata and basic audio features server-side (Python extractor if available, Node fallback using music-metadata).
  • Sends structured features to a Gemini-backed analysis service that returns an AnalysisResult JSON used by the UI.
  • Shows an interactive waveform for uploaded files and an embedded YouTube player for YouTube links.
  • Provides a two-column UI: input + preview on the left, analysis output on the right.
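
The first bullet implies a routing step: files go to one endpoint and links to another. The sketch below shows one way to do that, using the file extension and the URL hostname. The names `classifyFile`, `classifyLink` and `endpointFor`, the extension sets, and the hostnames are illustrative choices, not the project's code. The two endpoint paths are the ones listed under How we built it.

```typescript
// Hypothetical input classifier: decides which analysis endpoint an input should go to.
type InputKind =
  | { kind: "audio"; ext: "mp3" | "wav" }
  | { kind: "midi" }
  | { kind: "musicxml" }
  | { kind: "youtube"; url: string }
  | { kind: "spotify"; url: string }
  | { kind: "unsupported"; reason: string };

const AUDIO_EXTS = new Set(["mp3", "wav"]);
const MIDI_EXTS = new Set(["mid", "midi"]);
// .musicxml is plain MusicXML, .mxl the compressed variant, .xml a common generic suffix.
const MUSICXML_EXTS = new Set(["musicxml", "mxl", "xml"]);

function classifyFile(fileName: string): InputKind {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (AUDIO_EXTS.has(ext)) return { kind: "audio", ext: ext as "mp3" | "wav" };
  if (MIDI_EXTS.has(ext)) return { kind: "midi" };
  if (MUSICXML_EXTS.has(ext)) return { kind: "musicxml" };
  return { kind: "unsupported", reason: `unsupported file extension "${ext}"` };
}

function classifyLink(raw: string): InputKind {
  let url: URL;
  try {
    url = new URL(raw); // throws on malformed input
  } catch {
    return { kind: "unsupported", reason: "not a valid URL" };
  }
  const host = url.hostname.replace(/^www\./, "");
  if (host === "youtube.com" || host === "m.youtube.com" || host === "youtu.be") {
    return { kind: "youtube", url: url.toString() };
  }
  if (host === "open.spotify.com") return { kind: "spotify", url: url.toString() };
  return { kind: "unsupported", reason: `unsupported host "${host}"` };
}

// File kinds go to /api/analyze-file and link kinds to /api/analyze. Unsupported inputs get null.
function endpointFor(input: InputKind): string | null {
  switch (input.kind) {
    case "audio":
    case "midi":
    case "musicxml":
      return "/api/analyze-file";
    case "youtube":
    case "spotify":
      return "/api/analyze";
    default:
      return null;
  }
}
```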

How we built it

  • Next.js (App Router) for pages and API routes.
  • Server-side analysis endpoints: /api/analyze (links) and /api/analyze-file (file uploads).
  • Feature extraction:
    • Primary: Python script (librosa) when Python is available.
    • Fallback: Node extractor (dynamic import of music-metadata) for duration/bitrate/title/artist and simple MusicXML/MIDI heuristics.
  • Gemini integration wrapped in a server-side service module that lazy-loads the SDK and falls back to a mock when no API key is present.
  • Client: React components for Header, InputArea, VisualPreview, AnalysisDisplay; WaveSurfer loaded dynamically for waveform visualization.
  • CI-style safeguards: type checks, ESLint fixes, and runtime guards to keep server-only code out of client bundles.
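
The extraction strategy above runs Python first and falls back to Node. It can be sketched as follows, assuming the librosa script prints one JSON object of features to stdout. The script path, the `python3` binary name, the `AudioFeatures` fields and the function names are all hypothetical. The Node fallback is passed in as a function so the sketch does not depend on music-metadata. In the project, that function would wrap the dynamic import described above.

```typescript
import { spawn } from "node:child_process";

// Hypothetical feature shape; the project's real fields may differ.
interface AudioFeatures {
  source: "python" | "node";
  durationSec?: number;
  tempoBpm?: number;
  key?: string;
}

type Extractor = (filePath: string) => Promise<AudioFeatures>;

// Runs a (hypothetical) librosa script and parses the JSON it prints to stdout.
function pythonExtractor(scriptPath: string, pythonBin = "python3"): Extractor {
  return (filePath) =>
    new Promise((resolve, reject) => {
      const child = spawn(pythonBin, [scriptPath, filePath]);
      let stdout = "";
      let stderr = "";
      child.stdout.setEncoding("utf8");
      child.stderr.setEncoding("utf8");
      child.stdout.on("data", (chunk: string) => { stdout += chunk; });
      child.stderr.on("data", (chunk: string) => { stderr += chunk; });
      // A missing interpreter surfaces here as an error whose code is "ENOENT".
      child.on("error", reject);
      child.on("close", (code) => {
        if (code !== 0) {
          reject(new Error(`extractor exited with code ${code}: ${stderr.trim()}`));
          return;
        }
        try {
          resolve({ ...(JSON.parse(stdout) as Omit<AudioFeatures, "source">), source: "python" });
        } catch (err) {
          reject(err);
        }
      });
    });
}

// Tries the primary extractor and falls back only when its binary is missing.
async function extractFeatures(filePath: string, primary: Extractor, fallback: Extractor): Promise<AudioFeatures> {
  try {
    return await primary(filePath);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") {
      console.warn("Python extractor unavailable (ENOENT); using Node metadata fallback.");
      return fallback(filePath);
    }
    throw err; // any other failure propagates instead of being masked
  }
}
```

Rethrowing errors other than ENOENT is a design choice. If the librosa script crashes, the crash shows up as an error. It is not quietly replaced by the shallower Node metadata.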

Challenges we ran into

  • Environment separation: server-only env helpers (zod/env) caused bundling/prerender errors when imported from client pages; fixed by reading NEXT_PUBLIC_* variables in client code.
  • Missing Python in the runtime: launching the extractor failed with spawn ENOENT errors, so we implemented a Node fallback and clear server-side error messages.
  • Upload size limits and platform constraints (serverless request-body limits): we added a server-side MAX_UPLOAD_BYTES guard and guidance on handling larger uploads, such as streaming them to object storage.
  • LLM hallucinations and inconsistent outputs when prompts were too vague: fixing these required tighter prompts, metadata enrichment (oEmbed for links), and explicit structured-JSON instructions.
  • Type/lint noise from dynamic imports and third-party libraries: resolved by narrowing from unknown, adding runtime checks, and keeping type assertions to a minimum.
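
The MAX_UPLOAD_BYTES guard from the challenges above could look like the following Next.js App Router route handler. This is a minimal sketch with two assumptions: the file arrives as multipart form data in a field named `file`, and the limit is 4 MiB. Neither value comes from the project. A real handler would go on to extraction and analysis after these checks.

```typescript
// Assumed limit (4 MiB); it should sit below the hosting platform's request-body cap.
const MAX_UPLOAD_BYTES = 4 * 1024 * 1024;

function jsonError(status: number, message: string): Response {
  return new Response(JSON.stringify({ error: message }), {
    status,
    headers: { "content-type": "application/json" },
  });
}

// Sketch of a route handler for /api/analyze-file (e.g. app/api/analyze-file/route.ts).
export async function POST(request: Request): Promise<Response> {
  // Cheap early exit when the client declares an oversized body.
  const declared = Number(request.headers.get("content-length") ?? 0);
  if (declared > MAX_UPLOAD_BYTES) {
    return jsonError(413, `Upload exceeds ${MAX_UPLOAD_BYTES} bytes; trim the file or upload it to storage first.`);
  }

  const form = await request.formData();
  const entry = form.get("file"); // field name "file" is an assumption
  if (entry === null || typeof entry === "string") {
    return jsonError(400, "Expected a file in the 'file' form field.");
  }
  // Content-Length may be absent or wrong, so the parsed file size is the real check.
  if (entry.size > MAX_UPLOAD_BYTES) {
    return jsonError(413, `Upload exceeds ${MAX_UPLOAD_BYTES} bytes.`);
  }

  // Extraction and analysis would run here; this sketch only echoes what it received.
  return new Response(JSON.stringify({ ok: true, name: entry.name, bytes: entry.size }), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```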

Accomplishments that we're proud of

  • Functional end-to-end flow for local uploads and link analysis with graceful fallbacks.
  • A componentized UI that keeps the design consistent across pages (SoundAnalyzer / SheetSketcher), with a responsive layout.
  • Robust server route for file uploads with Python + Node extractor fallback and clear error handling.
  • Improved prompts and metadata enrichment to reduce hallucination surface.
  • Cleaned up ESLint and TypeScript issues across the project and updated the README and tagline.

What we learned

  • Keep server-only code out of client bundles — NEXT_PUBLIC_* variables are the correct way to expose environment config to the browser.
  • Real audio feature extraction (tempo/key/instruments) reliably requires native DSP tooling (librosa or dedicated models); a Node fallback can provide useful metadata but not deep musical features.
  • LLMs need structured inputs and example JSON to be consistent. Enriching prompts with link metadata (oEmbed) or extracted features dramatically improves reliability.
  • Serverless platforms impose upload/body limits; large-file workflows require streaming or external storage.
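
The lesson about structured inputs and example JSON can be made concrete with a prompt builder. It combines extracted features, optional oEmbed metadata and an example of the expected output. The `AnalysisResult`-style example fields, the function names and the prompt wording are all hypothetical. YouTube's public oEmbed endpoint (https://www.youtube.com/oembed) returns fields such as `title` and `author_name`, the kind of link metadata described above.

```typescript
// Hypothetical input shapes; the project's real types may differ.
interface ExtractedFeatures {
  durationSec?: number;
  tempoBpm?: number;
  key?: string;
  title?: string;
  artist?: string;
}

interface OEmbedMeta {
  title?: string;
  author_name?: string;
}

// Queries YouTube's oEmbed endpoint; returns null on any failure so enrichment stays optional.
async function fetchYouTubeOEmbed(videoUrl: string): Promise<OEmbedMeta | null> {
  const endpoint = `https://www.youtube.com/oembed?format=json&url=${encodeURIComponent(videoUrl)}`;
  try {
    const res = await fetch(endpoint);
    if (!res.ok) return null;
    const data = (await res.json()) as OEmbedMeta;
    return { title: data.title, author_name: data.author_name };
  } catch {
    return null;
  }
}

// Example output shown to the model so it copies this structure instead of inventing one.
const EXAMPLE_RESULT = {
  tempoBpm: 120,
  key: "A minor",
  instruments: ["piano", "drums"],
  structure: [{ label: "intro", startSec: 0 }],
  summary: "One or two sentences.",
};

function buildAnalysisPrompt(features: ExtractedFeatures, meta: OEmbedMeta | null): string {
  const facts = { ...features, linkTitle: meta?.title, linkAuthor: meta?.author_name };
  // JSON.stringify omits undefined fields, so only known facts reach the model.
  return [
    "You are a music analyst. Use only the facts below; write null for anything you cannot infer.",
    `Facts: ${JSON.stringify(facts)}`,
    "Respond with a single JSON object and no prose, matching this example's keys and types:",
    JSON.stringify(EXAMPLE_RESULT),
  ].join("\n");
}
```

The oEmbed helper returns null on failure. A failed lookup then degrades to a features-only prompt rather than failing the whole request.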

What's next for SoundSketcher

  • Add multi-instrument breakdowns to support music scoring.
  • Improve extraction accuracy:
    • Ship a small worker or container with Python/librosa available (or a dedicated feature-extraction service).
    • Add optional richer MIDI/MusicXML parsing.
  • Production-ready uploads:
    • Presigned uploads to object storage (S3/GCS), background analysis jobs, and status polling.
  • Prompt & model improvements:
    • Lock the model settings, provide stricter schema examples, and add validation of model output.
  • UX polish:
    • Show extractor status/errors explicitly in the UI.
    • Add a client-side option to trim uploads to the first 30 seconds, keeping upload sizes predictable.
  • Tests & CI:
    • Add unit and integration tests for API routes and main flows to avoid regressions.
  • Deploy:
    • Provide a deployment recipe with required runtimes (Python) or run the extractor as a separate service to avoid platform limitations.
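
For the planned validation of model output, one option is to parse the model's text defensively. When no API key is configured, the sketch falls back to a mock result, mirroring the mock fallback described under How we built it. The `AnalysisResult` fields and the `generate` function are assumptions. `generate` stands in for the Gemini SDK call and is not a real SDK API. A schema library such as zod, which already appears in the project's env helpers, could replace the hand-written checks.

```typescript
// Hypothetical AnalysisResult shape; the project's real type may have more fields.
interface AnalysisResult {
  tempoBpm: number | null;
  key: string | null;
  instruments: string[];
  summary: string;
}

// Stand-in for the model call: takes a prompt and resolves to the model's raw text.
type Generate = (prompt: string) => Promise<string>;

type AnalysisOutcome = { result: AnalysisResult; source: "model" | "mock" | "invalid-output" };

const MOCK_RESULT: AnalysisResult = {
  tempoBpm: null,
  key: null,
  instruments: [],
  summary: "Mock analysis: no model output available.",
};

// Returns a validated AnalysisResult, or null if the text is not JSON of the expected shape.
function parseAnalysisResult(text: string): AnalysisResult | null {
  // Strip Markdown code fences (\x60 is a backtick) that models sometimes add anyway.
  const cleaned = text.trim().replace(/^\x60{3}(?:json)?\s*/i, "").replace(/\s*\x60{3}$/, "");
  let data: unknown;
  try {
    data = JSON.parse(cleaned);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const d = data as Record<string, unknown>;
  const tempoOk = d.tempoBpm === null || (typeof d.tempoBpm === "number" && Number.isFinite(d.tempoBpm) && d.tempoBpm > 0);
  const keyOk = d.key === null || typeof d.key === "string";
  const instrumentsOk = Array.isArray(d.instruments) && d.instruments.every((i) => typeof i === "string");
  if (!tempoOk || !keyOk || !instrumentsOk || typeof d.summary !== "string") return null;
  return {
    tempoBpm: d.tempoBpm as number | null,
    key: d.key as string | null,
    instruments: d.instruments as string[],
    summary: d.summary as string,
  };
}

// Uses the mock when no key is configured, and flags model output that fails validation.
async function analyze(prompt: string, generate: Generate, apiKey: string | undefined): Promise<AnalysisOutcome> {
  if (!apiKey) return { result: MOCK_RESULT, source: "mock" };
  const parsed = parseAnalysisResult(await generate(prompt));
  return parsed ? { result: parsed, source: "model" } : { result: MOCK_RESULT, source: "invalid-output" };
}
```

When the output fails validation, the sketch returns "invalid-output" instead of silently substituting the mock. That lets the UI report the failure, in line with the UX item about showing errors explicitly.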