SoundSketcher

Inspiration

We wanted a small, practical toolkit that turns music files and links into structured musical insight: something that helps people quickly understand tempo, key, instrumentation, and structure, and that gives a visual preview to aid editing or scoring. Combining lightweight feature extraction with an LLM for interpretation struck a pragmatic balance between accuracy and engineering cost.

What it does

Accepts uploads (MP3, WAV, MIDI, MusicXML) or links (YouTube, Spotify preview workflows).
Extracts metadata and basic audio features server-side (Python extractor if available, Node fallback using music-metadata).
Sends structured features to a Gemini-backed analysis service that returns an AnalysisResult JSON used by the UI.
Shows an interactive waveform for uploaded files and an embedded YouTube player for YouTube links.
Provides a two-column UI: input + preview on the left, analysis output on the right.
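
As an illustration of the first step above, here is a minimal sketch of how an input string might be classified as a supported file or link before it is routed to analysis. The function name classifyInput, the extension list, and the hostnames are assumptions made for this example, not the project's actual code.

```typescript
type FileFormat = "mp3" | "wav" | "midi" | "musicxml";
type LinkProvider = "youtube" | "spotify";

type InputKind =
  | { kind: "file"; format: FileFormat }
  | { kind: "link"; provider: LinkProvider }
  | { kind: "unsupported"; reason: string };

// Assumed extension-to-format mapping; the project's real list may differ.
const FORMAT_BY_EXTENSION = new Map<string, FileFormat>([
  ["mp3", "mp3"],
  ["wav", "wav"],
  ["mid", "midi"],
  ["midi", "midi"],
  ["musicxml", "musicxml"],
  ["xml", "musicxml"],
]);

// Assumed hostnames for the supported link providers.
const PROVIDER_BY_HOST = new Map<string, LinkProvider>([
  ["youtube.com", "youtube"],
  ["www.youtube.com", "youtube"],
  ["youtu.be", "youtube"],
  ["open.spotify.com", "spotify"],
]);

function classifyInput(input: string): InputKind {
  const trimmed = input.trim();

  // Anything that starts with http(s):// is treated as a link.
  if (/^https?:\/\//i.test(trimmed)) {
    let host: string;
    try {
      host = new URL(trimmed).hostname.toLowerCase();
    } catch {
      return { kind: "unsupported", reason: "malformed URL" };
    }
    const provider = PROVIDER_BY_HOST.get(host);
    return provider
      ? { kind: "link", provider }
      : { kind: "unsupported", reason: `unrecognized host: ${host}` };
  }

  // Otherwise treat the input as a file name and look at its extension.
  const dot = trimmed.lastIndexOf(".");
  const ext = dot >= 0 ? trimmed.slice(dot + 1).toLowerCase() : "";
  const format = FORMAT_BY_EXTENSION.get(ext);
  return format
    ? { kind: "file", format }
    : { kind: "unsupported", reason: `unrecognized extension: ${ext || "(none)"}` };
}
```

Returning a tagged union lets the caller switch on kind, so unsupported inputs can be rejected with a clear reason before any upload or model call.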

How we built it

Next.js (App Router) for pages and API routes.
Server-side analyze endpoint(s): /api/analyze (link) and /api/analyze-file (file).
Feature extraction:
- Primary: Python script (librosa) when Python is available.
- Fallback: Node extractor (dynamic import of music-metadata) for duration/bitrate/title/artist and simple MusicXML/MIDI heuristics.
Gemini integration wrapped in a server-side service module that lazy-loads the SDK and falls back to a mock when no API key is present.
Client: React components for Header, InputArea, VisualPreview, AnalysisDisplay; WaveSurfer loaded dynamically for waveform visualization.
CI-style safeguards: type checks, ESLint fixes, and runtime guards to keep server-only code out of client bundles.
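
The lazy-loading service wrapper with a mock fallback can be sketched roughly as follows. ModelClient, ClientLoader, and createAnalysisService are hypothetical names for this example; in particular, ModelClient does not mirror the real Gemini SDK, and the loader is injected so the sketch makes no claims about how the SDK is imported or called.

```typescript
// Hypothetical minimal client interface; it does not mirror the real Gemini SDK.
interface ModelClient {
  generate(prompt: string): Promise<string>;
}

// Loader injected by the caller, e.g. a function that dynamically imports the SDK.
type ClientLoader = (apiKey: string) => Promise<ModelClient>;

interface AnalysisService {
  mode: "live" | "mock";
  analyze(prompt: string): Promise<string>;
}

// Canned payload returned when no API key is configured (shape is illustrative).
const MOCK_RESPONSE = JSON.stringify({ mock: true, note: "No API key configured" });

function createAnalysisService(
  apiKey: string | undefined,
  loadClient: ClientLoader
): AnalysisService {
  // No key: never touch the SDK, always answer with the mock payload.
  if (!apiKey) {
    return { mode: "mock", analyze: async () => MOCK_RESPONSE };
  }

  const key: string = apiKey;
  // Cached after the first call so the SDK is loaded at most once.
  // In this sketch a failed load stays cached; a real service might retry.
  let clientPromise: Promise<ModelClient> | undefined;

  return {
    mode: "live",
    analyze: async (prompt: string) => {
      if (!clientPromise) clientPromise = loadClient(key);
      const client = await clientPromise;
      return client.generate(prompt);
    },
  };
}
```

Because the loader only runs inside analyze, creating the service has no import cost and works in environments where the SDK or the key is absent.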

Challenges we ran into

Environment separation: server-only env helpers (zod/env) caused bundling/prerender errors when imported from client pages; fixed by reading NEXT_PUBLIC_* variables in client code.
Python missing at runtime produced spawn ENOENT errors in the extractor; we implemented a Node fallback and clear server-side error messages.
Upload size limits and platform constraints (serverless body limits); added server-side MAX_UPLOAD_BYTES guard and guidance for larger uploads or streaming to object storage.
LLM hallucinations and inconsistent outputs when prompts were too vague; required prompt tightening, metadata enrichment (oEmbed for links), and structured JSON instructions.
Type/lint noise from dynamic imports and third-party libs; resolved via unknown narrowing, runtime checks, and minimal type assertions.
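
Two of the guards described above lend themselves to a short sketch: an upload size check, and detection of a missing-executable error by narrowing an unknown value. The 10 MiB limit and the helper names checkUploadSize and isMissingExecutable are assumptions; the source names MAX_UPLOAD_BYTES but not its value.

```typescript
// Assumed limit for illustration; the project's actual value is not stated.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

type UploadCheck =
  | { ok: true }
  | { ok: false; status: 413; message: string };

// Reject oversized uploads before any extraction work starts.
function checkUploadSize(
  sizeBytes: number,
  maxBytes: number = MAX_UPLOAD_BYTES
): UploadCheck {
  if (sizeBytes <= maxBytes) return { ok: true };
  return {
    ok: false,
    status: 413, // HTTP 413: the request body is too large
    message: `File is ${sizeBytes} bytes; the limit is ${maxBytes} bytes.`,
  };
}

// Node reports a missing executable from spawn as an error whose code is "ENOENT".
// The value is narrowed step by step from unknown, with one minimal assertion.
function isMissingExecutable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    "code" in err &&
    (err as { code: unknown }).code === "ENOENT"
  );
}
```

A route handler could call checkUploadSize first and return the 413 message directly, and use isMissingExecutable in the extractor's error path to decide when to switch to the Node fallback.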

Accomplishments that we're proud of

Functional end-to-end flow for local uploads and link analysis with graceful fallbacks.
Componentized UI to match design across pages (SoundAnalyzer / SheetSketcher) and consistent responsive layout.
Robust server route for file uploads with Python + Node extractor fallback and clear error handling.
Improved prompts and metadata enrichment to reduce hallucination surface.
Cleaned up ESLint and TypeScript issues across the project and updated README + tagline.

What we learned

Keep server-only code out of client bundles — NEXT_PUBLIC_* variables are the correct way to expose environment config to the browser.
Reliable audio feature extraction (tempo/key/instruments) requires native DSP tooling (librosa or dedicated models); a Node fallback can provide useful metadata but not deep musical features.
LLMs need structured inputs and example JSON to be consistent. Enriching prompts with link metadata (oEmbed) or extracted features dramatically improves reliability.
Serverless platforms impose upload/body limits; large-file workflows require streaming or external storage.
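
To make the point about structured inputs and example JSON concrete, here is one possible way to assemble such a prompt. The field names, the example values, and buildAnalysisPrompt itself are assumptions for illustration; the LinkMetadata fields correspond to the oEmbed title, author_name, and provider_name properties.

```typescript
// Illustrative feature set; the real extractor output may contain other fields.
interface ExtractedFeatures {
  durationSeconds?: number;
  bitrateKbps?: number;
  title?: string;
  artist?: string;
  tempoBpm?: number;
}

// Subset of oEmbed response fields, renamed to camelCase for this sketch.
interface LinkMetadata {
  title?: string;
  authorName?: string;
  providerName?: string;
}

// Example object shown to the model so it can copy the shape (values are made up).
const EXAMPLE_RESULT = {
  tempoBpm: 120,
  key: "C major",
  instruments: ["piano"],
  structure: ["intro", "verse", "chorus"],
};

function buildAnalysisPrompt(features: ExtractedFeatures, link?: LinkMetadata): string {
  // Everything the model may rely on is serialized into one JSON context block.
  const context = { features, link: link ?? null };
  return [
    "You are analyzing a piece of music.",
    "Use only the facts in CONTEXT. If a value cannot be inferred, use null rather than guessing.",
    "Respond with a single JSON object matching the shape of EXAMPLE, and nothing else.",
    `CONTEXT: ${JSON.stringify(context)}`,
    `EXAMPLE: ${JSON.stringify(EXAMPLE_RESULT)}`,
  ].join("\n");
}
```

Keeping the context and the example as serialized JSON, rather than free text, gives the model less room to invent details and makes the prompt easy to log and compare between runs.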

What's next for SoundSketcher

Implement multi-instrument breakdowns for music scoring.
Improve extraction accuracy:
- Ship a small worker or container with Python/librosa available (or a dedicated feature-extraction service).
- Add optional richer MIDI/MusicXML parsing.
Production-ready uploads:
- Presigned uploads to object storage (S3/GCS), background analysis jobs, and status polling.
Prompt & model improvements:
- Lock the model settings, provide stricter schema examples, and add validation of model output.
UX polish:
- Show extractor status/errors explicitly in the UI.
- Add client-side trimming option (first 30s) to keep uploads predictable.
Tests & CI:
- Add unit and integration tests for API routes and main flows to avoid regressions.
Deploy:
- Provide a deployment recipe with required runtimes (Python) or run the extractor as a separate service to avoid platform limitations.
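
The planned validation of model output could start from something like the sketch below, which parses the raw model text and checks it field by field before the UI sees it. The AnalysisResult fields shown here are assumed for illustration; the actual schema is not given in this write-up.

```typescript
// Assumed result shape; the real AnalysisResult may differ.
interface AnalysisResult {
  tempoBpm: number | null;
  key: string | null;
  instruments: string[];
  structure: string[];
}

type ParseOutcome =
  | { ok: true; value: AnalysisResult }
  | { ok: false; error: string };

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

function parseAnalysisResult(raw: string): ParseOutcome {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, error: "expected a JSON object" };
  }

  const { tempoBpm, key, instruments, structure } = data as Record<string, unknown>;

  // tempoBpm must be null or a positive finite number.
  if (!(tempoBpm === null || (typeof tempoBpm === "number" && Number.isFinite(tempoBpm) && tempoBpm > 0))) {
    return { ok: false, error: "tempoBpm must be a positive number or null" };
  }
  if (!(key === null || typeof key === "string")) {
    return { ok: false, error: "key must be a string or null" };
  }
  if (!isStringArray(instruments)) {
    return { ok: false, error: "instruments must be an array of strings" };
  }
  if (!isStringArray(structure)) {
    return { ok: false, error: "structure must be an array of strings" };
  }

  // Only the validated fields are passed on; unexpected extras are dropped.
  return { ok: true, value: { tempoBpm, key, instruments, structure } };
}
```

A failed parse can then be surfaced as an explicit error or trigger a retry, instead of letting a malformed response reach the analysis display.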

Built With

css
gemini
geminiapi
git
github
html
javascript
next.js
node.js
npm
python
react
t3
trpc
typescript
vscode

Submitted to

Knight Hacks VIII

Created by

Liam Huang
Second-year undergraduate at UCF; CS major, Business Admin minor.
Worked on the app/website front end and the Sound Analyzer feature (connecting to the Gemini API and handling the request/response flow).
Dylan Gonzalez
Jake Arnold
Oliver Morales

Updates

Liam Huang started this project — Oct 26, 2025 03:50 AM EDT
