SynthCV: The Paper-Based AR Synthesizer

Demonstration: https://drive.google.com/file/d/1XUXS3ac0JPUvQXIgdA2Zq3X1jpgBxB75/view

Inspiration and Impact

As a team of musicians, producers, and engineers, we believe that learning music is an essential skill for expressing creativity, yet the cost of instruments often bars people from exploring this passion. Hardware synthesizers, MIDI controllers, and pianos are expensive, bulky, and intimidating for beginners. We wanted to lower that barrier by turning everyday objects into musical tools. Our vision was a system that lets anyone with a webcam and a marker instantly create a playable interface, democratizing music-making and keeping the creative process accessible to everyone regardless of their financial situation.

Challenges

Detecting the "keys" on the paper was an iterative process filled with trial and error. We initially experimented with standard blob detection and the Hough Circle Transform to locate the drawn dots, but we struggled significantly with noise: stray marks on the table and in the surrounding environment were constantly misidentified as keys. Through testing, we found that contour detection combined with a spatial filter was the superior approach. By only accepting blobs that fell strictly within the detected boundary of the piano paper, we filtered out background noise effectively. This proved to be the most robust method, maintaining accuracy across different lighting conditions and messy table backgrounds.
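The spatial filter reduces to a point-in-polygon test: a candidate blob counts as a key only if its center lies inside the detected paper boundary. A minimal pure-Python sketch of that filtering step, assuming the boundary has already been extracted as a convex quadrilateral (the helper names and the convexity assumption are ours, not from the project):

```python
def point_in_quad(pt, quad):
    """Return True if pt lies inside the convex quadrilateral `quad`
    (corners listed in consistent winding order). A point is inside
    when it sits on the same side of all four edges."""
    x, y = pt
    sign = 0
    for i in range(4):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % 4]
        # Cross product tells us which side of edge (i, i+1) the point is on.
        cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False  # point switched sides: it is outside
    return True

def filter_keys(blob_centers, paper_quad):
    """Keep only candidate blobs whose centers fall within the paper
    boundary, discarding stray marks on the table."""
    return [c for c in blob_centers if point_in_quad(c, paper_quad)]
```

For example, with a paper boundary of `[(0, 0), (10, 0), (10, 10), (0, 10)]`, a dot at `(5, 5)` survives the filter while a stray mark at `(20, 5)` is discarded.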

Refining

Once we could see the keys, our next challenge was ensuring they didn't trigger accidentally. We decided to utilize MediaPipe for precise hand pose estimation, shifting our logic to trigger a note only when the tip of a finger made direct contact with a dot. We implemented this by tracking the Euclidean (L2) distance between the fingertip coordinates and the dot's center, ensuring a deliberate "press" rather than a stray hover.

We also ran into a logic issue where the notes were assigned randomly rather than in a musical scale. We solved this by implementing a sorting algorithm that arranged the detected dots based on their relative positions, ensuring the keys were assigned in the correct chromatic order.

Finally, we wanted the audio experience to match the visual technology. We initially used Pygame to generate simple frequencies, but we felt limited by the beep-like quality of the sound. To create a richer musical experience, we pivoted to pyfluidsynth, which allowed us to import SoundFont (.sf2) files, the standard format for digital synthesizers. This switch not only drastically improved the audio quality but also gave users the flexibility to load whatever instrument sounds they wanted, fully realizing our goal of a customizable, virtual instrument.
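The press detection and chromatic ordering described above boil down to a distance threshold and a left-to-right sort. A minimal sketch, where the pixel radius, function names, and starting MIDI note are our illustrative assumptions rather than the project's actual values:

```python
import math

PRESS_RADIUS = 18  # pixels; an assumed threshold, tuned per camera setup

def is_pressed(fingertip, dot_center, radius=PRESS_RADIUS):
    """Trigger only when the fingertip's L2 distance to the dot center
    is within the press radius: a deliberate press, not a stray hover."""
    dx = fingertip[0] - dot_center[0]
    dy = fingertip[1] - dot_center[1]
    return math.hypot(dx, dy) <= radius

def assign_notes(dot_centers, root_midi=60):
    """Sort detected dots left to right so keys map to ascending
    chromatic MIDI notes, starting at middle C (MIDI 60) by default."""
    ordered = sorted(dot_centers, key=lambda c: c[0])
    return [(center, root_midi + i) for i, center in enumerate(ordered)]
```

The fingertip coordinates would come from MediaPipe's hand landmarks (e.g. the index fingertip) converted to pixel space; the sort key is the dot's x-coordinate, which assumes a roughly horizontal row of keys.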
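The pyfluidsynth switch is a handful of calls against its Synth API. A hedged sketch under our own assumptions (the helper names, channel choice, and SoundFont path are illustrative; the import is deferred so the functions can be defined on a machine without an audio device):

```python
def make_player(sf2_path, driver=None):
    """Start FluidSynth and load a SoundFont (.sf2) onto MIDI channel 0.
    pyfluidsynth is imported lazily so this module loads without audio."""
    import fluidsynth  # pip install pyfluidsynth
    fs = fluidsynth.Synth()
    fs.start(driver=driver)           # None picks the platform default driver
    sfid = fs.sfload(sf2_path)        # any .sf2 file, e.g. a piano SoundFont
    fs.program_select(0, sfid, 0, 0)  # channel 0, bank 0, preset 0
    return fs

def strike(fs, midi_note, velocity=100):
    """Sound one key press on channel 0."""
    fs.noteon(0, midi_note, velocity)
```

Because the instrument is just whatever SoundFont is loaded, swapping `sf2_path` is all it takes to turn the paper piano into strings, organ, or drums.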

Built With

Python, MediaPipe, Pygame, and pyfluidsynth