Inspiration
Get ready to transcend the boundaries of traditional music composition and step into the extraordinary world of Spatial MusicGen with MultiBand Diffusion. This is your chance to create breathtaking soundscapes that captivate the senses and redefine how you listen to and interact with music.

🎶 What is Spatial Generation with MultiBand Diffusion? 🎶

Spatial Generation is a groundbreaking approach that leverages AI to compose music beyond the constraints of stereo sound. With MultiBand Diffusion, you have the power to manipulate frequency bands independently, adding a whole new dimension of richness and depth to your generative musical creations. Imagine music that envelops you as the listener, taking you on an immersive journey through a multi-dimensional auditory universe wherever you go.
Want to learn more? Try it today for free: https://spatialgeneration-app.vercel.app/
What it does
The app automatically generates music from a description of your surroundings, interpreted from your location and an image of the scene.
How we built it
Next.js and MusicGen Model with MultiBand Diffusion
We avoided writing any music prompts ourselves: prompts were generated purely from a description of the scenery, produced by feeding the location of the place and an image of it into BLIP-2. We then fed those prompts into our Hugging Face Space (https://huggingface.co/spaces/spatialgeneration/musicgen-mbd) to generate music for specific soundscapes.
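The prompt-assembly step above can be sketched as a small helper that combines the BLIP-2 scene caption with the location (the function name and field names here are hypothetical, not our actual code):

```javascript
// Hypothetical sketch: turn a BLIP-2 scene caption plus location info
// into a MusicGen text prompt. All names are illustrative.
function buildPrompt({ location, caption, weather, mood }) {
  const parts = [`ambient music inspired by ${caption}`, `near ${location}`];
  if (weather) parts.push(`on a ${weather} day`);
  if (mood) parts.push(`with a ${mood} mood`);
  return parts.join(", ");
}

// Example:
const prompt = buildPrompt({
  location: "Golden Gate Park",
  caption: "a foggy forest trail with tall redwoods",
  weather: "misty",
  mood: "calm",
});
// → "ambient music inspired by a foggy forest trail with tall redwoods,
//    near Golden Gate Park, on a misty day, with a calm mood"
```

A string like this is what we would pass on to the MusicGen + MultiBand Diffusion Space for generation.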
Challenges we ran into
- Making API calls with the Gradio JS client library
- We could not figure out how to stream music as it is generated
- Current generation times of 30-40 seconds make interactive generation difficult
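With 30-40 second generation times, a client call needs an explicit timeout so the UI can fall back instead of hanging. A minimal sketch of one way to handle this (the `client.predict` call shown in the comment is an assumption about the Gradio client, and `withTimeout` is our hypothetical helper):

```javascript
// Hypothetical sketch: race a long-running generation request against a
// timer so the UI can recover instead of waiting indefinitely.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (assumed shape of a Gradio client call, ~45 s budget):
// withTimeout(client.predict("/predict", { prompt }), 45_000)
//   .catch((err) => console.warn("generation failed:", err.message));
```

This does not cancel the underlying request; it only bounds how long the caller waits, which was enough for our fallback UI.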
Landscapes influence the music you want to listen to. However, your current mood and whether you are in nature or in an urban setting have a far bigger effect than your exact surroundings.

You can ask BLIP-2 specific questions about an image and use the answers as input for prompt generation for a user's current environment. We ran many experiments generating music from the user's surroundings; ultimately, weather, mood, and scenery had the biggest influence. Even when generating spatial audio, the most important input is still the musical style and listening history the user prefers, so for personalized audio, access to their musical history is crucial.
Built With
- colabpro+
- gradio
- huggingface
- musicgen
- next.js
