Inspiration
Imagine bringing your favorite stories to life, not just with narration, but with immersive soundscapes too! Lose yourself in the world of the story with sound effects, emotional music, and professional narration. Perfect for audiobooks, bedtime stories, or simply adding a new dimension to your reading experience. Upload your favorite book, choose from a library of atmospheric sounds, and let BardTales weave its magic.
What it does
BardTales uses generative AI to convert a written work into an audiobook with professional voice narration and a musical score.
How we built it
The web-based UI is served by a Flask server, which holds the business logic. It takes the text payload from the input and performs the following operations:
- The Amazon Polly API converts the text into natural-sounding speech.
- A MusicGen small model runs inference to generate the music.
- The speech and music are encoded into a single audio file.
- The result is played back in the web UI.
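The step that combines speech and music can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code we shipped: it assumes both tracks have already been decoded to mono 16-bit PCM samples at the same sample rate, and the names `mix_pcm16`, `write_wav`, `music_gain`, and the 32000 Hz default are hypothetical choices made for this example.

```python
import io
import struct
import wave


def mix_pcm16(speech, music, music_gain=0.3):
    """Mix two mono 16-bit PCM sample lists, keeping the music under the speech.

    The output is as long as the narration. The music is looped if it is
    shorter and cut off if it is longer (an assumption made for this sketch).
    """
    mixed = []
    for i, s in enumerate(speech):
        m = music[i % len(music)] if music else 0
        value = int(s + music_gain * m)
        # Clip to the signed 16-bit range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, value)))
    return mixed


def write_wav(samples, sample_rate=32000):
    """Encode mono 16-bit samples as WAV and return the file contents as bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        # WAV stores samples little-endian, so pack them explicitly.
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


if __name__ == "__main__":
    narration = [1000, -1000, 2000, -2000]   # stand-in for decoded speech
    soundtrack = [500, 500]                  # stand-in for generated music
    wav_bytes = write_wav(mix_pcm16(narration, soundtrack))
    print(len(wav_bytes), "bytes of WAV data")
```

In a Flask route, bytes like these could be returned as the response body for the web UI to play. Resampling the two sources to a common rate would have to happen before mixing.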
Challenges we ran into
We wanted to deploy and run inference on EC2 VMs with GPUs. However, because GPU quotas for EC2 were unavailable to us, we resorted to deploying locally on our own machines. Configuring the dependencies for PyTorch and CUDA was a time-consuming process. The MusicGen models are very powerful, but they need very specific context to produce aesthetically fitting soundtracks. We experimented with other LLMs (GPT, Gemini, and Llama 2) to add context prompts to our story text; these experiments demonstrated the potential of such techniques if we are able to obtain full access to the same models in the future.
Accomplishments that we're proud of
- Aesthetic UI.
- Approach generalizable to other platforms such as e-readers and PDFs.
- Creative application bringing together distinct generative AI models (music and speech), with an ambitious architecture.
- Better understanding of applications of prompt engineering for prototyping.
What we learned
AWS EC2, AWS Polly, AWS Rekognition, Flask, Pytorch, CSS, PostgreSQL, Prompt Engineering, Audio processing concepts.
What's next for BardTales
Bug fixes, scaling to bigger payloads, and integrating e-readers as a target platform. Merging the persistence API hosted on Tembo (PostgreSQL).
Built With
- ai
- amazon-web-services
- deeplearning
- flask
- github
- javascript
- postgresql
- pytorch
- spectre.css

