Inspiration
I am a lazy reader and tend to skim manga panels, which means I often skip important dialogue. Reading without an OST (original soundtrack) in the background also fails to keep me engaged.
What it does
In our UI, the user loads a manga panel. The app reads the text on the panel and then plays each line in the voice of the character speaking, in the correct reading order.
How we built it
The app takes a manga panel as a PNG and uses Gemini AI to read it in manga order (right to left). The extracted text is stored in a 2D dialogue array, where each element holds the name of the speaker and their dialogue. The app then matches each line of dialogue to its character and speaks it in a cloned voice, which I created from MP3 samples of the characters' voices taken from the anime.
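The dialogue array and the voice lookup can be sketched roughly as follows. This is a minimal illustration, not our actual code: it assumes the model's reply arrives as a JSON array of [speaker, text] pairs already in reading order, and the names `VOICE_IDS`, `DEFAULT_VOICE`, `parse_dialogue` and `assign_voices` are hypothetical. The calls to Gemini and to the voice service are left out.

```python
import json

# Hypothetical mapping from character name to a cloned-voice identifier.
VOICE_IDS = {"Hero": "voice-hero", "Rival": "voice-rival"}
# Hypothetical fallback voice for speakers without a cloned voice.
DEFAULT_VOICE = "voice-narrator"


def parse_dialogue(model_output: str) -> list[list[str]]:
    """Turn a JSON array of [speaker, text] pairs into the 2D dialogue array.

    The order of the input is preserved, since it is assumed to already
    follow the right-to-left reading order of the panel.
    """
    rows = json.loads(model_output)
    dialogue = []
    for row in rows:
        # Skip anything that is not a [speaker, text] pair.
        if not isinstance(row, list) or len(row) != 2:
            continue
        speaker = str(row[0]).strip()
        text = str(row[1]).strip()
        # Skip empty speech bubbles.
        if text:
            dialogue.append([speaker, text])
    return dialogue


def assign_voices(dialogue: list[list[str]]) -> list[tuple[str, str]]:
    """Pair each line with the voice ID of its speaker, keeping the order."""
    return [(VOICE_IDS.get(speaker, DEFAULT_VOICE), text)
            for speaker, text in dialogue]
```

Each (voice ID, text) pair can then be sent to the text-to-speech step one at a time, so the lines play back in the same order they are read on the panel.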
Challenges we ran into
The challenge I found most difficult was accurately sampling the characters' voices. We were only allowed 10 MB of audio to replicate the voices, which was not enough data to get an accurate copy. Another issue was reading the text on the manga panel. Our first approach had to process the image many times before the computer could recognise the text, which was very slow and still did not give an accurate reading of the panel. I overcame this by using Gemini AI to scan the text instead, which made it much quicker and more accurate.
Accomplishments that we're proud of
The panel was read successfully with a sufficient level of accuracy.
What we learned
I learned how to use an AI model for text-to-speech and how to use these features to create a unique application.
What's next for Neuphonic
Use larger audio files to get a more accurate voice representation.