Neuphonic / gdgoncampus

Inspiration

I am a lazy reader: I tend to skim manga panels and end up skipping important dialogue. With no OST (original soundtrack) playing in the background, nothing keeps me engaged.

What it does

In our UI, you load a manga panel. The app reads the text in the image and then plays each line in the voice of the character speaking it, in the correct reading order.

How we built it

The app takes a manga panel as a PNG and uses Gemini AI to read it in manga order (right to left). The extracted text is stored in a 2D dialogue array, where each element holds the name of the speaker and their dialogue. The app then matches each line of dialogue to its character and speaks it in that character's cloned voice. I made the cloned voices from MP3 samples of the characters' voices taken from the anime.
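As a rough illustration of the dialogue array and the voice lookup described above, here is a minimal Python sketch. It is not the project's actual code: the character names, the voice IDs, the `plan_speech` and `speak_all` helpers, and the fallback `narrator` voice are all assumptions made for the example, and the text-to-speech call is passed in as a plain function rather than tied to any real API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data: [speaker, dialogue] pairs, already in reading order
# (right to left, top to bottom), as the text extraction step would return them.
dialogue: List[List[str]] = [
    ["Character A", "We have to leave now."],
    ["Character B", "Not without the map."],
    ["Character A", "Then hurry."],
]

# Hypothetical mapping from character name to a cloned-voice identifier.
voices: Dict[str, str] = {
    "Character A": "voice-a",
    "Character B": "voice-b",
}

DEFAULT_VOICE = "narrator"  # assumed fallback for speakers without a cloned voice


def plan_speech(
    dialogue: List[List[str]],
    voices: Dict[str, str],
    default_voice: str = DEFAULT_VOICE,
) -> List[Tuple[str, str]]:
    """Pair each line with its speaker's voice, keeping the reading order."""
    plan = []
    for speaker, text in dialogue:
        text = text.strip()
        if not text:
            continue  # skip empty bubbles
        plan.append((voices.get(speaker, default_voice), text))
    return plan


def speak_all(
    plan: List[Tuple[str, str]],
    synthesize: Callable[[str, str], None],
) -> None:
    """Call the supplied text-to-speech function once per line, in order."""
    for voice_id, text in plan:
        synthesize(voice_id, text)


if __name__ == "__main__":
    # Stand-in for a real text-to-speech call: just print what would be spoken.
    speak_all(plan_speech(dialogue, voices), lambda v, t: print(f"[{v}] {t}"))
```

Keeping the lookup separate from the synthesis call means the speaking order and voice assignment can be checked without producing any audio.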

Challenges we ran into

The hardest challenge was accurately sampling the voices of the characters from the manga. We were only allowed 10 MB of audio to replicate the voices, which was not enough data for an accurate copy. Another issue was reading the text on the manga panel. Our first approach had to process the image many times before the computer could recognise the text, which was very slow and still did not reproduce the text accurately. I overcame this by using Gemini AI to scan the text instead, which was much quicker and more accurate.

Accomplishments that we're proud of

The panel was read successfully and with sufficient accuracy.

What we learned

I learned how to use an AI model for text-to-speech and how to use these features to create a unique application.

What's next for Neuphonic

Use larger audio files to get a more accurate voice representation.

Built With

Updates

leonyang452 Yang started this project — Apr 06, 2025 06:26 AM EDT
