Inspiration

The idea for WeTranslate came from a common frustration many people experience when watching videos in a language they don’t understand. A friend told us that while watching an English-language video, she found the experience exhausting because she had to read translated captions for over 90 minutes. The constant effort to follow the subtitles distracted her from the content and the subtle nuances of the discussion. This got us thinking: while captions can help, they compromise the viewing experience and prevent full immersion. We realized that language barriers still keep many people from enjoying media seamlessly. We wanted to fix that.

What It Does

WeTranslate is a Chrome extension that translates the spoken language in YouTube videos directly into your native tongue. With a simple click of the translate button, users can choose their preferred language, and WeTranslate will handle the translation. The tool syncs translated audio to the video, allowing viewers to listen to the content in their chosen language without the distraction of reading captions. Beyond just translating the words, it retains the tone and emotion of the original speakers, enhancing the viewing experience.

How We Built It

The frontend of WeTranslate was built using HTML, CSS, and JavaScript. We integrated YouTube’s transcript API to extract the spoken text from videos and employed Cartesia’s API to generate the translated audio.
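The pipeline above can be sketched roughly as follows. This is a minimal illustration, not our production code: the endpoint URL, request shape, and segment format are assumptions, not the actual YouTube transcript or Cartesia APIs.

```javascript
// Sketch of the transcript-to-audio pipeline (endpoint and field names
// are illustrative assumptions, not the real YouTube or Cartesia APIs).

// Join timed transcript segments into one block of text for translation.
// Each segment is assumed to look like { text: "...", start: 1.2, dur: 3.4 }.
function joinTranscript(segments) {
  return segments
    .map((s) => s.text.trim())
    .filter(Boolean)
    .join(" ");
}

// Request translated speech for the transcript text (hypothetical endpoint).
async function fetchTranslatedAudio(transcriptText, targetLang) {
  const res = await fetch("https://example.invalid/translate-tts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: transcriptText, language: targetLang }),
  });
  return res.arrayBuffer(); // raw audio bytes to play over the video
}
```

The extension's content script would call `joinTranscript` on the extracted segments and hand the result to `fetchTranslatedAudio` when the user picks a language.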

Accomplishments That We’re Proud Of

We are proud of building the frontend and integrating Cartesia's API into our code. We also experimented with streaming the audio, but we did not have time to connect that to the front end and had to take a different approach so that we could finish the product on time.

What we learned

Communicating between the frontend and backend is a lot less straightforward than expected. Both sides could work fine, we could be chilling, but the moment we connected the two, everything suddenly broke. Working with GitHub is also not easy; most of the time, when trying to share code with our teammates, we would end up just copy-pasting the entire text file through Discord. Probably most important of all, though, is that there are many ways to do one thing, no single way is necessarily the best, and sometimes you have to make a hard choice to just commit to one.

What's next for WeTranslate

At first we wanted to translate to audio in real time by streaming input into Cartesia's WebSocket and streaming audio output directly to the backend. This was meant to handle cases where the user pauses or skips through the video, and also to keep the translation in sync with the video during long stretches without speech. As we approached the end of the allotted time, we were able to stream text to Cartesia and receive audio back word by word, but we ran into serious issues configuring the audio quality, and, more importantly, we underestimated the difficulty of interfacing our WebSocket API with our front end.
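The streaming approach we attempted could be sketched like this. The WebSocket URL and message shapes are assumptions for illustration, not Cartesia's actual protocol; the helpers show the word-by-word input and out-of-order chunk reassembly the design required.

```javascript
// Sketch of the attempted streaming design (URL and message shapes are
// illustrative assumptions, not Cartesia's actual WebSocket protocol).

// Split translated text into word-sized messages for streaming input.
function toWordMessages(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .map((word, i) => ({ seq: i, word }));
}

// Audio chunks may arrive out of order; reassemble them by sequence number.
function assembleChunks(chunks) {
  return chunks
    .slice()
    .sort((a, b) => a.seq - b.seq)
    .map((c) => c.data);
}

// Hypothetical streaming session: send words, collect returned audio chunks.
function streamTranslation(text, wsUrl) {
  const ws = new WebSocket(wsUrl);
  const received = [];
  ws.onopen = () =>
    toWordMessages(text).forEach((m) => ws.send(JSON.stringify(m)));
  ws.onmessage = (ev) => received.push(JSON.parse(ev.data)); // { seq, data }
  return received;
}
```

The hard part, as noted above, was not this send/receive loop but wiring the reassembled chunks into the front end's playback in real time.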

Thus we pivoted to generating the entire translation up front and playing it alongside the video.
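With one pre-generated track, syncing reduces to keeping a hidden audio element in lockstep with the video. A minimal sketch of that idea (element names and the drift tolerance are our own illustrative choices):

```javascript
// Sketch of the pivoted approach: one pre-generated audio track played in
// lockstep with the video (names and tolerance are illustrative choices).

// Decide whether the translated audio has drifted far enough from the
// video to warrant a reseek (tolerance in seconds).
function needsResync(videoTime, audioTime, tolerance = 0.3) {
  return Math.abs(videoTime - audioTime) > tolerance;
}

// Mirror play/pause and reseek the audio when the viewer skips around.
function attachSync(video, audio) {
  video.addEventListener("play", () => audio.play());
  video.addEventListener("pause", () => audio.pause());
  video.addEventListener("timeupdate", () => {
    if (needsResync(video.currentTime, audio.currentTime)) {
      audio.currentTime = video.currentTime; // jump after seeks/skips
    }
  });
}
```

The tolerance check avoids reseeking on every `timeupdate` event, which fires several times a second, while still catching pauses and skips quickly.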
