Inspiration

Torgal is an intelligent environment that adapts to the speaker's intent, replacing manual slide control with responsive automation. Clicking through slides yourself is easy to forget mid-talk, and even when you do remember, it's manual, unneeded work. So what if it were automated for you? What if a tool could read your intent from what you're talking about, and act upon it?

This is Torgal: a presentation interface that follows your lead, quickly and naturally.

What it does

Torgal is smart. Just upload your presentation, and you're good to go. Under the hood, Torgal uses text embedding models to parse your slides, completely on-device.
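The core idea of the parsing step is that each slide's text becomes a vector, computed once at upload time. Here's a minimal sketch of that pipeline, with a toy hashed bag-of-words function standing in for the real on-device embedding model (the slide texts, dimension, and function names are illustrative, not Torgal's actual internals):

```python
import hashlib
import math

DIM = 64  # toy dimension; a real embedding model uses a much larger space

def embed(text: str) -> list[float]:
    """Toy stand-in for a text embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# One embedding per slide, computed once when the deck is uploaded.
slides = [
    "Introduction: why manual slide control is tedious",
    "Architecture: Electron front end, Python embedding backend",
    "Live demo: speech-driven navigation",
]
slide_vectors = [embed(s) for s in slides]
```

Once every slide is a unit vector, "which slide is the speaker on?" reduces to cheap vector comparisons.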

As soon as you start talking, Torgal analyzes your speech live and deduces whether you're heading to the next slide, staying put, or even going back. A presenter-view dashboard shows what's going on under the hood, exposing its decision-making process with confidence levels and transcripts.
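The next/stay/back decision can be sketched as comparing the live transcript's embedding against the previous, current, and next slides: the highest similarity wins, and its margin over the runner-up serves as the confidence shown in the presenter view. This is an illustrative sketch, not Torgal's actual heuristic; the threshold value is a made-up parameter:

```python
def decide(sim_prev: float, sim_stay: float, sim_next: float,
           threshold: float = 0.1) -> tuple[str, float]:
    """Pick the best-matching slide; confidence is the margin over the runner-up."""
    scores = {"back": sim_prev, "stay": sim_stay, "next": sim_next}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, second = ranked[0], ranked[1]
    confidence = best[1] - second[1]
    # Only move when the winner clears the runner-up by a safe margin;
    # otherwise hold position to avoid jittery slide changes.
    if best[0] != "stay" and confidence < threshold:
        return "stay", confidence
    return best[0], confidence

action, conf = decide(sim_prev=0.12, sim_stay=0.35, sim_next=0.71)
# action == "next": the transcript clearly matches the upcoming slide
```

The margin-based confidence is what makes a live dashboard meaningful: a near-tie reads as hesitation, a wide gap as a confident transition.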

Done with your presentation? Toggle Q&A mode. Torgal adjusts its heuristics and jumps to any slide you're talking about, giving your audience immediate context for a question even before you answer it!
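In Q&A mode the search widens from the adjacent slides to the whole deck: the question's embedding is compared against every slide, and the best match wins. A self-contained sketch using cosine similarity over precomputed vectors (the helper names and toy 3-dimensional vectors are for illustration only):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

def best_slide(question_vec: list[float],
               slide_vectors: list[list[float]]) -> tuple[int, float]:
    """Return (index, similarity) of the slide closest to the question."""
    sims = [cosine(question_vec, v) for v in slide_vectors]
    idx = max(range(len(sims)), key=sims.__getitem__)
    return idx, sims[idx]

# Toy vectors standing in for real slide embeddings.
deck = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
idx, score = best_slide([0.1, 0.9, 0.2], deck)
# idx == 1: the question lands on the second slide
```

Because the embeddings are precomputed at upload time, this lookup is a handful of dot products, fast enough to jump slides while the audience member is still finishing their question.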

How we built it

The entire interface is built on Electron. For slide parsing and text embeddings, we used Python and PyMuPDF. Finally, for speech-to-text, we used OpenAI's Whisper model.

Challenges we ran into

Some notable challenges we ran into:

  • Detecting intent: how do we move the slides with our words without being explicit?
  • Processing speed: how do we maximize performance and lower latency?
  • Product design: how do we create an analytical view that is meaningful and effective?
  • Audio input: how do we turn Whisper's batch audio processing into fast, accurate streaming processing?

These problems were the biggest challenges to tackle throughout the night.
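One common way to approximate streaming on top of a batch transcriber like Whisper is a sliding window: append each incoming audio chunk to a rolling buffer and re-transcribe the recent tail. The sketch below shows that buffering pattern only; the `transcribe` callback stands in for a real Whisper call, and the window size is an illustrative guess, not the value Torgal uses:

```python
from collections import deque
from typing import Callable

class SlidingTranscriber:
    """Approximates streaming by re-running a batch model on a rolling window."""

    def __init__(self, transcribe: Callable[[bytes], str], max_chunks: int = 8):
        self.transcribe = transcribe  # batch speech-to-text callback
        # Rolling tail of recent audio; old chunks fall off automatically.
        self.window: deque[bytes] = deque(maxlen=max_chunks)

    def feed(self, chunk: bytes) -> str:
        """Add one chunk of audio and return the current window's transcript."""
        self.window.append(chunk)
        return self.transcribe(b"".join(self.window))

# Fake transcriber for illustration: "decodes" the bytes directly.
st = SlidingTranscriber(lambda audio: audio.decode(), max_chunks=2)
st.feed(b"hello ")
text = st.feed(b"world")
# text == "hello world"; older chunks drop out as new ones arrive
```

The trade-off is re-transcribing overlapping audio: a shorter window cuts latency and compute, while a longer one gives the model more context and steadier output.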

Accomplishments that we're proud of

We are genuinely proud that Torgal was built and optimized within the time limit of this hackathon. These technologies were new to us, so it's cool to think about how a user's speech is converted into a thousand numbers, then reduced back to one, all just to signify the confidence of wanting to move to the next slide.

What we learned

Audio interpretation with a pretrained model has its own uses and limits: there will always be transcription inaccuracies, especially with less-than-stellar audio quality. But we learned a lot about what really goes into these text embeddings, and about how to optimize things to be faster and yet somehow more accurate. It's truly an art.

What's next for Torgal

EVEN MORE OPTIMIZATIONS: Torgal may be a fine helper, but we want him to be even more agile (smaller and more compact in size). Hopefully, with the help of user feedback, we can tweak all the analysis parameters we use and make Torgal even better!

Several features we would like to keep working on were scrapped due to time constraints and focus, such as on-slide text highlighting and zooming into sections.
