Inspiration

A couple of days ago, one of our team members spoke with Kirsten Behling, the Dean of Accessibility at Tufts University. Tufts, like other higher education institutions, is mandated by the ADA to provide accessibility accommodations. She mentioned that Tufts has to hire human captioners because auto-generated captions are so inaccurate that deaf and hard-of-hearing students can't effectively learn from them.

What it does

Our fine-tuned Automatic Speech Recognition (ASR) model is trained specifically on STEM lectures, delivering precise captions that make higher-education course content accessible.

How we built it

We fine-tuned an ASR model on organic chemistry video lectures from Khan Academy and hosted the trained model on a T4 GPU on AWS SageMaker. Interfacing with the model through the Hugging Face Inference API, we built a website that lets users upload their own video lectures and receive accurate captions overlaid on their videos.
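To illustrate the captioning step, here is a minimal sketch of turning timestamped ASR output into an SRT caption file that can be overlaid on a video. This assumes the model returns (start, end, text) segments, as Whisper-style models do; the names `fmt_ts` and `make_srt` and the sample segments are illustrative, not our production code.

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def make_srt(segments) -> str:
    """Build SRT caption text from (start_sec, end_sec, text) segments."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{fmt_ts(start)} --> {fmt_ts(end)}\n{text.strip()}\n")
    return "\n".join(blocks)

# Hypothetical segments from a transcribed organic chemistry lecture:
segments = [
    (0.0, 3.5, "Today we'll cover SN1 and SN2 reactions."),
    (3.5, 7.2, "Notice the nucleophile attacks the electrophilic carbon."),
]
print(make_srt(segments))
```

An SRT file produced this way can be burned into or served alongside the uploaded video by most players and ffmpeg-based pipelines.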

Challenges we ran into

Fine-tuning the model was difficult: acquiring our own dataset proved challenging, and we were severely resource-limited during training.

Accomplishments that we're proud of

Despite extremely limited resources, we have demonstrated a working model for advanced organic chemistry lectures that is significantly more accurate than previously used auto-captioning systems. With additional resources, the accuracy of our model will only improve.
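Caption accuracy claims like this are typically measured with word error rate (WER), the edit distance between the reference transcript and the model's output divided by the reference length. A minimal sketch of the standard metric (not our actual evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

Generic captioners often garble domain terms (e.g. "SN2" or "nucleophile"), and each garbled term counts as a substitution error under this metric, which is why domain fine-tuning moves the needle.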

What we learned

We learned how to fine-tune an ASR model, interface with it using the Hugging Face Inference API, and host it on AWS SageMaker.

What's next for TrueCaption

We hope to gather more training data and compute so we can run the project at scale, and to expand our solution to other advanced university subjects.
