Inspiration

A couple of days ago, one of our team members spoke with Kirsten Behling, the Dean of Accessibility at Tufts University. Tufts, like other higher education institutions, is mandated by the ADA to provide accessibility accommodations. She mentioned that Tufts has to hire human captioners because auto-generated captions are so inaccurate that deaf and hard-of-hearing students can't effectively learn from them.

What it does

Our fine-tuned Automatic Speech Recognition (ASR) model is trained specifically on STEM lectures, delivering precise captions that make higher-education course content accessible.

How we built it

We fine-tuned an ASR model on organic chemistry video lectures from Khan Academy and hosted the trained model on a T4 GPU on AWS SageMaker. Interfacing with the model through the Hugging Face Inference API, we built a website that lets users upload their own video lectures and receive accurate captions overlaid on their videos.
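To illustrate the captioning step, here is a minimal sketch of turning timestamped ASR output into an SRT caption file that can be overlaid on a video. This assumes the model returns (start, end, text) segments, as Whisper-style models do; the names `fmt_ts` and `make_srt` and the sample segments are illustrative, not our production code.

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def make_srt(segments) -> str:
    """Build SRT caption text from (start_sec, end_sec, text) segments."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{fmt_ts(start)} --> {fmt_ts(end)}\n{text.strip()}\n")
    return "\n".join(blocks)

# Hypothetical segments from a transcribed organic chemistry lecture:
segments = [
    (0.0, 3.5, "Today we'll cover SN1 and SN2 reactions."),
    (3.5, 7.2, "Notice the nucleophile attacks the electrophilic carbon."),
]
print(make_srt(segments))
```

An SRT file produced this way can be burned into or served alongside the uploaded video by most players and ffmpeg-based pipelines.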

Challenges we ran into

Fine-tuning the model was difficult: acquiring our own dataset proved challenging, and we were severely resource-limited during training.

Accomplishments that we're proud of

Despite extremely limited resources, we have demonstrated a working model for advanced organic chemistry lectures that is significantly more accurate than previously used auto-captioning systems. With additional resources, the accuracy of our model will only improve.
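Caption accuracy claims like this are typically measured with word error rate (WER), the edit distance between the reference transcript and the model's output divided by the reference length. A minimal sketch of the standard metric (not our actual evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

Generic captioners often garble domain terms (e.g. "SN2" or "nucleophile"), and each garbled term counts as a substitution error under this metric, which is why domain fine-tuning moves the needle.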

What we learned

We learned how to fine-tune an ASR model, interface with it using the Hugging Face Inference API, and host it on AWS SageMaker.

What's next for TrueCaption

We hope to gather more training data and compute so we can run the project at scale, and to expand our solution to other advanced university subjects.
