Inspiration
American Sign Language shares its name with American English, so it seems like a language we English speakers should have little trouble communicating in. Yet many of us cannot communicate with someone who uses only ASL. The communication barrier between English speakers and ASL users should not be as large as it is.
What it does
Listen AT uses a camera to observe someone signing in ASL, interprets the signs, converts them to text, and then converts that text into speech, bridging the communication barrier.
How we built it
Model selection was the first step. I decided to use two models in this project: one for hand detection and one for ASL classification. After finding two datasets, I trained both models. The hand detection model performed well after its first training run, but the ASL classification model needed hours upon hours of training and still isn't perfect. Once the models were trained, I set up OpenCV to access the camera and feed frames to the hand detection model. That model identifies hands in the frame and sends the relevant cropped images to the ASL classification model, which classifies each image into one of the ASL signs it was trained on, selecting the best fit. Once the classifier gave its prediction, it was as simple as using gTTS to speak the text.
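The camera-to-speech pipeline above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `detect_hand` and `classify_sign` are stand-ins for the two trained models, the alphabet label set is an assumption, and the OpenCV and gTTS calls are shown in comments so the sketch runs without a camera or network connection.

```python
import numpy as np

# Hypothetical label set: static alphabet signs. The real project's
# class list may differ.
LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]

def detect_hand(frame):
    """Stand-in for the hand detection model: returns the cropped
    hand region, or None when no hand is found."""
    return frame  # placeholder: pass the frame through unchanged

def classify_sign(hand_img, probs):
    """Stand-in for the ASL classifier: select the best-fit label
    from a probability vector (supplied directly here for
    illustration)."""
    return LABELS[int(np.argmax(probs))]

def frames_to_text(frames_with_probs):
    """Run each frame through detection, then classification,
    accumulating the predicted letters into a string."""
    letters = []
    for frame, probs in frames_with_probs:
        hand = detect_hand(frame)
        if hand is not None:
            letters.append(classify_sign(hand, probs))
    return "".join(letters)

# In the real app, frames come from OpenCV:
#   cap = cv2.VideoCapture(0)
#   ok, frame = cap.read()
# and the final text is spoken with gTTS:
#   from gtts import gTTS
#   gTTS(text).save("speech.mp3")

if __name__ == "__main__":
    # Fake "frames": one-hot probability vectors peaked at H, then I.
    fake = []
    for letter in "HI":
        p = np.zeros(len(LABELS))
        p[LABELS.index(letter)] = 1.0
        fake.append((object(), p))
    print(frames_to_text(fake))  # HI
```

The key design point is the hand-off between the two models: the detector narrows the full frame down to just the hand region, so the classifier only ever sees the kind of cropped images it was trained on.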
Challenges we ran into
I had some severe hardware limitations. If I wanted to train on full ASL words, the training alone would have taken over 50 hours, longer than the entire hackathon. Another issue that goes hand in hand with machine learning is data availability. The datasets for training the hand detection model were about as good as one could want, but that was not the case for ASL. I ended up combining four different ASL datasets and letting the model train for over two hours per run.
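Combining several datasets mostly comes down to normalizing each set's class labels into one scheme and pooling the image paths. A minimal sketch, assuming class-per-folder layouts and made-up label spellings (the actual four datasets' layouts may differ):

```python
from pathlib import Path

def normalize_label(raw: str) -> str:
    """Hypothetical normalization: different datasets spell the same
    class differently (e.g. "a" vs "A" vs "letter_A")."""
    return raw.removeprefix("letter_").upper()

def gather_samples(dataset_roots):
    """Pool (image_path, label) pairs from several class-per-folder
    datasets into one combined training list."""
    samples = []
    for root in dataset_roots:
        for class_dir in Path(root).iterdir():
            if not class_dir.is_dir():
                continue  # skip stray files at the dataset root
            label = normalize_label(class_dir.name)
            for img in class_dir.glob("*.jpg"):
                samples.append((img, label))
    return samples
```

With the labels unified, the combined list can be shuffled and split for training like any single dataset.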
Accomplishments that we're proud of
I have a fully functioning project that reached its objectives for this hackathon. Listen AT lets someone who doesn't understand ASL understand a signer. It can also be used by someone learning ASL to get feedback on what they are signing and how correct it is.
What we learned
I learned the entire process of machine learning at a basic level: everything from model selection, to training, to connecting two models together.
What's next for Listen AT
I hope to buy some server time and train a model on a complete set of ASL signs so it is fully capable of understanding all of ASL. A future goal is for Listen AT to work the other way as well: take in speech or text and display the ASL signs for the words. This would allow complete two-way communication between an ASL user and an English speaker.