Inspiration

At HackPrinceton 2018, Christian worked with a deaf/mute student to create an AR application to caption sign language. They failed. However, his new friend's story was inspiring. During their pitch, he said that growing up deaf was lonely because having to sign to communicate was isolating, and that an application like the one they had planned to build would have helped many other hearing-impaired people like him.

For the first time in history, a group of undergraduates who learned from free intro-to-ML videos could create technology advanced enough to tackle impairments and impact lives meaningfully.

So we built a first iteration as a proof of concept for a robust, generalized sign language model based on keypoint estimation. To make the model useful in the short term, we tackle the smart-home accessibility problem by synthesizing speech to issue voice commands.

What it does

  1. It detects your body's keypoints (locates your head, hands, fingers, etc. in 2D space)
  2. We feed these keypoints into our deep learning model to predict which gesture you're most likely making
  3. We then map your gesture to a command, and use Google Cloud Text-to-Speech to send voice commands to any smart home device
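The steps above can be sketched end to end. Everything here is a placeholder: the gesture labels, the command vocabulary, and the stub classifier are hypothetical names, standing in for OpenPose keypoints and the trained PyTorch model.

```python
# Minimal sketch of the keypoints -> gesture -> voice command pipeline.
# The classifier is stubbed out; in the real project a trained deep
# learning model predicts the gesture from OpenPose keypoints.

GESTURE_TO_COMMAND = {  # hypothetical gesture/command mapping
    "lights_on": "Alexa, turn on the lights",
    "lights_off": "Alexa, turn off the lights",
}

def classify_gesture(keypoints):
    """Placeholder for the model: picks a gesture label from a
    list of (x, y) keypoint coordinates normalized to [0, 1]."""
    # Toy rule: a "raised" first keypoint (small y) means lights on.
    return "lights_on" if keypoints[0][1] < 0.5 else "lights_off"

def gesture_to_voice_command(keypoints):
    gesture = classify_gesture(keypoints)
    return GESTURE_TO_COMMAND[gesture]

# Example: first keypoint is high in the frame (y = 0.2)
print(gesture_to_voice_command([(0.4, 0.2), (0.5, 0.3)]))
# → Alexa, turn on the lights
```

The synthesized string would then be played through a speaker so a nearby smart home device hears it as a spoken command.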

How we built it

We used OpenPose to estimate keypoints and PyTorch to build the deep learning model. We then used Google Cloud Text-to-Speech to synthesize a voice command for Alexa.
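A keypoint-based classifier along these lines can be sketched in PyTorch. The architecture, layer sizes, and gesture count below are assumptions, not the project's actual model; we assume 25 body keypoints (OpenPose's BODY_25 format) flattened into a 50-dimensional (x, y) vector.

```python
import torch
import torch.nn as nn

NUM_KEYPOINTS = 25  # OpenPose BODY_25; an assumption about the input
NUM_GESTURES = 5    # hypothetical size of the gesture vocabulary

class GestureClassifier(nn.Module):
    """Small MLP over flattened (x, y) keypoint coordinates."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_KEYPOINTS * 2, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, NUM_GESTURES),  # one logit per gesture
        )

    def forward(self, x):
        return self.net(x)

model = GestureClassifier()
keypoints = torch.rand(1, NUM_KEYPOINTS * 2)  # one frame of keypoints
logits = model(keypoints)
gesture_id = logits.argmax(dim=1).item()  # index of most likely gesture
```

Because the input is just a short vector of coordinates rather than raw pixels, a model like this stays small and fast enough to run per frame.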

Challenges we ran into

  • Hardware challenges
  • Data, data, data
  • Time

Accomplishments that we're proud of

A robust model that works with any camera (webcam, phone camera, etc.), against any background, and requires no special equipment.

Most modern deep learning approaches require expensive equipment or rely on rigid constraints such as solid colored backgrounds. This works in the wild!

What we learned

Babysitting the training process, PyTorch, deep learning concepts, and teamwork.

What's next for SignToSpeech

This project shows that, with good data, a generalized sign/gesture-to-text model can work. That opens up a whole new world of innovation for HCI (more gesture-based UIs), AR, and, of course, a potential novel solution to the sign language computer vision problem.
