Inspiration
Amidst COVID-19, there has been growing recognition that unequal access to essential services has affected many, and it prompted us to think about how the less fortunate are disproportionately affected, especially in times of crisis. We want to use the tremendous potential of AI to offer them equal access to a key resource: healthcare. Through our research, we found that communication is indeed a major problem in healthcare. On one hand, earnest doctors struggle to understand a patient's circumstances; on the other, patients struggle to communicate in the way they are most comfortable with: sign language. We also recognized the tremendous cost that sign language users bear in order to bridge this communication barrier: a whopping $80 flat fee per hour for interpretation. That is not accessible to everyone, given that a doctor's consultation costs only about $30 in total on average. We believe we can do better.
What it does
Our model runs inference on the hand signs performed by the user, interprets each sign, displays the corresponding word on the live video feed, and plays the audio of that word. This serves two important purposes: it confirms to the user that the correct message was interpreted, and it conveys the user's message to the doctor as text and speech.
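The step from model output to displayed word can be sketched as a small post-processing function. This is a minimal illustration, not our exact code: the vocabulary list and the confidence threshold below are hypothetical placeholders.

```python
import numpy as np

# Hypothetical vocabulary; the real deployment uses the words needed
# for a doctor-patient conversation.
VOCAB = ["hello", "pain", "medicine", "thanks"]

def interpret_prediction(probs, threshold=0.7):
    """Map the LSTM's softmax output to a word, or None if unsure.

    probs: 1-D array of per-class probabilities.
    threshold: minimum confidence before we overlay text and play audio,
               so ambiguous signs are not misreported to the doctor.
    """
    probs = np.asarray(probs)
    idx = int(np.argmax(probs))
    if probs[idx] < threshold:
        return None  # not confident enough; keep collecting frames
    return VOCAB[idx]

# A confident prediction maps to its word; a flat distribution maps to None.
print(interpret_prediction([0.05, 0.90, 0.03, 0.02]))  # pain
print(interpret_prediction([0.30, 0.30, 0.20, 0.20]))  # None
```

Gating the overlay on a confidence threshold is what keeps a wrong word from ever being spoken aloud to the doctor.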
How we built it
We use MediaPipe to perform the first level of hand-sign interpretation. It identifies the different landmarks and key points, converting the observed hand movements into machine-readable NumPy arrays. We then train a three-layer LSTM network on these arrays, keeping computation light enough for a low-power edge device such as our Raspberry Pi. On the cloud side, we deploy to the edge device using AWS IoT Greengrass. Lambda functions implement the serverless pipeline that deploys the model after it is trained, working in conjunction with Amazon EventBridge, S3 buckets, and AWS CodePipeline to trigger deployments.
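The landmarks-to-NumPy step can be sketched as follows. MediaPipe's hand model outputs 21 (x, y, z) keypoints per frame; the sequence length of 30 frames below is an assumed value for illustration, not necessarily what we trained with.

```python
import numpy as np

NUM_LANDMARKS = 21   # MediaPipe's hand model emits 21 (x, y, z) keypoints
SEQ_LEN = 30         # frames per sign; an assumption for this sketch

def landmarks_to_vector(hand_landmarks):
    """Flatten one frame's hand landmarks into a 63-dim feature vector.

    hand_landmarks: list of (x, y, z) tuples from MediaPipe, or None
    when no hand was detected in the frame (we zero-pad that frame).
    """
    if hand_landmarks is None:
        return np.zeros(NUM_LANDMARKS * 3, dtype=np.float32)
    return np.array(hand_landmarks, dtype=np.float32).flatten()

def frames_to_sequence(frames):
    """Stack per-frame vectors into the fixed (SEQ_LEN, 63) array the LSTM consumes."""
    vecs = [landmarks_to_vector(f) for f in frames]
    # Pad short clips with zero frames so every sequence has a fixed length.
    while len(vecs) < SEQ_LEN:
        vecs.append(np.zeros(NUM_LANDMARKS * 3, dtype=np.float32))
    return np.stack(vecs[:SEQ_LEN])

# One detected frame plus padding still yields a fixed-shape LSTM input.
seq = frames_to_sequence([[(0.1, 0.2, 0.0)] * NUM_LANDMARKS])
print(seq.shape)  # (30, 63)
```

Working on these small keypoint sequences, rather than raw video frames, is what makes a compact LSTM feasible on the Raspberry Pi.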
We have also proposed an MLOps pipeline that allows us to perform continuous integration and continuous deployment. This is important for completing the loop of model retraining and monitoring, which is an integral part of any live machine learning deployment. The pipeline builds on our existing AWS IoT Greengrass deployment setup.
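The glue between a retrained model and an edge deployment can be sketched as a pure function that a Lambda handler might use; the bucket, key, and component names below are hypothetical, and the actual AWS API calls (via boto3) are omitted here.

```python
import json

def build_deployment_event(bucket, key, component_version):
    """Assemble the message a (hypothetical) Lambda would emit after a new
    model artifact lands in S3, telling the Greengrass deployment step
    which component version to roll out to the Raspberry Pi.
    """
    return {
        "source": "model-retraining",            # matched by an EventBridge rule
        "detail": {
            "artifact": f"s3://{bucket}/{key}",  # where the pipeline stored the model
            "componentVersion": component_version,
        },
    }

# Example payload for a freshly trained model artifact (names are illustrative).
event = build_deployment_event("signlang-models", "lstm/model.tar.gz", "1.0.3")
print(json.dumps(event, indent=2))
```

Keeping the payload construction separate from the AWS SDK calls makes this step easy to unit-test without touching cloud resources.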
Challenges we ran into
Given the short span of the competition, it would have been difficult to implement a model from the ground up on our own, so we relied on the MediaPipe API to perform the bulk of the hand-sign interpretation. The time constraint also limited our ability to produce our own training data, so we reduced our dataset, and hence our vocabulary, to only the words used in our demo conversation. Inference speed on the Raspberry Pi was another issue: because it has little compute power, we opted for a small LSTM model to perform inference.
Accomplishments
We are proud to have trained a model that performs our sign language inference accurately. Every stage of the pipeline, including deployment to the edge device (Raspberry Pi), has been tested and is working.
What we learned
Through this process we learned how to use MediaPipe, and how LSTMs over keypoint sequences enable quick inference compared with running regular CNNs on raw images. Most importantly, this competition gave us the opportunity to learn more about the social problems that unfortunately face different members of society today, problems that can very well be addressed with the power of AI.
What's next for BeyondML 2.0
We see big potential in our idea. Such technology has far greater uses in increasing accessibility for people who rely on sign language to communicate in their day-to-day lives, giving them equal opportunities where they may currently be excluded.
We recognize that sign language is more than just hand signs; it incorporates facial and other body gestures to express not just content but also emotion and emphasis in speech. This is significant, as it dramatically improves users' involvement in society, broadening the types of roles they can perform on top of improving access to services. With our idea, we wish to empower greater involvement in the community through inclusion, because we truly believe that we limit ourselves when such an intelligent group of individuals may not be receiving an equal platform for sharing their invaluable thoughts and opinions with the rest of the world. We shall close this section with the quote by Rollo May that inspired us: “Communication leads to community, that is, to understanding, intimacy and mutual valuing.”
Built With
- amazon-web-services
- aws-iot
- mediapipe
- python
- raspberry-pi
- tensorflow
