Inspiration
Learning a new language can be boring when novices have no interaction. We want to make learning fun by sparking beginners' curiosity about objects in their surroundings.
What it does
LingoSnapz first lets users choose which language they want to learn. Next, we prompt the user to snap a picture of the object(s) they want to test their knowledge of, or to pull a random image from our dataset. Through an ML pipeline built on pre-trained deep learning models, we detect the object's details in the image and generate a question asking the user to describe it. We then take the user's answer, either typed or as an image of handwriting, in any language. Finally, we run a similarity check between the user's answer and the correct answer to award points.
How we built it
We developed LingoSnapz as a web application using React, Figma, and Flask. Our web app lets users interact with the system and perform the following tasks: 1) choose the language they want to learn, 2) pick a random image or upload their own, 3) snap a picture of an object, and 4) enter their answer. In the backend, we run a pre-trained image-captioning model (BLIP, loaded via Hugging Face's BlipProcessor), which turns the image into tensors and identifies and describes the object in it. Next, we pass this description to another transformer to generate questions. Finally, the user's guess is compared to the correct answer and given a score. Implementing this pipeline involves several Python libraries and APIs; the environment setup is documented in the README.
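The captioning step can be sketched roughly as follows. This is a minimal illustration, not our exact backend code: the checkpoint name and generation settings here are assumptions (any BLIP captioning checkpoint is loaded the same way), and the Flask wiring is omitted.

```python
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

# Assumed checkpoint; the project may use a different BLIP variant.
MODEL_ID = "Salesforce/blip-image-captioning-base"

def caption_image(image_path: str) -> str:
    """Return a short description of the main object in the image."""
    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
    image = Image.open(image_path).convert("RGB")
    # The processor converts the image into pixel tensors for the model.
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

The resulting caption is what gets handed to the question-generation transformer in the next stage of the pipeline.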
Challenges we ran into
The similarity check between the user's answer and the generative AI's answer is sometimes inaccurate because of synonyms and paraphrasing. It was also hard to get accurate results from the deep learning models: multiple objects in an image's background confuse the model and degrade the quality of the generated questions. Finally, setting up database storage for images (Google Firebase) in a real-time application was challenging, especially the IAM permissions.
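The synonym problem is easy to reproduce with plain string matching. A minimal sketch using only Python's standard library (the word pairs are illustrative, not from our dataset):

```python
from difflib import SequenceMatcher

def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; blind to meaning."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# A near-exact answer scores high...
print(surface_similarity("a red apple", "red apple"))
# ...but a correct synonym scores low, which is the failure mode we hit.
print(surface_similarity("couch", "sofa"))
```

Comparing sentence embeddings instead of characters would be more robust to synonyms; the sketch above only shows why surface matching falls short.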
Accomplishments that we're proud of
We are proud of successfully incorporating multiple models and user interactions to produce valid outputs. We tested our application with several images and obtained accurate results, especially for single objects.
What we learned
Building LingoSnapz taught us how to use and incorporate APIs, choose the right datasets, and integrate deep learning models into the backend and React-based frontend for text and image inputs. Additionally, we learned about the challenges of image captioning with deep learning models.
What's next for LingoSnapz
We plan to turn LingoSnapz into a startup and increase user retention and time spent on the app. We aim to create a social media platform where users can upload images of objects in their surroundings and have them described by the worldwide community/friends in different languages.
Built With
- css
- deeplearning
- figma
- flask
- github
- gpt
- html
- huggingface
- imagecaptioning
- javascript
- natural-language-processing
- python
- pytorch
- react
- tesseract
- transformers