Why SMART-Google-Forms (Inspiration/Problem Statement)
- Various grading technologies exist for simple forms with multiple-choice (objective answer type) questions, but no widely known technology can grade long, subjective-answer questions.
- Teachers often spend many hours grading student response sheets; automating this provides assistive technology for better education.
What it does (Proposed Solution)
- With SMART-Google-Forms, teachers don't have to check each answer. Instead, they provide a generalized sample answer for each question, and our Natural Language Processing (NLP) based predictions score the student responses against those sample answers.
- It saves time and is an easy-to-use, user-friendly platform with a UI similar to the popular Google Forms.
How we built it (Technological Stack Used)
- Frontend: HTML, CSS, Vanilla JavaScript
- Backend: Python, Flask, SQLite3, Natural Language Toolkit (NLTK), Gensim, Rapid Automatic Keyword Extraction (RAKE)
Challenges we ran into (Problems Faced)
- Creating a robust UI that closely mirrors Google Forms was a surprisingly difficult task.
- Retraining the score prediction model was a primary scalability concern.
What we learned (Learning Outcomes)
- Advanced Natural Language Processing
- Web Application Scalability
- Responsive System Design
ML workflow:
The whole ML workflow is divided into two parts: ML/DL prediction and similarity-score estimation. The reason for using ML here is to capture the proper construction and grammatical flow of sentences, while the TF-IDF based similarity score captures the contextual information of the text. As mentioned above, the teacher provides a sample answer, and the students' answers are evaluated with respect to it. Let's discuss both parts separately.
ML/DL prediction:
We had 8.5k+ data points to train on. Because the dataset was fairly small, the LSTM model performed poorly, with val_loss averaging around 56%, so we moved from DL to classical ML. We tried a random forest, then a random forest tuned with RandomizedSearchCV, which reached about 60% accuracy. Since accuracy alone is not a good metric for this classification task, we also measured the kappa score, which came out at ~0.92, a really good value.
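The classical-ML step described above can be sketched with scikit-learn: a RandomForestClassifier tuned via RandomizedSearchCV and evaluated with both accuracy and Cohen's kappa. The synthetic dataset and the parameter grid below are placeholders, not the project's actual data or search space:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Placeholder data standing in for the ~8.5k graded-answer feature vectors.
X, y = make_classification(n_samples=500, n_features=20, n_classes=3,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Randomized hyperparameter search over a small illustrative grid.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [100, 200, 400],
        "max_depth": [None, 10, 20],
        "min_samples_split": [2, 5, 10],
    },
    n_iter=5, cv=3, random_state=0,
)
search.fit(X_train, y_train)

pred = search.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
# Quadratic-weighted kappa is a common choice for ordinal grades.
print("kappa:", cohen_kappa_score(y_test, pred, weights="quadratic"))
```

Kappa is preferred here because it corrects for chance agreement, which plain accuracy does not.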
tf-idf based similarity score:
TF-IDF approach
- Build a text corpus containing all words of the documents, applying tokenization and stop-word removal (the NLTK library provides both).
- Convert the documents into TF-IDF vectors.
- Compute the cosine similarity between them (or against any new document) as the similarity measure.
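The three steps above can be sketched with scikit-learn's TfidfVectorizer, which handles tokenization and stop-word removal internally (the project itself uses NLTK for those steps; the sample answers below are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical teacher sample answer and two student answers.
teacher = "Photosynthesis converts light energy into chemical energy in plants."
students = [
    "Plants use photosynthesis to turn light energy into chemical energy.",
    "The French Revolution began in 1789.",
]

# Steps 1-2: tokenize, drop English stop words, build TF-IDF vectors.
vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform([teacher] + students)

# Step 3: cosine similarity of each student answer against the sample answer.
scores = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()
print(scores)  # the on-topic answer scores far higher than the off-topic one
```

Because TF-IDF down-weights terms shared across all documents, the score is driven by the content words the student answer shares with the teacher's sample.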
ML file structure:
All the files live in the ML area folder.
- LSTM_56_percent.ipynb: LSTM model trained with an embedding layer.
- autograding-using-lstm-tf-keras_high_kappa.ipynb: LSTM model trained with 5-fold CV and plain word vectors; this is the file that shows the high kappa score.
- model building.ipynb: training of the random forest and the random forest with RandomizedSearchCV.
- similarity score estimation.ipynb: computes the similarity score.
- get_final_score.py: implements a simple average of all the model scores and the similarity score.
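The averaging step in get_final_score.py amounts to something like the following sketch (the function signature and example scores are illustrative, not the actual code):

```python
def get_final_score(model_scores, similarity_score):
    """Simple average of all model scores plus the TF-IDF similarity score.

    model_scores: per-model predicted grades, normalized to [0, 1].
    similarity_score: cosine similarity of the student answer to the sample.
    """
    all_scores = list(model_scores) + [similarity_score]
    return sum(all_scores) / len(all_scores)

# e.g. random forest predicts 0.8, LSTM predicts 0.7, similarity is 0.9:
print(get_final_score([0.8, 0.7], 0.9))  # averages to 0.8
```

Averaging the ML predictions with the similarity score lets the grammar-aware models and the context-aware TF-IDF measure each contribute to the final grade.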
What's next for SMART Google Forms (Future Prospects)
- Google Forms API Integration for Open Access
- OAuth Login (Integrated with Google, GitHub etc)
- More Complex Neural Network Predictive Analysis
- Advanced Analytics for Project Results
Built With
- css
- database
- flask
- gensim
- html
- natural-language-processing
- python
- sqlite3
- tensorflow
- vanilla-javascript