Inspiration
While applying for internships, it gets hard to keep track of which emails you have sent and which important job-related emails are sitting in your inbox, or maybe even in your spam or trash! Professors, too, may have a hard time keeping track of which students they haven't responded to yet. Our project grew out of the need to work around this.
What it does
It sorts emails into related categories and makes it easier for you to keep track of important emails.
How we built it
We use the Gmail API to extract inbox content. We then preprocess the text by lowercasing, removing stop words, lemmatizing, and so on. Next, we vectorize it using TfidfVectorizer and gensim's pre-built text models. We then run agglomerative clustering to obtain clusters and apply vectorization again to assign labels to those clusters. The sorting is complete! We also used Flask to build a development server for local testing, and then deployed the server.
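The vectorize-then-cluster step can be sketched roughly as follows. This is a minimal illustration, not our actual code: it assumes scikit-learn is installed, uses four made-up email subjects, hard-codes two clusters, and leaves out lemmatization and the gensim models entirely.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample data standing in for text pulled from the inbox.
emails = [
    "Internship application received for a software engineer role",
    "Your internship application status: interview scheduled",
    "Huge sale this weekend with a discount on shoes",
    "Weekend sale: extra discount on all shoes and bags",
]

# Lowercase the text and drop English stop words while building TF-IDF vectors.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(emails)

# Agglomerative clustering needs a dense array; n_clusters=2 is an assumption
# made for this toy data set, not a value taken from the project.
clusterer = AgglomerativeClustering(n_clusters=2)
cluster_ids = clusterer.fit_predict(X.toarray())

for text, cid in zip(emails, cluster_ids):
    print(cid, text)
```

On this toy input the two internship emails should land in one cluster and the two sale emails in the other, because emails from different groups share no terms once stop words are removed.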
Challenges we ran into
Navigating the Gmail API and figuring out its object structures was confusing. Exploring preprocessing and vectorization methods took time. Agglomerative clustering was relatively simple, but we needed additional brainstorming to optimize the model and test it thoroughly. We also had to figure out a way to assign labels to the results of agglomerative clustering.
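One possible way to label clusters is to treat each cluster as a single document and pick its highest-scoring TF-IDF terms. The sketch below is a plain-Python illustration of that idea under our own assumptions; `label_clusters`, the sample tokens, and the smoothed IDF formula are all hypothetical choices, not necessarily what the project does.

```python
import math
from collections import Counter

def label_clusters(docs, cluster_ids, top_k=2):
    """Return {cluster_id: [top terms]} using a TF-IDF score computed per cluster."""
    # Merge the tokens of every email in a cluster into one bag of words.
    bags = {}
    for tokens, cid in zip(docs, cluster_ids):
        bags.setdefault(cid, Counter()).update(tokens)

    # Document frequency: in how many clusters does each term appear?
    df = Counter()
    for bag in bags.values():
        df.update(bag.keys())

    n = len(bags)
    labels = {}
    for cid, bag in bags.items():
        total = sum(bag.values())
        scores = {
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in bag.items()
        }
        # Sort by score (descending), then alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels[cid] = ranked[:top_k]
    return labels

# Hypothetical, already-preprocessed emails and their cluster assignments.
docs = [
    ["internship", "application", "received"],
    ["internship", "interview", "scheduled"],
    ["sale", "discount", "shoes"],
    ["sale", "weekend", "discount"],
]
cluster_ids = [0, 0, 1, 1]
cluster_labels = label_clusters(docs, cluster_ids)
print(cluster_labels)
```

Terms that appear in many clusters get a lower weight, so a label tends to be a word that is both frequent inside its cluster and rare outside it.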
Accomplishments that we're proud of
Actually being able to put it all together and come up with a more or less complete product. We were able to steadily tackle all the challenges we faced.
What we learned
How to deploy a server. Natural language processing and vectorization.
What's next for Sort The Mess
Possibly make changes to the Gmail interface itself and enable user-entered categories. Allow users to control more parameters, such as how many emails to read and how far back in their history to look.
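As a rough idea of how those user controls could map onto a Gmail API request, here is a small sketch. The helper `build_list_params` and its argument names are hypothetical; `maxResults`, `q`, and `includeSpamTrash` are parameters of the Gmail API's messages list method, and `newer_than:` is a Gmail search operator.

```python
def build_list_params(max_emails=50, days_back=30, include_spam_trash=False):
    """Build keyword arguments for a Gmail messages list request."""
    return {
        "userId": "me",                      # "me" refers to the authenticated user
        "maxResults": min(max_emails, 500),  # assumed per-request cap of 500
        "q": f"newer_than:{days_back}d",     # only look this many days back
        "includeSpamTrash": include_spam_trash,
    }

params = build_list_params(max_emails=20, days_back=7, include_spam_trash=True)
print(params)

# With an authorized client from google-api-python-client, the call would
# look something like this (not executed here):
# service.users().messages().list(**params).execute()
```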