- A Reddit forum about US news, over the last week of September
- A Reddit forum about college applications, over Nov 2018 - Dec 2018
- A Reddit forum about privacy, over Nov 2018 - Dec 2018
- A Reddit forum about the United Kingdom, over the last 2 months of 2018
- A Reddit forum about computer science career advice, over the last 2 months of 2018
- A Reddit forum about world news, over the last 2 months of 2018
Inspiration
- We've always been curious about how exactly data mining can be used to derive meaningful results, so we decided to build an application that analyzes data gathered from Reddit.
What it does
- It extracts topics from the submissions of a specific subreddit within a specific period of time
- Then, it performs frequency distribution analysis on the extracted keywords
- Finally, it displays the frequency distribution of the top 50 keywords as a word cloud
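The counting step above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual nltk pipeline; the `top_keywords` helper, the tiny `STOPWORDS` set, and the sample titles are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "and", "is", "are", "with", "over"}

def top_keywords(titles, n=50):
    """Return the n most frequent non-stopword tokens across all titles."""
    counts = Counter()
    for title in titles:
        # Lowercase, then keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", title.lower())
        # Skip stopwords and very short tokens before counting.
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(n)

# Made-up submission titles, for illustration only.
sample_titles = [
    "Privacy concerns grow over new browser tracking",
    "Browser vendors respond to tracking concerns",
    "New privacy law targets tracking",
]
print(top_keywords(sample_titles, n=3))
```

Keeping the result as a list of `(keyword, count)` pairs makes it easy to pass straight on to whatever draws the word cloud.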
How I built it
- Frontend: HTML, JS, ZingChart
- Backend: Flask, PSAW, NLTK
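One way the backend could hand keyword counts to the frontend is as a JSON chart configuration. The sketch below assumes that ZingChart's word cloud reads its words from `options.words` as objects with `text` and `count` fields, which should be checked against the ZingChart documentation; the `to_wordcloud_config` helper is hypothetical and is not taken from the project.

```python
import json

def to_wordcloud_config(keyword_counts):
    """Turn (keyword, count) pairs into a word cloud chart config dict."""
    # Assumed shape: one {"text": ..., "count": ...} object per keyword.
    words = [{"text": word, "count": count} for word, count in keyword_counts]
    return {"type": "wordcloud", "options": {"words": words}}

# Made-up counts, for illustration only.
config = to_wordcloud_config([("tracking", 3), ("privacy", 2)])
print(json.dumps(config, indent=2))
```

A Flask route could return this dictionary as JSON, leaving the frontend to pass it to the chart renderer.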
Challenges I ran into
- We were new to data mining, so we spent a lot of time researching suitable resources to get started.
- The efficiency of the data extraction process depends heavily on the popularity of the subreddit being searched.
Accomplishments that I'm proud of
- Being able to identify the top keywords for any subreddit over a given period of time
What I learned
- Basic NLP techniques
- Extracting large sets of data
What's next for RedditSays
- Visualise how trends move over time
- Plot geospatial heatmaps showing roughly which parts of the world discuss certain topics
- Add more features to make the frontend interactive