Inspiration
We wanted to feed a data set into a predictive model to estimate how students will perform in the coming semester. The idea came from a data mining course our group took this semester; we wanted to apply what we learned to a real-world problem.
What it does
Our software loads the data into data frames and builds two separate predictive tools. The first mines the data for interesting rules and patterns; association rules like these help identify likely outcomes for new students entering the system. The second is clustering: we used two clusters, one for students predicted to pass and one for students predicted to fail.
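To make the association-rule idea concrete, here is a minimal sketch in plain Python. The records and the attribute names (`absences`, `studytime`, `result`) are hypothetical and are not taken from the project's data set; the sketch only shows how the support and confidence of a rule are computed.

```python
# Hypothetical student records; each record is a set of "attribute=value" items.
records = [
    {"absences=low", "studytime=high", "result=pass"},
    {"absences=low", "studytime=low", "result=pass"},
    {"absences=high", "studytime=low", "result=fail"},
    {"absences=high", "studytime=high", "result=pass"},
    {"absences=high", "studytime=low", "result=fail"},
    {"absences=low", "studytime=high", "result=pass"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Support of (antecedent union consequent) divided by support of antecedent."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule says nothing
    return support(set(antecedent) | set(consequent), records) / base


# Rule: students with few absences tend to pass.
print(support({"absences=low"}, records))
print(confidence({"absences=low"}, {"result=pass"}, records))
```

A rule with high confidence and reasonable support is the kind of pattern that could be used to flag the likely outcome for a new student with the same attributes.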
How I built it
We used Python, pandas, and NumPy to build data frames. From these data frames we built clusters and a confusion matrix, as well as frequent itemsets and interesting rules filtered by a defined confidence threshold.
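The clustering and confusion-matrix steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: k-means is assumed as the clustering algorithm (the write-up does not name one), the two features (previous grade, absences) and all values are made up, and plain Python is used so the example runs without any installed packages.

```python
def kmeans_two_clusters(points, iterations=10):
    """Tiny 2-means on tuples of numbers; returns a cluster index (0 or 1) per point."""
    # Deterministic start (an arbitrary choice): seed with the first and last point.
    centroids = [points[0], points[-1]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min((0, 1), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def confusion_matrix(actual, predicted):
    """2x2 counts: rows are actual labels, columns are predicted (0 = fail, 1 = pass)."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Hypothetical students: (previous grade out of 20, number of absences).
points = [(4, 12), (6, 10), (5, 14), (9, 9), (15, 2), (14, 3), (17, 1), (8, 4)]
actual = [0, 0, 0, 1, 1, 1, 1, 0]  # hypothetical outcomes: 0 = fail, 1 = pass

labels = kmeans_two_clusters(points)


def mean_grade(cluster):
    grades = [p[0] for p, label in zip(points, labels) if label == cluster]
    return sum(grades) / len(grades)


# Cluster ids are arbitrary, so treat the cluster with the higher mean grade as "pass".
pass_cluster = max((0, 1), key=mean_grade)
predicted = [1 if label == pass_cluster else 0 for label in labels]

print(confusion_matrix(actual, predicted))
```

The off-diagonal cells of the matrix count the students the clusters got wrong, which gives a quick read on how well two clusters separate passing from failing students.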
Challenges I ran into
Pruning the data set to reduce noise was fairly difficult. The hardest part of data mining is determining which data hurts your predictions more than it helps. Getting a good grasp of the entire data set was crucial to understanding the data and how to set up our models.
Accomplishments that I'm proud of
We were able to create multiple predictive models. Additionally, we produced a handful of histograms and other visuals to create a truly exceptional data analysis presentation.
What I learned
We learned that real-world data is much harder to understand and use than classroom examples. The data is often muddled and hard to read until the noise has been reduced.
What's next for FailSafe
Ideally, we would like to train the model on more data. Another useful feature would take a student's current class grades and the points they have already earned, then estimate how difficult it would be for them to pass from their current standing.
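One possible shape for that feature, sketched under assumptions: the function name `required_rate`, its parameters, and the 60% passing mark are all hypothetical, and nothing like this exists in the project yet. The idea is to report what fraction of the remaining points a student still needs in order to pass.

```python
def required_rate(points_earned, points_graded, points_total, pass_fraction=0.6):
    """Fraction of the remaining points a student must earn to reach the passing mark.

    Returns 0.0 if the student has already secured a pass. A result above 1.0
    means passing is no longer possible with the points that are left.
    The 60% default passing mark is an assumption for illustration.
    """
    if not 0 <= points_earned <= points_graded <= points_total:
        raise ValueError("expected 0 <= earned <= graded <= total")
    remaining = points_total - points_graded
    needed = pass_fraction * points_total - points_earned
    if needed <= 0:
        return 0.0
    if remaining == 0:
        return float("inf")  # nothing left to earn and still short of passing
    return needed / remaining


# A student with 250 of the 500 points graded so far, in a 1000-point class.
print(required_rate(250, 500, 1000))
```

A value near 0 means the student is comfortably on track, while a value close to or above 1 signals that passing has become very hard or impossible.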