ClusterShield

Inspiration

One of our team members recently noticed a suspicious charge on their bank account from a scam company based in Singapore. Luckily, the bank flagged it as suspicious and the transaction never went through, but it got us thinking: how much fraud slips through the cracks? That question became our mission: to improve fraud detection through a more comprehensive algorithm.

What it does

ClusterShield is a system designed to detect banking fraud using k-nearest neighbors (K-NN) supervised machine learning, combined through a weighted sum with several other models (logistic regression, decision tree, and random forest). The combined model analyzes patterns and anomalies in financial transactions to flag potentially fraudulent behavior and help protect banking operations.
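As a rough illustration of the weighted-sum idea, the sketch below combines per-model fraud probabilities into one score. It assumes each model already outputs a probability between 0 and 1. The weights, the 0.5 threshold, and the names `WEIGHTS`, `ensemble_fraud_score`, and `is_fraud` are hypothetical, since the write-up does not state which weights ClusterShield actually uses.

```python
from typing import Dict

# Hypothetical weights; ClusterShield's actual weights are not documented here.
WEIGHTS: Dict[str, float] = {
    "knn": 0.4,
    "logistic_regression": 0.2,
    "decision_tree": 0.1,
    "random_forest": 0.3,
}


def ensemble_fraud_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-model fraud probabilities, normalized by the weights used."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights of the supplied models must sum to a positive number")
    return sum(weights[name] * p for name, p in scores.items()) / total_weight


def is_fraud(scores: Dict[str, float], threshold: float = 0.5) -> bool:
    """Flag a transaction when the combined score reaches the (hypothetical) threshold."""
    return ensemble_fraud_score(scores) >= threshold


if __name__ == "__main__":
    example = {"knn": 0.9, "logistic_regression": 0.6, "decision_tree": 1.0, "random_forest": 0.7}
    print(round(ensemble_fraud_score(example), 2))  # prints 0.79
    print(is_fraud(example))  # prints True
```

Dividing by the total weight of the models that actually reported a score keeps the result between 0 and 1, so the same threshold still applies if one model is temporarily left out.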

How we built it

Building ClusterShield combined data science, machine learning, and software engineering. We gathered and curated a diverse dataset of banking transactions, used it to train our k-nearest neighbors model in Python, and built an interactive website with HTML/CSS.
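For readers new to k-NN, here is a minimal from-scratch sketch of the idea, not ClusterShield's actual code: a transaction is scored by the share of fraudulent examples among its k nearest labeled neighbors. The three features (scaled amount, scaled hour of day, foreign-merchant flag), the sample data, and the function names are all hypothetical. In practice, features should be scaled so that no single one dominates the Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, label) pairs; label 1 = fraud, 0 = legitimate.
Example = Tuple[Sequence[float], int]

# Hypothetical, pre-scaled features: (amount, hour of day, foreign-merchant flag).
SAMPLE_TRAIN: List[Example] = [
    ((0.10, 0.50, 0.0), 0),
    ((0.20, 0.40, 0.0), 0),
    ((0.15, 0.60, 0.0), 0),
    ((0.90, 0.10, 1.0), 1),
    ((0.80, 0.05, 1.0), 1),
    ((0.95, 0.20, 1.0), 1),
]


def knn_fraud_probability(train: List[Example], query: Sequence[float], k: int = 3) -> float:
    """Return the fraction of the k nearest training examples labeled as fraud."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    # Rank every labeled transaction by Euclidean distance to the query.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return sum(label for _, label in nearest) / k


def knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> int:
    """Majority vote: 1 (fraud) if more than half of the k neighbors are fraud."""
    return 1 if knn_fraud_probability(train, query, k) > 0.5 else 0


if __name__ == "__main__":
    print(knn_predict(SAMPLE_TRAIN, (0.85, 0.10, 1.0)))  # prints 1
    print(knn_predict(SAMPLE_TRAIN, (0.12, 0.50, 0.0)))  # prints 0
```

Using an odd k avoids tied votes in a two-class problem, and the returned fraction can serve as the K-NN probability in a weighted combination with other models.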

Challenges we ran into

Our main challenges were analyzing the data and understanding the different machine learning methods used in our models. Web development (visit our website: link) went more smoothly, but we still learned a lot from the process.

What we learned

Our team is made up of four freshmen at UM with varying skill levels. None of us had any ML experience before this project, three of the four had never been to a hackathon before MHacks, and one member had never built a large project before. Even so, all of us grew through developing ClusterShield, whether in ML, web development, or pitching a business proposal.

What's next for ClusterShield

We see several next steps: integrating the model with our website, deploying it to the cloud, training it on more datasets, and adding more models to our algorithm to make it more robust.