CreditShield

A simulated real-time transaction, with the model predicting whether it may be fraudulent.
Example purchases ranging from a low to a high chance of fraud.
Some of the synthetic data we generated for training.
Classification report for our AI model.
Example of the blockchain and its transactions.

Inspiration

BNY's challenge was immediately interesting to us, as technology like this is becoming increasingly important. The more that we can integrate things like AI into services that help and protect people, the better off we'll all be.

What it does

The core of our project is the machine learning model itself. After training on a set of synthetic data that we created, it examines new transactions in an existing bank account and estimates how likely each one is to be fraudulent. Along with the model, we built a real-time data stream using Apache Kafka. Our solution for open ledgers uses blockchain and Ethereum for data integrity, decentralization, and scalability.
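
To illustrate why a chained ledger helps with data integrity, here is a minimal Python sketch of a hash-linked transaction log. It is not the project's actual Solidity or Ethereum code. The function names (`block_hash`, `append_block`, `is_valid`) and the transaction fields are hypothetical, chosen only for illustration.

```python
import hashlib
import json


def block_hash(index, prev_hash, transaction):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transaction": transaction},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transaction):
    """Append a transaction as a new block linked to the current chain tip."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transaction": transaction,
        "hash": block_hash(index, prev_hash, transaction),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = "0" * 64
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["transaction"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, {"company": "Acme Coffee", "amount": 4.50})
append_block(ledger, {"company": "Acme Coffee", "amount": 12.00})
print(is_valid(ledger))  # True

# Tampering with an already recorded transaction breaks verification
ledger[0]["transaction"]["amount"] = 450.00
print(is_valid(ledger))  # False
```

Because each block's hash covers the previous block's hash, changing any past transaction invalidates that block and every link after it. A real Ethereum deployment adds consensus and decentralization on top of this basic idea.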

How we built it

We broke the project into subprojects, and each person worked on one: the machine learning model, synthetic data generation, blockchain integrity, and the real-time data stream.

Challenges we ran into

Changing parts of the project and not having figured out everything before starting were the main issues that we ran into. We were communicating as a team and spent lots of time together, but for the first day a lot of what we did was individual work and not all of it came together seamlessly.

Accomplishments that we're proud of

We're proud of making sure the model was accurate and had a strong sense of pattern recognition, which we achieved by engineering features that weren't present in our original dataset. These engineered features include the days of the week on which a person typically makes transactions, their usual transaction times, a Z-score for the amount they spend during those typical times, and whether the company in the transaction actually exists. Creating these relational features and making sure the model didn't overfit to the training set was extremely rewarding and contributed to more accurate predictions!
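
The engineered features described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual feature code. The function name `engineer_features` and the record keys (`timestamp`, `amount`, `company`) are hypothetical, "typical" is assumed to mean the most frequent weekdays and hours, the Z-score is computed against the user's whole history for simplicity, and Python's `difflib` stands in for whatever fuzzy search the verified company database uses.

```python
from collections import Counter
from datetime import datetime
from difflib import get_close_matches
from statistics import mean, pstdev


def engineer_features(history, new_tx, known_companies):
    """Build relational features for new_tx from a user's past transactions.

    history must be non-empty. Records use hypothetical keys:
    "timestamp" (datetime), "amount" (float), "company" (str).
    """
    day_counts = Counter(tx["timestamp"].weekday() for tx in history)
    hour_counts = Counter(tx["timestamp"].hour for tx in history)
    # Assumption: "typical" = the 2 most common weekdays and 3 most common hours
    typical_days = {day for day, _ in day_counts.most_common(2)}
    typical_hours = {hour for hour, _ in hour_counts.most_common(3)}

    # Z-score of the new amount against the user's past spending
    amounts = [tx["amount"] for tx in history]
    mu, sigma = mean(amounts), pstdev(amounts)
    z_score = (new_tx["amount"] - mu) / sigma if sigma > 0 else 0.0

    # Fuzzy lookup so small misspellings still match a known company
    match = get_close_matches(
        new_tx["company"].lower(),
        [name.lower() for name in known_companies],
        n=1,
        cutoff=0.8,
    )
    return {
        "is_typical_day": new_tx["timestamp"].weekday() in typical_days,
        "is_typical_hour": new_tx["timestamp"].hour in typical_hours,
        "amount_z_score": z_score,
        "company_exists": bool(match),
    }


# A user who shops on Monday mornings, then a large Saturday 3 a.m. charge
history = [
    {"timestamp": datetime(2024, 1, 1, 9), "amount": 10.0, "company": "Acme Coffee"},
    {"timestamp": datetime(2024, 1, 8, 9), "amount": 12.0, "company": "Acme Coffee"},
    {"timestamp": datetime(2024, 1, 15, 9), "amount": 8.0, "company": "Acme Coffee"},
    {"timestamp": datetime(2024, 1, 22, 9), "amount": 10.0, "company": "Acme Coffee"},
]
suspicious = {"timestamp": datetime(2024, 1, 27, 3), "amount": 500.0, "company": "Zzqx Holdings"}
print(engineer_features(history, suspicious, ["Acme Coffee", "Globex Grocery"]))
```

Features like these give the model a per-user baseline, so a transaction is judged against that person's own habits rather than against a global threshold.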

Our other accomplishments include a good-looking user interface and a complex system for generating the synthetic data.
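
As a rough idea of what pattern-bearing synthetic data can look like, here is a much simplified Python sketch. It is not the project's generator. The function name `generate_transactions`, the field names, and all of the distributions and rates are assumptions made for illustration.

```python
import random


def generate_transactions(n_users=3, per_user=50, fraud_rate=0.1, seed=42):
    """Generate labeled synthetic transactions with per-user habits."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    rows = []
    for user_id in range(n_users):
        # Each user gets habitual weekdays, a usual hour, and a typical spend
        habit_days = rng.sample(range(7), 2)
        habit_hour = rng.randint(8, 20)
        typical_amount = rng.uniform(10, 80)
        for _ in range(per_user):
            is_fraud = rng.random() < fraud_rate
            if is_fraud:
                # Fraud breaks the habits: any day, late-night hour, inflated amount
                weekday = rng.randrange(7)
                hour = rng.randint(0, 5)
                amount = typical_amount * rng.uniform(5, 20)
            else:
                # Legitimate purchases stay close to the user's habits
                weekday = rng.choice(habit_days)
                hour = min(23, max(0, round(rng.gauss(habit_hour, 1.5))))
                amount = max(0.5, rng.gauss(typical_amount, typical_amount * 0.2))
            rows.append({
                "user_id": user_id,
                "weekday": weekday,
                "hour": hour,
                "amount": round(amount, 2),
                "is_fraud": is_fraud,
            })
    return rows


data = generate_transactions()
fraud_count = sum(row["is_fraud"] for row in data)
print(f"{len(data)} transactions, {fraud_count} labeled as fraud")
```

Giving each simulated user consistent habits, and making fraudulent rows deviate from them, is what gives a model a pattern to learn. Real fraud is far less clean than this, so a realistic generator needs much more variety and overlap between the two classes.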

What we learned

The main lessons we took away were that having a clear plan and knowing what the final product should look like are both incredibly important for fast-paced projects like this.

As for new programming skills, we gained more experience with machine learning and got our first exposure to synthetic data generation, the OpenAI API, Apache Kafka, and blockchain.

What's next for CreditShield

We're happy with how our project came out in the end and for now don't have more to add, but we'll certainly all be revisiting ideas that we learned more about while working on it.

Built With

ai
apache
blockchain
ethereum
javascript
kafka
mongodb
numpy
openai
pandas
python
react
sklearn
solidity
tailwind
zookeeper

Submitted to

Knight Hacks VII
- Winner BNY Mellon: AI-powered Financial Fraud Detection System

Created by

I worked on synthetic data generation, which we used to train our machine learning model. It was my first experience with the concept of synthetic data, and I enjoyed it quite a bit. Lots of work went into making sure that the data that my program was creating was realistic and had patterns for the model to pick up on. The event and project were tons of fun and I hope to work with machine learning and synthetic data generation again soon.

Garrison Scarboro
I wrote the Ethereum network transaction system and database, and compiled our verified company database with fuzzy searching! Our Ethereum network uses a public ledger to record all transactions for a company. A bank can then use an off-chain database to store transaction details, along with another collection that tracks each company's total transactions, risk score, and number of charges marked fraudulent. Super fun stuff and I’d love to learn more about it moving forward!!

Camilo Alvarez-Velez
I worked mainly on the machine learning model itself. The model focuses on features such as the amount spent and the typical days and times a user makes purchases. I also engineered additional features that weren't present in our dataset to capture the patterns in a person's transaction history more accurately. It was a ton of work, but extremely rewarding. It was a lot of fun and leaves me wanting to learn even more about machine learning!

Parker Blume
Darren Bansil