Inspiration
Throughout this challenge, we worked towards detecting fraud cases from a database of transactions and client data. Finding a good precision-recall tradeoff is crucial, as the bank faces significant costs both when declaring an honest client fraudulent (false positives) and when overlooking fraudulent clients (false negatives). This first version is applied to bank fraud, but it can be generalized to other domains (insurance, etc.).
What it does
Detects fraud cases from a database of transactions and client data. This first version is applied to bank fraud, but it can be generalized to other domains (insurance, etc.).
How we built it
We used a Random Forest classifier and tuned its hyperparameters by performing a grid search. A local held-out test set was used to validate our performance.
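The pipeline above can be sketched roughly as follows. This is a minimal illustration with synthetic data and an assumed hyperparameter grid, not our exact setup (the real features, grid values, and scoring choice are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the transactions/clients data (imbalanced classes)
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Grid search over a few Random Forest hyperparameters (illustrative grid)
param_grid = {"n_estimators": [100, 200], "max_depth": [5, 10, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="average_precision",  # suited to imbalanced classification
)
search.fit(X_train, y_train)

# Local held-out test set used to validate performance
print(search.best_params_)
print(search.score(X_test, y_test))
```

Scoring by average precision (area under the precision-recall curve) is one reasonable choice here, since plain accuracy is nearly meaningless on heavily imbalanced data.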
Challenges we ran into
Fraud detection is one of the most challenging applications of anomaly detection, due to the severe imbalance between positive and negative cases (~9000 positive examples out of 1 million). Fortunately, Random Forests help a lot in avoiding overfitting, but as stated before, a wise choice of the precision-recall tradeoff (which is a proxy for the number of clients declared fraudulent) remains the main difficulty.
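Choosing that tradeoff amounts to picking a decision threshold on the classifier's predicted probabilities. A rough sketch, on synthetic imbalanced data (the threshold-selection rule shown, maximizing F1, is just one possible operating point, not necessarily the one we used):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Heavily imbalanced synthetic data (~1% positives) mimicking the fraud setting
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.99, 0.01], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# class_weight="balanced" counteracts the imbalance during training
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)

# Each threshold on the predicted probability is a different
# precision-recall tradeoff; sweep them all at once
proba = clf.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, proba)

# Example rule: pick the threshold maximizing F1
f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
best = np.argmax(f1)
print(precision[best], recall[best])
```

Raising the threshold flags fewer clients (higher precision, lower recall); lowering it flags more, which is why the threshold acts as a proxy for the number of clients declared fraudulent.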
Built With
- machine-learning
- matplotlib
- pandas
- python
- scikit-learn
- seaborn