Inspiration

Throughout this challenge, we worked on detecting fraud cases in a database of transactions and client data. Finding a good precision-recall tradeoff is crucial, as the bank incurs significant costs both when it flags an honest client as fraudulent (false positives) and when it overlooks fraudulent clients (false negatives).

What it does

Detects fraud cases in a database of transactions and client data. This first version targets bank fraud, but the approach can be generalized to other domains (e.g. insurance).

How we built it

We used a Random Forest classifier and tuned its hyperparameters with a grid search. A held-out local test set was used to evaluate performance.
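A minimal sketch of this kind of setup, assuming scikit-learn. The synthetic data from `make_classification`, the parameter grid, `class_weight="balanced"` and the `average_precision` scoring are illustrative assumptions, since the write-up does not specify the features, the grid, or the selection metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the transaction/client table (assumption: the real
# features are not described). weights=[0.99] gives ~1% positives, close to
# the ~0.9% fraud rate reported for the real data; flip_y=0 disables label noise.
X, y = make_classification(
    n_samples=10_000, n_features=20, n_informative=8,
    weights=[0.99], flip_y=0, random_state=0,
)

# Held-out local test set, stratified so both splits keep the fraud ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# Hypothetical grid: the values actually searched are not given.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
}

search = GridSearchCV(
    # class_weight="balanced" reweights the rare fraud class (an assumption).
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="average_precision",  # area under the PR curve, suited to imbalance
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    n_jobs=-1,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the untouched test set.
test_scores = search.best_estimator_.predict_proba(X_test)[:, 1]
test_ap = average_precision_score(y_test, test_scores)
print("best params:", search.best_params_)
print(f"test average precision: {test_ap:.3f}")
```

Average precision is used here instead of accuracy because, with under 1% positives, a model that never flags anyone would already score about 99% accuracy.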

Challenges we ran into

Fraud detection is one of the most challenging application areas of anomaly detection because of the severe imbalance between positive and negative cases (~9,000 positive examples out of 1 million, i.e. about 0.9%). Fortunately, Random Forests help considerably in avoiding overfitting, but, as stated above, choosing a sensible precision-recall tradeoff (which acts as a proxy for the number of clients to be flagged as fraudulent) remains the main difficulty.
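One way to make this tradeoff explicit is to fix a minimum acceptable precision and pick the decision threshold that maximizes recall under that constraint. The sketch below assumes scikit-learn and a set of fraud scores on validation data; `pick_threshold`, the toy labels/scores and the 0.7 precision floor are hypothetical illustrations, not the project's actual rule. It also returns the number of flagged clients, the quantity the tradeoff ultimately controls.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def pick_threshold(y_true, scores, min_precision):
    """Hypothetical helper: return (threshold, precision, recall, n_flagged)
    for the lowest threshold whose precision is >= min_precision, i.e. the
    highest recall reachable at that precision, or None if none qualifies."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall carry one extra final point (precision=1, recall=0)
    # with no threshold; drop it so the arrays align with `thresholds`.
    ok = precision[:-1] >= min_precision
    if not ok.any():
        return None
    # Thresholds are increasing and recall never increases along them,
    # so the first qualifying index gives the highest recall.
    i = np.flatnonzero(ok)[0]
    n_flagged = int(np.sum(scores >= thresholds[i]))
    return float(thresholds[i]), float(precision[i]), float(recall[i]), n_flagged


# Toy validation labels (1 = fraud) and model scores, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])

print(pick_threshold(y_true, scores, min_precision=0.7))  # (0.6, 0.75, 0.75, 4)
```

Raising `min_precision` lowers the number of honest clients wrongly flagged at the price of missing more fraud; in this toy example a floor of 0.99 keeps only the single top-scored client (recall 0.25).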
