Inspiration

We think that discrimination in banking leads to the poor treatment of people of color (POC) and of residents of redlined neighborhoods that are regarded as poor, which makes it a significant issue in Baltimore today. Modern technology in banking could help minimize the effects of poor human judgement, especially in the ways our financial systems marginalize people of color. We believe such technology can help democratize the financial system and remove many of the inequalities currently present.

This issue shapes how we think the economy should work, and it has a major impact on our homes and personal lives. Bowen, our team's data analyst, has a Haitian father who was recently involved in a lawsuit with Wells Fargo over this kind of discrimination. An established businessman, he was denied access to the bank because of personal judgements about the way he looked and about the poorer community in which the bank was situated.

Bankers tend to rely on personal bias rather than statistics, and they do not have the same confidence in members of poorer, minority communities as they do in others. Redlining persists across the nation, and its effects are apparent to us in Baltimore. We are silenced by gerrymandering; we are split into hierarchies of socioeconomic status; we are being encroached upon by authorities using eminent domain, as in the Johns Hopkins campus expansions, to extend their reach further into the city in an effort to control it.

Generalizing about the members of these marginalized communities in this way is unfair. Whether someone deserves a loan, financial backing, or a withdrawal should be decided on personal merit, not on demographics. Our background as members of the diverse Baltimore Polytechnic High School community, which includes students from almost all of the zip codes within Baltimore City, inspires us to make our futures more equitable in the socioeconomic realm.

Project Summary

We developed a loan application predictor that takes a user's general financial and demographic data as input and estimates the likelihood of receiving a loan approval from a bank of their choice. Using the generated model, we were able to draw several conclusions about the biases of the loan application process and represent them visually on our website.

Users provide general information: their state, county, age, gender, loan amount, approximate income, approximate debt, and bank of choice. They receive a general prediction of their loan application's success and are welcome to adjust the inputs to see other results.

The Algorithm

We used scikit-learn (SKLearn) to analyze the publicly available Home Mortgage Disclosure Act (HMDA) datasets and create a model that categorizes loan applications as successful or unsuccessful. After validating and normalizing the data, we trained a machine learning classification model to predict results from a user's given data.
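The steps above (normalize the data, then classify) could be sketched as a scikit-learn pipeline. This is a minimal illustration, not the exact model we trained: the column names, the synthetic data, and the choice of logistic regression are all assumptions made for the example, and real HMDA data has many more fields.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for loan application records (hypothetical columns).
rng = np.random.default_rng(0)
n = 500
income = rng.normal(70, 20, n)        # thousands of dollars
loan_amount = rng.normal(200, 60, n)  # thousands of dollars
bank = rng.choice(["bank_a", "bank_b", "bank_c"], n)
# Made-up approval rule plus noise, only so the model has something to learn.
approved = (3 * income - loan_amount + rng.normal(0, 30, n) > 0).astype(int)

df = pd.DataFrame(
    {"income": income, "loan_amount": loan_amount, "bank": bank, "approved": approved}
)
features = ["income", "loan_amount", "bank"]

# Normalize numeric columns and one-hot encode the categorical column.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["income", "loan_amount"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["bank"]),
    ]
)
model = Pipeline(
    [("preprocess", preprocess), ("clf", LogisticRegression(max_iter=1000))]
)

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["approved"], test_size=0.25, random_state=0, stratify=df["approved"]
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Estimate the approval likelihood for one hypothetical applicant.
applicant = pd.DataFrame([{"income": 85.0, "loan_amount": 180.0, "bank": "bank_b"}])
approval_probability = model.predict_proba(applicant)[0, 1]
print(f"test accuracy: {test_accuracy:.2f}")
print(f"estimated approval probability: {approval_probability:.2f}")
```

Keeping the preprocessing inside the pipeline means a user's raw inputs can be passed straight to `predict_proba` without repeating the normalization by hand.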

Using the Taipy module, we hosted our model on a website with intuitive, accessible interactive tools and visual displays of the data we obtained from the model.

Challenges

Our datasets, which often contained hundreds of thousands of individual applicants, raised early questions about the efficiency of our model. Some banks opted not to share information about their clients' demographics, and most banks did not provide a comparable financial metric, such as a credit score, with which to compare applications empirically.

In the combined dataframe, successful applicants outnumbered unsuccessful applicants by a ratio of 4:1. This imbalance resulted in a skewed and overfitted model that output a large number of false positives, which made it difficult to tell users when their loan application was likely to be denied.
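To see why a 4:1 imbalance is a problem, consider an extreme case: a model that approves everyone. The sketch below uses made-up labels (80 approvals, 20 denials) rather than our data, and computes the per-class F1 score by hand. Accuracy looks respectable, yet every denial becomes a false positive.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 1 = approved, 0 = denied, at the 4:1 ratio described above.
y_true = [1] * 80 + [0] * 20
# A degenerate model that always predicts "approved".
y_pred = [1] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_approved = f1_for_class(y_true, y_pred, positive=1)
f1_denied = f1_for_class(y_true, y_pred, positive=0)

print(f"accuracy:      {accuracy:.2f}")     # 0.80
print(f"F1 (approved): {f1_approved:.2f}")  # 0.89
print(f"F1 (denied):   {f1_denied:.2f}")    # 0.00
```

This is why we tracked the F1 score for unsuccessful applicants separately instead of relying on overall accuracy.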

Accomplishments

We decided to rely primarily on categorical variables in order to group similar applicants, which made the training process much smoother and faster than the traditional method. Using census data about credit scores together with the metrics we had, such as zip code and income, we produced a rough estimate that let us reasonably compare a user's financial profile with common successful applications.
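One way to group similar applicants is to convert a continuous value such as income into a bracket. The sketch below is only an illustration of the idea: the bracket edges, labels, and applicant records are hypothetical, not the ones we used.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bracket edges (in dollars) and their labels.
INCOME_EDGES = [30_000, 60_000, 100_000]
INCOME_LABELS = ["under_30k", "30k_to_60k", "60k_to_100k", "100k_and_over"]


def to_bracket(value, edges, labels):
    """Return the label of the bracket containing `value` (lower edge inclusive)."""
    return labels[bisect_right(edges, value)]


# Illustrative applicants; the zip codes and incomes are made up.
applicants = [
    {"zip_code": "21218", "income": 28_500},
    {"zip_code": "21218", "income": 29_900},
    {"zip_code": "21209", "income": 61_000},
    {"zip_code": "21209", "income": 95_000},
    {"zip_code": "21209", "income": 140_000},
]

# Applicants with the same zip code and income bracket fall into one group.
groups = Counter(
    (a["zip_code"], to_bracket(a["income"], INCOME_EDGES, INCOME_LABELS))
    for a in applicants
)
for key, count in groups.items():
    print(key, count)
```

Five distinct incomes collapse into three groups here, so the model sees fewer distinct values per feature.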

With typical machine learning classification models, we were getting low F1 scores of 0.0 to 0.2 for unsuccessful applicants. We researched further methods to combat the imbalanced dataset and implemented Adaptive Boosting (AdaBoost) classifiers. AdaBoost repeats the classification process while giving more weight to the points it previously misclassified, which in our case were mostly minority-class points, so they could be identified more accurately despite being underrepresented in the training set. With this change, we raised our F1 scores to between 0.6 and 0.7 for unsuccessful applicants and to 0.9 for successful applicants.
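A minimal sketch of this approach, using scikit-learn's `AdaBoostClassifier` on synthetic data with the same 4:1 class ratio. The dataset, feature count, and `n_estimators=100` are assumptions chosen for illustration, so the scores it prints will not match the ones reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic data: class 0 ("denied") is about 20% of samples, class 1 ("approved") about 80%.
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    weights=[0.2, 0.8],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each boosting round reweights the training points that earlier rounds got wrong.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# average=None returns one F1 score per class, ordered by label (0, then 1).
f1_denied, f1_approved = f1_score(y_test, clf.predict(X_test), average=None)
print(f"F1 (denied):   {f1_denied:.2f}")
print(f"F1 (approved): {f1_approved:.2f}")
```

Reporting the two F1 scores separately makes it visible whether the minority class is actually being identified.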

Learning Moments

The key social takeaway from our experience was that the inherent biases and issues in our current systems make it more complicated to conduct proper studies, provide useful tools, and make change in our communities. When banks are hesitant to report many unsuccessful loan applications, use varying confidential credit scoring systems, and hide their demographic data, it becomes harder to evaluate their objectivity and to advise poorer communities about where and how to apply for loans. Modern technologies such as our algorithm can revolutionize the way we approach these situations by filling in the gaps left by this missing information.

On the technical side, as beginner hackers we all learned how to train a machine learning algorithm and develop a website in Python. We're incredibly proud of our work and excited to see where it can go next.

What's Next

We plan to add API support for the datasets so that our models refresh automatically when updated data arrives from these banks, streamlining the process from model to user. By linking this process to blockchain and crypto wallets, we could completely digitize the loan application process and make it much easier for users to apply.

To give our users a more cohesive recommendation, it would be helpful to have access to proprietary databases like Equifax. These include additional variables, such as income, demographics (e.g. zip code, race, gender, age), success rate of the loan request, and credit score. The fact that this data is hidden behind a paywall shows how difficult it is to analyze these institutions' objectivity.

By incorporating credit scores, we could effectively compare acceptances and rejections for mortgage loans across different banks, allowing us to recommend to users the bank at which they are predicted to have the best chance of success. With a larger variety of data, our model would become more reliable and could serve as a basis for online banking in the future.
