This project builds a predictive model for assessing loan risk using machine learning and data analysis. By analyzing historical loan data, we can identify the patterns and factors that contribute to loan defaults, enabling financial institutions to make informed lending decisions.
In simple terms, banks nowadays need to hold enough reserves to survive economic and financial shocks. Regulation therefore requires banks to calculate the expected loss, which is defined as:

EL = PD × LGD × EAD

where:

- PD (Probability of Default) — the likelihood that a borrower defaults on the loan,
- LGD (Loss Given Default) — the fraction of the exposure that is lost if the borrower defaults,
- EAD (Exposure at Default) — the outstanding amount at the time of default.

For each of those components I've built a separate model, combined them together, and then calculated the expected loss (EL).
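Combining the three components is a simple per-loan multiplication followed by a portfolio sum. A minimal sketch, where the component values are illustrative placeholders (in the real project they come from the three separate models):

```python
import numpy as np

# Hypothetical per-loan component estimates, for illustration only.
pd_est = np.array([0.02, 0.10, 0.30])        # probability of default
lgd_est = np.array([0.45, 0.60, 0.40])       # loss given default (fraction)
ead_est = np.array([10_000, 5_000, 20_000])  # exposure at default

# Expected loss per loan, then the portfolio total.
el_per_loan = pd_est * lgd_est * ead_est
total_el = el_per_loan.sum()
```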
A gradient boosted tree learner takes in the dataset and is trained against a target column (in this case, the Risk column).
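A sketch of that training step using scikit-learn's gradient boosted trees; the feature columns here are hypothetical stand-ins for the real dataset, and only "Risk" matches the target column named above:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Toy stand-in for the loan dataset; "Risk" is the target (1 = default).
df = pd.DataFrame({
    "income": [30_000, 85_000, 42_000, 120_000, 28_000, 66_000],
    "loan_amount": [10_000, 20_000, 15_000, 30_000, 12_000, 18_000],
    "Risk": [1, 0, 1, 0, 1, 0],
})

X = df.drop(columns=["Risk"])
y = df["Risk"]

# Gradient boosted trees trained with Risk as the target column.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# predict_proba yields a per-loan probability of default (PD).
pd_scores = model.predict_proba(X)[:, 1]
```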
The LGD value is based on the collateral and financial situation of each borrower: the more stable the financial situation, the lower the LGD. But because bigger datasets contain many more elements contributing to LGD, I had to add some noise to the data so that it would look more realistic:
lgd_val = base_lgd + np.random.normal(0, 0.05)

This will always be 0, as the data is from when the loans were accepted or not.
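A fuller sketch of that noise step, with an added clip so LGD remains a valid fraction; the base_lgd values here are hypothetical, and the clipping is my addition rather than part of the original code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical base LGD per borrower, derived from collateral and
# financial stability (more stable -> lower base LGD).
base_lgd = np.array([0.25, 0.40, 0.55, 0.35])

# Add Gaussian noise so generated values look less uniform, then clip
# so each LGD stays a valid fraction in [0, 1].
lgd_val = np.clip(base_lgd + rng.normal(0, 0.05, size=base_lgd.shape), 0.0, 1.0)
```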
Reports from this dataset show that the bank's total expected loss is about 500–600k:

