Understanding the Business Problem Statement

So far, you have studied the concepts behind the multiclass classification techniques One vs One and One vs Rest. You learnt the logic that governs these classifiers and understood how predictions are made using the sigmoid function and probabilities. It is time to apply what you have learnt to build a real model for a business problem using Python.

In the next video, Ankit will introduce you to the business problem that you will be solving in this session.

This case study concerns a multinational bank, which has several business units. The unit that we are concerned with is the loan unit. This unit is facing major losses and wants to find their cause. A root cause analysis found ‘default’ to be one of the main reasons for the losses. To prevent such losses in the future, the bank needs to identify, at the time of loan processing, the loan requests that are likely to default. So, the bank needs to build a credit risk model that screens all incoming loan requests for credit risk. For each loan request, the model outputs one of three classes: low risk, medium risk or high risk.

Using this prediction, the loan officer can decide whether to approve or reject the loan request. Refer to the definitions of default and credit risk provided below.
Default loans: Defaulting refers to the failure to repay a loan according to the terms agreed to in the promissory note. For example, for most federal student loans, you will default if you have not made a payment in more than 270 days.
Credit risk: This refers to the risk that a bank takes when lending money to borrowers, who might default and cause losses to the bank.

Open this IPython Notebook and code along with Ankit for hands-on practice of the code.

As Ankit explained, before starting to build the model, you should have a clearly defined objective. The business objective as stated in the video is as follows:

  • ‘To build the “Credit risk estimate model” to classify new loans as low risk, medium risk or high risk. This will help the bank to sanction loans to “low risk” and “medium risk” customers and reject the loan requests of “high risk” customers.’
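The decision rule stated in this objective can be sketched in a few lines of Python. This is only an illustration: the helper name loan_decision, the DECISIONS mapping and the exact label strings are assumptions, and in practice the loan officer makes the final call using the prediction.

```python
# Hypothetical helper (not from the notebook): maps a predicted risk
# class to the action described in the business objective.
DECISIONS = {
    "low risk": "approve",
    "medium risk": "approve",
    "high risk": "reject",
}


def loan_decision(risk_class):
    """Return 'approve' or 'reject' for a predicted risk class."""
    key = risk_class.strip().lower()  # tolerate case and stray spaces
    if key not in DECISIONS:
        raise ValueError(f"Unknown risk class: {risk_class!r}")
    return DECISIONS[key]


print(loan_decision("low risk"))
print(loan_decision("high risk"))
```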

After stating the objective, Ankit imported NumPy and pandas and then loaded the data set. You can import the data set from the link that was provided to you earlier in this segment. The data set contains 38,642 entries with the columns id, loan_amnt, funded_amnt, int_rate, installment, emp_length and annual_inc, and the target column loan_status.
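The loading step could look roughly like the sketch below. It is not the code from the notebook: the helper load_loans is hypothetical, and the three rows in sample_csv are made-up placeholders (including the value formats) that stand in for the real file so that the example runs on its own. With the actual data set, you would pass the path of the downloaded file to load_loans instead.

```python
import io

import numpy as np
import pandas as pd

# Column names as listed in the text above.
EXPECTED_COLUMNS = [
    "id", "loan_amnt", "funded_amnt", "int_rate",
    "installment", "emp_length", "annual_inc", "loan_status",
]


def load_loans(source):
    """Read the loan data and check that the expected columns exist."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df


# Made-up placeholder rows, used only to keep the sketch self-contained.
sample_csv = io.StringIO(
    "id,loan_amnt,funded_amnt,int_rate,installment,emp_length,annual_inc,loan_status\n"
    "1,5000,5000,10.5,162.5,10,24000,low risk\n"
    "2,2500,2500,15.2,86.9,1,30000,high risk\n"
    "3,12000,10000,12.7,335.6,4,49000,medium risk\n"
)

loans = load_loans(sample_csv)
numeric_cols = loans.select_dtypes(include=np.number).columns.tolist()

print(loans.shape)   # (rows, columns)
print(loans.head())  # first few records
print(numeric_cols)  # columns that pandas parsed as numbers
```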

The meaning of each column is defined as follows:

  • id: Transaction ID used to identify each transaction uniquely
  • loan_amnt: Loan amount that was requested by the customer
  • funded_amnt: Amount that was sanctioned by the bank
  • int_rate: Interest rate offered on the loan amount
  • installment: Amount of money paid during each installment
  • emp_length: Work experience (employment length of the customer)
  • annual_inc: The customer’s annual income
  • loan_status: Classification of a loan as ‘high risk’, ‘low risk’ or ‘medium risk’

This data set maps each existing borrower’s record at the time of approval to the current status of that borrower’s loan. After building the model, we can predict the future loan status of new applicants using their current details, i.e., their approval-time record.

Then, as a good practice, Ankit looked at a few data samples followed by the distribution of loan status. The distribution of loan status is as follows:

  • Low risk: 32,145
  • High risk: 5,399
  • Medium risk: 1,098
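To see what these counts mean in relative terms, you can convert them into percentages. The sketch below rebuilds the distribution from the three counts listed above rather than from the data set itself; the label strings are assumptions. On the real DataFrame, the counts would typically come from calling value_counts() on the loan_status column.

```python
import pandas as pd

# Counts taken from the distribution listed above.
counts = pd.Series(
    {"low risk": 32145, "high risk": 5399, "medium risk": 1098},
    name="loan_status",
)

total = int(counts.sum())                    # total number of loans
share = (counts / total * 100).round(2)      # percentage per class

print(total)
print(share)
```

The low risk class accounts for well over 80% of the records, so the classes are clearly imbalanced. This is worth keeping in mind when you evaluate the model later.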

Ankit checked the data and found that it was already clean. While building such models in real life too, you will often receive the data in a cleaned state, because it will already have undergone preliminary and exploratory analyses. This is also why we know that a relationship exists between these feature variables and the target.
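One simple way to confirm that a data set is clean is to count its missing values and duplicate rows. The sketch below is an assumption about what such a check might look like, not the exact check from the video; the helper cleanliness_report is hypothetical, and the toy DataFrame holds made-up placeholder values.

```python
import pandas as pd


def cleanliness_report(df):
    """Count missing cells and fully duplicated rows in a DataFrame."""
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Made-up placeholder rows, used only to keep the sketch self-contained.
toy = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "loan_amnt": [5000, 2500, 12000],
        "loan_status": ["low risk", "high risk", "medium risk"],
    }
)

report = cleanliness_report(toy)
print(report)
```

If both counts are zero, there are no missing values or repeated records to handle before modelling.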