Let’s understand the problem statement and the data set you will be using in this session.
Problem Statement
For this demonstration, you will use the bank marketing data set. Let’s first understand the problem statement, so that you can make the best use of the available information and approach the analysis in line with the business problem at hand.
A bank ran a marketing campaign in the past and collected data on nearly 11,000 customers, including variables such as age, job, bank balance, education and loan status. Using the insights drawn from this previous campaign, the bank wants to shape its future strategy and improve its next campaign so that more customers agree to open term deposits with the bank.
Hence, ‘deposit’ is the target variable here. A ‘Yes’ in the ‘deposit’ column indicates that the campaign was successful, i.e., the customer agreed to open a term deposit account with the bank. A ‘No’ indicates that the campaign was unsuccessful for that customer, who could not be convinced to open a term deposit account.
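Because ‘deposit’ is the target, a common first step is to encode it as 1/0 and check how balanced the two outcomes are. The sketch below uses a handful of made-up rows rather than the real file, and assumes the column stores lowercase ‘yes’/‘no’ strings; adjust the mapping keys if your copy uses ‘Yes’/‘No’.

```python
import pandas as pd

# Illustrative rows only (not taken from the real data set).
# Assumption: the target column is 'deposit' with lowercase 'yes'/'no' values.
bank = pd.DataFrame({
    "age": [59, 56, 41, 55, 54, 42],
    "deposit": ["yes", "yes", "no", "yes", "no", "no"],
})

# Encode the target: 1 = customer opened a term deposit, 0 = did not.
bank["deposit"] = bank["deposit"].map({"yes": 1, "no": 0})

# Share of each class; a heavily imbalanced target changes how
# the model's accuracy should be interpreted later.
class_share = bank["deposit"].value_counts(normalize=True)
print(class_share)
```

On the real data, the same `value_counts(normalize=True)` call tells you what fraction of customers said yes in the previous campaign.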
Essentially, the bank wants to:
- Build a model that quantitatively relates the success of the marketing campaign to variables such as job, marital status, education and bank balance.
- Identify the features of the data set that affect the successful conversion of customers.
- Determine the accuracy of the model, i.e., how well these variables predict the success of the campaign.
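Accuracy here means the fraction of customers whose outcome the model predicts correctly. A minimal sketch of that calculation, using hypothetical outcomes and predictions (1 = deposit opened, 0 = not opened), is shown below; scikit-learn's `sklearn.metrics.accuracy_score` computes the same quantity.

```python
def accuracy(y_true, y_pred):
    """Fraction of customers whose outcome (1 = deposit, 0 = no deposit) was predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one observation")
    # Count positions where the prediction matches the actual outcome.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical actual outcomes and model predictions for five customers.
actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 1]
print(accuracy(actual, predicted))  # 0.6 (3 of 5 correct)
```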
You can download the data set and the Python notebook below. We recommend that you open the notebook on your computer and follow along with the demonstrations in the videos; this will help you understand the model-building process more easily.
Let’s begin by understanding the data set, which consists of the variables collected during the bank’s previous marketing campaign.
Now that you understand the data, we will move to the Jupyter notebook to perform basic exploratory data analysis (EDA) and preprocess the data before building and selecting the model. This will help you interpret the data well and identify the variables that are likely to be useful in building the model.
In the video, we imported all the required libraries and read the data set. You now also have an idea of the categorical variables present in the data set and of how the values are distributed within each of them. Next, let’s examine the numerical features in the data set.
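The EDA steps described above, separating categorical from numerical variables, checking how values are distributed in each categorical variable, and summarising the numerical ones, can be sketched in pandas as follows. The DataFrame is a small hypothetical stand-in for the loaded file; in the notebook the data would come from reading the downloaded file (for example with `pd.read_csv` and your local path). Treating every non-numeric column as categorical is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical stand-in for the loaded data set (illustrative values only).
bank = pd.DataFrame({
    "age": [59, 56, 41, 55, 54, 42],
    "job": ["admin.", "admin.", "technician", "services", "admin.", "management"],
    "marital": ["married", "married", "married", "married", "married", "single"],
    "balance": [2343, 45, 1270, 2476, 184, 0],
    "loan": ["no", "no", "no", "no", "no", "yes"],
    "deposit": ["yes", "yes", "yes", "yes", "yes", "no"],
})

# Split columns by dtype: numeric columns vs everything else (treated as categorical).
numerical_cols = bank.select_dtypes(include="number").columns.tolist()
categorical_cols = bank.select_dtypes(exclude="number").columns.tolist()

# How the values are distributed within each categorical variable.
for col in categorical_cols:
    print(bank[col].value_counts(), end="\n\n")

# Summary statistics (count, mean, std, min, quartiles, max) for the numerical variables.
print(bank[numerical_cols].describe())
```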
Now that you understand both the categorical and the numerical variables in the data set, and have visualised them, we will next encode the categorical variables so that we can proceed with model building.
In the video, you saw how we mapped the binary features to 0s and 1s and handled the categorical features with more than two labels using dummy encoding. This is a simple but important step before model building. Now, let’s work through the initial steps of model building.
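The two encoding steps can be sketched as follows, on a few illustrative rows: binary yes/no columns are mapped to 1/0, and columns with more than two labels are expanded into dummy variables with `pd.get_dummies`. The choice of columns and the use of `drop_first=True` are assumptions of this sketch; whether the notebook drops one level per feature is not stated here.

```python
import pandas as pd

# Illustrative rows: 'loan' and 'deposit' are binary; 'job' and 'marital' have more than two labels.
bank = pd.DataFrame({
    "loan": ["no", "yes", "no", "no"],
    "deposit": ["yes", "no", "no", "yes"],
    "job": ["admin.", "technician", "services", "admin."],
    "marital": ["married", "single", "divorced", "married"],
})

# Binary features: map the two labels to 1 and 0.
binary_cols = ["loan", "deposit"]
bank[binary_cols] = bank[binary_cols].apply(lambda s: s.map({"yes": 1, "no": 0}))

# Multi-label features: one 0/1 column per label. drop_first=True drops one level
# per feature, since it is implied when all the other dummies for that feature are 0.
bank = pd.get_dummies(bank, columns=["job", "marital"], drop_first=True, dtype=int)
print(bank)
```

Dropping one level avoids perfectly collinear dummy columns, which matters for models such as linear or logistic regression.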