Previously, you learnt about the bivariate analysis of numerical variables. In this segment, you will learn about the associations between numerical and categorical variables, and you will apply this analysis to the same bank marketing dataset.
In the video, you saw how the salary variable varies with respect to the response variable. The mean and median salary are the same for customers who gave a positive response and for those who did not, as shown in the image below.
However, a very different picture emerges when you draw a boxplot. The interquartile range for customers who gave a positive response lies on the higher salary side. This makes sense, because people who have higher salaries are more likely to invest in term deposits.
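The comparison described above can be sketched in pandas. This is a minimal illustration, assuming a DataFrame with columns named `salary` and `response`. The column names, the variable names and the small toy values are assumptions made for this example, not the actual bank marketing data.

```python
import pandas as pd

# Toy data for illustration only; NOT the real bank marketing dataset.
toy = pd.DataFrame({
    "response": ["no", "no", "no", "no", "yes", "yes", "yes", "yes"],
    "salary":   [20000, 50000, 60000, 100000, 50000, 60000, 70000, 100000],
})

# Mean and median salary for each response group.
salary_summary = toy.groupby("response")["salary"].agg(["mean", "median"])

# The quartiles that a boxplot draws (box edges and middle line), per group.
salary_quartiles = (
    toy.groupby("response")["salary"]
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)

print(salary_summary)
print(salary_quartiles)
```

If seaborn is available, the corresponding plot could be drawn with `sns.boxplot(data=toy, x="response", y="salary")`. The quartile table shows the same information as the boxes in that plot, which makes it easier to see whether one group's interquartile range sits higher than the other's.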
Now, in the next video, we will take a look at a different variable in the bank marketing dataset.
In the video, you observed that the balance versus response plot does not make much sense at first glance. Sometimes a boxplot alone is not sufficient to draw insights, because the data is highly concentrated and/or contains some very high values, as is the case with the balance variable.
In such cases, it is good practice to analyse the data using the mean, median or quartiles. In the video, you saw that the mean and median values of the balance variable are higher for customers who gave a positive response. This again makes sense, because people who have a higher balance in their bank accounts are more likely to invest in term deposits.
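A minimal sketch of such a summary is shown here, again on made-up toy values rather than the real dataset. The column names `balance` and `response` and the variable names are assumptions for illustration. Each group contains one very large value to mimic the kind of data that squashes a boxplot.

```python
import pandas as pd

# Toy data for illustration only; NOT the real bank marketing dataset.
# Each group has one very large balance, which would compress a boxplot.
toy = pd.DataFrame({
    "response": ["no"] * 5 + ["yes"] * 5,
    "balance":  [0, 100, 200, 300, 50000, 100, 400, 600, 900, 60000],
})

# Mean and quartiles of balance for each response group.
balance_summary = (
    toy.groupby("response")["balance"]
       .describe()[["mean", "25%", "50%", "75%"]]
)

print(balance_summary)
```

Reading the numbers directly avoids the scaling problem of the plot: the few large values stretch the axis of a boxplot, but the median and quartiles of each group remain easy to compare in the table.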
In the next segment, you will get an idea about categorical versus categorical variable analysis.