In this segment, you will learn how to analyse pairs of numerical variables using the bank marketing dataset. There are several tools available for analysing numerical variables. In the next video, you will learn about the different tools and plots that help you extract insights from the numerical variables in a dataset.
Note
In the video, at 3:08 mins, the correlation between Petal Length and Sepal Width is -0.43, not 0.43, in both places where it appears in the correlation matrix.
One very important concept covered in the video above is the correlation coefficient. The correlation coefficient captures only the linear relationship between numerical variables; it does not capture any other kind of relationship. A zero correlation does not imply that there is no relationship between the variables; it merely indicates that there is no linear relationship between them. A correlation can also be negative or positive. A negative correlation means that as the value of one variable increases, the value of the other tends to decrease, whereas a positive correlation means that the two tend to increase together.
Now, the closer the correlation coefficient is to +1 or -1 (that is, the larger its absolute value), the stronger the linear relationship between the variables.
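To make the points about linear relationships concrete, here is a minimal sketch in plain Python. The helper name `pearson_r` and the small lists are made up for illustration and are not taken from the video; the function simply applies the standard Pearson formula (covariance divided by the product of the spreads).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how the two variables move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the spread of each variable on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [-3, -2, -1, 0, 1, 2, 3]

r_positive = pearson_r(x, [2 * v + 1 for v in x])   # y rises with x along a line
r_negative = pearson_r(x, [-2 * v + 1 for v in x])  # y falls as x rises along a line
r_parabola = pearson_r(x, [v ** 2 for v in x])      # y depends on x, but not linearly

print(r_positive, r_negative, r_parabola)
```

The first two coefficients come out at (approximately) +1 and -1, while the third is zero even though y is completely determined by x. This is the sense in which a zero correlation rules out only a linear relationship.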
From the correlation matrix below, you can observe that petal length has a high correlation with sepal length, with a correlation coefficient of 0.87. Also, there is a very high correlation coefficient of 0.96 between petal width and petal length.
Note
The correlation between Petal Length and Sepal Width is -0.43, not 0.43, in both places where it appears in the correlation matrix below.
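If you want to build a correlation matrix yourself, the following is a small sketch using pandas. The DataFrame `df` and its columns `a`, `b` and `c` are invented toy values, not the petal and sepal measurements discussed above. `DataFrame.corr()` computes the Pearson coefficient by default.

```python
import pandas as pd

# Toy data, invented for illustration only (not the data shown in the video).
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [1.2, 1.9, 3.2, 3.8, 5.1, 6.3],  # rises as a rises
    "c": [9.0, 7.5, 6.1, 4.0, 3.2, 1.1],  # falls as a rises
})

# One row and one column per numeric variable; Pearson is the default method.
corr = df.corr()
print(corr.round(2))
```

Every pair of variables appears twice in the result, once above and once below the diagonal, and both entries always carry the same value and the same sign. The diagonal is 1 because each variable is perfectly correlated with itself.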
However, a correlation matrix has its own limitations: it reduces each pair of variables to a single number, so you cannot see how one variable is actually distributed against another. To solve this problem, we use pair plots. A pair plot is a grid of scatter plots, one for each pair of numeric variables in a dataset. It shows how each variable varies with respect to the others, as you can observe in the image below.
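As a rough sketch of how such a grid can be drawn, the example below uses `scatter_matrix` from `pandas.plotting` on synthetic data. The column names and values are assumptions made up for this illustration, and this may not be the plotting function used in the video; seaborn's `pairplot` is a common alternative that produces a similar grid.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data, generated only for this illustration.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
df_pairs = pd.DataFrame({
    "x": x,
    "linear": 2 * x + rng.normal(scale=0.5, size=n),  # built to follow x closely
    "unrelated": rng.normal(size=n),                  # generated independently of x
})

# One scatter plot per pair of columns, with a histogram of each column on the diagonal.
axes = scatter_matrix(df_pairs, diagonal="hist", figsize=(6, 6))
plt.savefig("pair_plot.png")
```

In the saved image, the panel for `x` against `linear` should show a narrow upward band, while the panels involving `unrelated` should look like shapeless clouds. That visual difference is what a single number in the correlation matrix cannot show.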
Now, in the following video, Rahim will explain how to perform a numeric bivariate analysis using the bank marketing dataset.
So, in the video, you saw how a pair plot can help you determine that there is no correlation between the ‘age’, ‘balance’ and ‘salary’ variables. Now, refer to the image below and observe the lack of correlation between these variables.
A high correlation coefficient does not always imply a genuine relationship between two numeric variables, because there may be no causation between them. There may be cases where you see a high correlation coefficient between two variables even though neither of them actually influences the other. In the next segment, you will learn in detail how correlation is related to causation.
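One way such a situation can arise is when two variables both follow a shared trend. The sketch below is a made-up illustration using NumPy: `series_a` and `series_b` are hypothetical series that both grow over time but are generated independently of each other, so neither one influences the other.

```python
import numpy as np

rng = np.random.default_rng(42)
months = np.arange(100)

# Two hypothetical series that both grow over time but do not influence each other.
series_a = 2.0 * months + rng.normal(scale=5.0, size=months.size)
series_b = 0.5 * months + rng.normal(scale=3.0, size=months.size)

# Correlation of the raw series: high, because both follow the same time trend.
r_raw = np.corrcoef(series_a, series_b)[0, 1]

# Fit and subtract a straight-line time trend from each series,
# then correlate what is left over.
resid_a = series_a - np.polyval(np.polyfit(months, series_a, 1), months)
resid_b = series_b - np.polyval(np.polyfit(months, series_b, 1), months)
r_detrended = np.corrcoef(resid_a, resid_b)[0, 1]

print(round(r_raw, 2), round(r_detrended, 2))
```

The raw coefficient is close to 1, yet once the shared time trend is removed the remaining correlation is much weaker. The high value came from the common trend, not from one series driving the other.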
Comprehension: Correlation
Consider the following four scatter plots of two variables A and B.
Based on your learning in this segment, answer the following questions.