In the previous segments, you learnt how to create scatter plots and joint plots using matplotlib and seaborn. With sns.jointplot(), you also saw how to create reg plots, which add regression functionality on top of the scatter plot and histograms that are already available.
Now, if there are 4-5 numeric variables that you want to analyse, making a jointplot for every pair of them is tedious: five variables already give ten distinct pairs. To overcome this limitation, let’s learn another functionality, the pair plot.
First, you took a subset of the entire dataframe, [Reviews, Size, Price and Rating], for the pairplot. Then you simply used the sns.pairplot() function to plot it. Check its official documentation to understand its parameters.
As you can see, for every pair of numeric variables the pairplot creates a scatter plot, whereas on the diagonal, where the same variable is considered twice, a histogram is shown.
Here, you’re able to make certain inferences in conjunction with the ones made in earlier segments, like how Reviews and Price have an inverse relationship, as the L-shaped scatter plot shows. Compared to the previous jointplot, the statistical information is a bit sparser (there is no Pearson coefficient to quantify the correlation between the two variables), but having a bird’s eye view of all the numeric variables at once has its own advantages.
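If you miss the Pearson coefficient, one option (not something the pairplot does for you) is to compute it separately with pandas. The sketch below uses DataFrame.corr() on a made-up dataframe with the same four column names; the values are random, so the coefficients it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data (random values, for illustration only).
rng = np.random.default_rng(42)
n = 300
apps = pd.DataFrame({
    "Reviews": rng.integers(0, 50_000, size=n),
    "Size": rng.uniform(1_000, 100_000, size=n),
    "Price": rng.choice([0.0, 0.99, 2.99, 4.99], size=n),
    "Rating": rng.uniform(1.0, 5.0, size=n).round(1),
})

# Pairwise Pearson correlation between all numeric columns.
corr = apps.corr(method="pearson")
print(corr.round(2))
```

Each cell of the resulting table corresponds to one scatter plot in the pair plot, so the two can be read side by side.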
Application in Machine Learning
- Pairplots instantly give you the relationship between one numeric variable with the rest of the numeric variables. This is pretty useful in identifying relationships between the target variable and the rest of the features.
- For example, say you want to predict how your company’s sales are affected by budgets allocated to three different types of advertisement channels – TV, Newspaper and Radio. To see which channel matters most, you can create a pair plot containing sales and the three different budgets as the variables. Let’s say the pair plot gives you the following scatter plots of sales vs each of the three budgets:
It is clearly visible that the left-most factor, the budget allocated to TV, is the most strongly related to the company’s sales, since you can ascertain a clear trend between them: higher budgets for TV ads go with more sales. In the other two cases, the points are scattered quite randomly.
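The advertising example can be mimicked with a small sketch. All the numbers below are synthetic and chosen by assumption so that Sales depends on the TV budget only; the dataframe name `ads` and the coefficients are hypothetical. The x_vars and y_vars parameters of sns.pairplot() restrict the grid to the target-vs-feature plots.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: Sales is built from the TV budget plus noise (an assumption
# made purely for illustration); Radio and Newspaper are independent of Sales.
rng = np.random.default_rng(1)
n = 200
ads = pd.DataFrame({
    "TV": rng.uniform(0, 300, size=n),
    "Radio": rng.uniform(0, 50, size=n),
    "Newspaper": rng.uniform(0, 100, size=n),
})
ads["Sales"] = 5 + 0.05 * ads["TV"] + rng.normal(0, 1.5, size=n)

# One row of scatter plots: Sales against each of the three budgets.
features = ["TV", "Radio", "Newspaper"]
grid = sns.pairplot(ads, x_vars=features, y_vars=["Sales"])

# Optional numeric check of what the plots suggest.
sales_corr = ads.corr()["Sales"].drop("Sales")
print(sales_corr.round(2))
```

In this synthetic setup, the TV panel shows a clear upward trend while the other two look like random clouds, which is the pattern described above.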
In the next segment, let’s learn ways to add more information to bar graphs.
Note
This concept of utilising pairplots to find insights will be dealt with in detail in future modules on EDA and Regression.