
Revisiting Bar Graphs and Box Plots

In the earlier sessions, you learnt how bar graphs and box plots can be used to analyse the numerical and categorical variables in your data. Now, you’ll learn some additional customisations that Seaborn provides, along with certain use cases where those functionalities come in handy. For this demonstration, you’ll be taking a look at the Content Rating module.

Since taking just the average did not give any insight, you then tried the median. Here, you observed that the median value also did not prove to be a good differentiator for the categories being analysed.

Now, this is where you utilised the estimator parameter of Seaborn’s bar plot to create bar graphs for metrics other than the mean and median that you used earlier. In this case, you used the value at the 5th percentile to compare the categories and utilised the following estimator function for it:
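As a minimal sketch of this idea, a bar plot with a custom estimator might look like the following. The DataFrame name `apps`, its sample values and the helper `fifth_percentile` are hypothetical stand-ins, not the course's actual data or code; the column names "Content Rating" and "Rating" are assumed from the text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; the course works with its own app dataset.
apps = pd.DataFrame({
    "Content Rating": ["Everyone"] * 5 + ["Teen"] * 5,
    "Rating": [2.0, 3.0, 4.0, 4.5, 5.0, 3.5, 4.0, 4.2, 4.4, 4.8],
})

def fifth_percentile(values):
    """Return the value at the 5th percentile of a group of ratings."""
    return np.quantile(values, 0.05)

# The estimator is applied to the ratings of each category,
# so each bar shows that category's 5th-percentile rating.
ax = sns.barplot(data=apps, x="Content Rating", y="Rating",
                 estimator=fifth_percentile)
ax.set_ylabel("Rating at 5th percentile")
```

Any function that reduces a group of values to a single number can be passed as the estimator in the same way, for example a different quantile.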

Here, you can see some clear differences popping up: “Everyone 10+” has the highest rating at the 5th percentile (3.5), followed by “Teen” (around 3.3) and then “Everyone” and “Mature 17+” (around 3).

Using the estimator function, you can observe the values at different percentiles and compare the different categories. 

Now, you may be wondering: rather than observing specific percentiles, why not visualise the entire spread of ratings for each category using a box plot? If you did, then good job! You’re thinking in the right direction. Rahim will be discussing that in the next video.

Note

The SME mistakenly refers to the fences as hinges at 2:17 and 2:31.

The following is the box plot of ratings for all different categories:
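A plot of this kind could be drawn roughly as follows. This is only a sketch: the DataFrame `apps` and its values are hypothetical, and the column names "Content Rating" and "Rating" are assumed from the text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; the course works with its own app dataset.
apps = pd.DataFrame({
    "Content Rating": ["Everyone"] * 5 + ["Teen"] * 5,
    "Rating": [2.0, 3.0, 4.0, 4.5, 5.0, 3.5, 4.0, 4.2, 4.4, 4.8],
})

# One box per category on the x-axis, showing the spread of ratings on the y-axis.
ax = sns.boxplot(data=apps, x="Content Rating", y="Rating")
```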

Here, you get a bird’s eye view of the spread of ratings for the different categories: median, 75th percentile, fences, etc. The immediate insights that you obtained from the above view are:

  • The “Everyone” category has more ratings in the lower percentiles than the other categories.
  • The median values are all comparable, which was discovered in the previous views as well.
  • The upper fences for all the categories get capped at 5.0, whereas there are some observable differences in the lower fences.

Comparing Box Plots

As you saw in the above visualisation, comparing box plots of a particular measure for different categories helps you analyse the consistency and the difference in spread across those categories. The inter-quartile range (IQR) is particularly useful here: a smaller IQR means that the middle 50% of the values lie closer together, which indicates more consistent performance. Here’s a video that explains how to compare different box plots to determine the most consistent performance.
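The same comparison can be done numerically. The sketch below computes the IQR per category with pandas and picks the category with the smallest IQR; the DataFrame `apps`, its values and the variable names are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in data; the course works with its own app dataset.
apps = pd.DataFrame({
    "Content Rating": ["Everyone"] * 5 + ["Teen"] * 5,
    "Rating": [2.0, 3.0, 4.0, 4.5, 5.0, 3.5, 4.0, 4.2, 4.4, 4.8],
})

grouped = apps.groupby("Content Rating")["Rating"]
q1 = grouped.quantile(0.25)  # lower edge of each box
q3 = grouped.quantile(0.75)  # upper edge of each box

# IQR = Q3 - Q1; sorting ascending puts the most consistent category first.
iqr = (q3 - q1).sort_values()
most_consistent = iqr.index[0]
print(iqr)
print("Most consistent category:", most_consistent)
```

With this sample data, “Teen” has the narrower box, so it would be judged the more consistent of the two categories.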

In the next segment, let’s learn to plot heat maps. Heat maps help visualise the values in a matrix by colour coding them based on their values. 

Additional Notes

  • In the first use case of box plots, you observed how they can be used to identify and remove outliers from the data. In this segment, you understood how box plots can enable you to analyse a numerical variable across several categories. These two are the most prominent use cases of box plots that you’ll be encountering from time to time as you proceed in this program.
  • As you saw in the video, by using the groupby function, you can build bar graphs that compare the mean, median, sum and several other metrics across categories.
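The groupby approach mentioned in the last note might be sketched as follows. The DataFrame `apps` and its values are hypothetical, and plotting the median is just one possible choice of metric.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in data; the course works with its own app dataset.
apps = pd.DataFrame({
    "Content Rating": ["Everyone"] * 5 + ["Teen"] * 5,
    "Rating": [2.0, 3.0, 4.0, 4.5, 5.0, 3.5, 4.0, 4.2, 4.4, 4.8],
})

# Aggregate the ratings per category with several metrics at once.
summary = apps.groupby("Content Rating")["Rating"].agg(["mean", "median", "sum"])
print(summary)

# Any column of the summary can then be drawn as a bar graph.
ax = summary["median"].plot(kind="bar")
ax.set_ylabel("Median rating")
```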
