In the earlier visualisations, you dealt only with numeric variables. Now you’ll analyse the categorical variables and see how the Ratings vary across each of them. Note that with categorical variables, you first need to aggregate the numeric variable within each category, using a measure such as the sum, average or median, and then use a plot such as a bar chart or pie chart to portray the relationship.
You’re already familiar with how to create bar plots in matplotlib. Here, you’ll see how to create bar plots and pie charts directly from a pandas Series as well. Go through the documentation for both pie charts and bar plots. You’ll also be doing a couple of data handling tasks here. So let’s dive in.
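As a rough illustration of the idea above, the sketch below aggregates a numeric column within each category and plots the result straight from the resulting pandas Series. The DataFrame name `apps` and its values are made up for illustration; only the column names Content Rating and Rating are taken from this lesson, and the course dataset may look different.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data (assumption): not the course dataset.
apps = pd.DataFrame({
    "Content Rating": ["Everyone", "Teen", "Everyone", "Mature 17+", "Teen", "Everyone"],
    "Rating": [4.1, 4.5, 3.9, 4.3, 4.0, 4.6],
})

# Aggregate the numeric variable within each category.
mean_rating = apps.groupby("Content Rating")["Rating"].mean()
median_rating = apps.groupby("Content Rating")["Rating"].median()

# Plot directly from the aggregated Series; pandas draws onto a matplotlib Axes.
ax = mean_rating.plot.bar()
ax.set_ylabel("Mean Rating")
plt.tight_layout()
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure. Swapping `mean_rating` for `median_rating` plots the median instead.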
Note
Some functions and plots in seaborn have been deprecated in the latest version. Therefore, you may not get exactly the same visualisations as the ones shown in the video.
After the data handling tasks, you plotted the total number of records in each category of Content Rating using a pie chart and a bar graph in matplotlib.
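A minimal sketch of that step, assuming a DataFrame with a Content Rating column, could look like the following. The name `apps` and the sample values are hypothetical and do not reproduce the counts from the video.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data (assumption): not the course dataset.
apps = pd.DataFrame({
    "Content Rating": ["Everyone", "Teen", "Everyone", "Mature 17+", "Teen", "Everyone"],
})

# Count the records in each category; the result is a Series indexed by category.
counts = apps["Content Rating"].value_counts()

# Draw the same counts as a pie chart and as a bar graph, side by side.
fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))
counts.plot.pie(ax=ax_pie, autopct="%1.1f%%")
ax_pie.set_ylabel("")  # drop the default axis label pandas adds to pie charts
counts.plot.bar(ax=ax_bar)
ax_bar.set_ylabel("Number of apps")
fig.tight_layout()
```

Placing both charts in one figure makes it easier to compare how readable each one is for the same data.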
You also saw why a pie chart is generally not preferred when there are 3 or more categories to plot. Essentially, it is very difficult to judge the difference between categories when their proportions are similar, as seen in the following pie chart:
However, this problem is easily overcome with a bar graph, where the lengths of the bars give clear visual cues that portray the difference between the categories succinctly. In fact, you can also draw a horizontal bar graph to make the difference even more apparent. Both views are shown in the images below.
You can clearly see that the ‘Everyone’ category has the highest number of apps, followed by ‘Teen’ and ‘Mature 17+’.
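The horizontal variant mentioned above can be sketched as follows. Sorting the counts first is an optional choice made here so that the longest bar ends up at the top; the name `apps` and the sample values are hypothetical, chosen only to mirror the ordering described in this lesson.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data (assumption): not the course dataset.
apps = pd.DataFrame({
    "Content Rating": ["Everyone", "Teen", "Everyone", "Mature 17+", "Teen", "Everyone"],
})

# Sort ascending so the largest category is drawn last, i.e. as the top bar.
sorted_counts = apps["Content Rating"].value_counts().sort_values()

# barh draws horizontal bars; bar length encodes the count.
ax = sorted_counts.plot.barh()
ax.set_xlabel("Number of apps")
plt.tight_layout()
```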
Additional Notes
Here’s a blog post describing how Steve Jobs cunningly used pie charts and other visualisations to show a “different picture” from the real one.