If you’ve used MS Excel, you have probably come across conditional formatting, which uses colour intensities to indicate the magnitude of numerical values.
Heat maps use the same idea: colours and colour intensities represent a range of values. You may also have seen heat maps in cricket or football broadcasts on television, where they show the players’ areas of strength and weakness.
Let’s listen to Rahim as he explains how to create a heat map.
So, as explained in the video, you can create a heat map whenever you have a rectangular grid of values. For the demonstration, you’ll see how to create a heat map of Rating across Size and Content Rating.
In the video above, you were also introduced to binning: converting a numeric variable into a categorical one by grouping ranges of values into buckets. Binning is useful when you want to create meaningful buckets and see how some other variable changes across them.
One of the most common examples of binning appears in demographic survey datasets (such as census or market research surveys) that contain an Age column, where people can be grouped as Under-12, 12-17, 18-24 and so on. Even though a person’s actual age is a numeric value, it’s much easier to analyse across buckets and gather insights (for example, how many people in the 12-17 age bucket have gone to school, or how many of them prefer one brand over another).
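As a rough illustration of the age example, here is a minimal sketch using pandas’ pd.cut, which bins on fixed edges that you choose (unlike pd.qcut, covered next, which bins on percentiles). The ages are made up, and the bin edges and the "25+" label are hypothetical choices for this sketch only.

```python
import pandas as pd

# Hypothetical survey ages (illustrative data, not from a real dataset)
ages = pd.Series([8, 15, 17, 22, 35, 41, 67])

# Fixed bin edges; intervals are right-inclusive by default: [0, 11], (11, 17], (17, 24], (24, 120]
# include_lowest=True makes the first interval include 0 as well
age_group = pd.cut(
    ages,
    bins=[0, 11, 17, 24, 120],
    labels=["Under-12", "12-17", "18-24", "25+"],  # "25+" is a hypothetical label
    include_lowest=True,
)

# Number of people in each bucket, in bucket order
print(age_group.value_counts(sort=False))
```

Once ages are bucketed like this, questions such as "how many 12-17 year olds prefer brand A" become simple group-by operations on the new categorical column.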
For binning, you used the pd.qcut method, which divided the entire Size column into the following buckets based on percentiles. Because the bucket edges are percentiles, each bucket holds roughly the same number of apps. Note that pd.qcut takes percentile values as decimal fractions (quantiles), so the 20th percentile becomes 0.2, the 40th percentile becomes 0.4 and so on.
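A minimal sketch of percentile-based binning with pd.qcut is shown below. The sizes are made-up values rather than the course dataset, and the bucket labels "VL" to "VH" are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical app sizes in KB (illustrative, not the course dataset)
size = pd.Series([1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000])

# Quantile edges written as fractions: 0.2 is the 20th percentile, 0.4 the 40th, ...
quantiles = [0, 0.2, 0.4, 0.6, 0.8, 1]

# Five equal-frequency buckets; the labels are hypothetical
size_bucket = pd.qcut(size, quantiles, labels=["VL", "L", "M", "H", "VH"])

print(size_bucket.value_counts(sort=False))
```

With ten evenly spread values, each of the five buckets receives two apps. pd.qcut also accepts an integer q (for example q=5) as shorthand for that many equal-frequency buckets.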
These bins were then used to create a new column, Size_Bucket, which stores the binned category for each app’s size. Finally, when you build the pivot table (with Rating aggregated at its 20th percentile), you get a grid like the following, which is exactly what you need to create a heat map!
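The pivot step can be sketched as follows, assuming Content Rating on the rows, Size_Bucket on the columns and the 20th percentile of Rating in each cell. The six-row DataFrame is hypothetical, and the row/column layout is an assumption about how the grid was arranged.

```python
import numpy as np
import pandas as pd

# Hypothetical mini dataset (not the course's data)
df = pd.DataFrame({
    "Size_Bucket":    ["L", "L", "L", "H", "H", "H"],
    "Content Rating": ["Everyone", "Everyone", "Teen", "Everyone", "Teen", "Teen"],
    "Rating":         [4.0, 4.5, 3.5, 4.2, 3.0, 5.0],
})

# Rows = Content Rating, columns = Size_Bucket,
# each cell = 20th percentile of Rating for that combination
grid = pd.pivot_table(
    df,
    index="Content Rating",
    columns="Size_Bucket",
    values="Rating",
    aggfunc=lambda x: np.quantile(x, 0.2),
)
print(grid)
```

The result is a rectangular DataFrame with one cell per (Content Rating, Size_Bucket) pair, which is the shape sns.heatmap() expects.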
Now that the pivot table has been created, let’s go ahead and create the heat map.
Once you have a rectangular grid (either provided or built with the pivot table method taught earlier), call the sns.heatmap() function and pass the grid DataFrame as its first argument. Optional parameters such as cmap="Greens" (the colour palette) and annot=True (write each cell’s value on the cell) make the heat map easier to read.
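A minimal sketch of the plotting call is below. The grid values are hypothetical stand-ins for a pivot table, and the non-interactive "Agg" backend is only there so the snippet runs without a display; in a notebook you can drop that line and the plot renders inline (or call plt.show()).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rectangular grid, e.g. the output of a pivot table
grid = pd.DataFrame(
    [[4.1, 4.2], [3.5, 3.4]],
    index=pd.Index(["Everyone", "Teen"], name="Content Rating"),
    columns=pd.Index(["L", "H"], name="Size_Bucket"),
)

fig, ax = plt.subplots(figsize=(5, 3))
# cmap picks the colour palette; annot=True prints each cell's value on the cell
sns.heatmap(grid, cmap="Greens", annot=True, ax=ax)
```

seaborn takes the axis labels from the names of the DataFrame’s index and columns, so naming them (as pivot_table does automatically) gives a labelled heat map for free.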
The final heat map that you obtained looked like this:
Note
There’s an additional question in the notebook where, instead of Content Rating, you’ll analyse review buckets created with the pd.qcut approach described above.
Additional Notes:
- Heat maps are widely used in machine learning problems to visualise a correlation matrix, a grid that shows the correlation between each pair of quantitative variables. As mentioned in the additional notes of previous segments, understanding how variables are correlated is important when building and evaluating many ML models. You’ll learn more about this in the upcoming modules.
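To connect this note to the heat map workflow, here is a minimal sketch of a correlation-matrix heat map. The three columns are synthetic data generated for illustration (y is built to depend on x, z is independent noise); DataFrame.corr() computes Pearson correlations by default.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical numeric data (synthetic, illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),  # strongly positively related to x
    "z": rng.normal(size=200),                     # generated independently of x and y
})

# Pearson correlation between every pair of columns -> a square grid
corr = df.corr()

fig, ax = plt.subplots(figsize=(4, 3))
# vmin/vmax pin the colour scale to the full [-1, 1] range of correlations
sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, annot=True, ax=ax)
```

Fixing vmin and vmax keeps the colours comparable across different datasets; otherwise seaborn scales the colours to the minimum and maximum present in the grid.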