So far, you have learnt about sensitivity and specificity: how these metrics are defined, why they are important, and how to calculate them. Apart from sensitivity and specificity, there are two more metrics that are widely used in the industry and that you should know about. They are known as ‘Precision’ and ‘Recall’. These metrics are closely related to sensitivity and specificity; knowing the exact terminology is helpful because both pairs of metrics are often used in the industry. So let’s first hear Rahim introduce these metrics.
Note: At 5:12 in the video, the professor says that at a lower threshold, precision is high and recall is low. It should be the other way round: at a lower threshold, precision is low and recall is high.
Let’s go through the definitions of precision and recall once again:
- Precision: Probability that a predicted ‘Yes’ is actually a ‘Yes’.
Actual/Predicted | No | Yes |
No | TN | FP |
Yes | FN | TP |
The formula for precision can be given as:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Remember that ‘Precision’ is the same as the ‘Positive Predictive Value’ that you learnt about earlier. From now on, we will call it precision.
- Recall: Probability that an actual ‘Yes’ case is predicted correctly.
Actual/Predicted | No | Yes |
No | TN | FP |
Yes | FN | TP |
The formula for recall can be given as:
$$\text{Recall} = \frac{TP}{TP + FN}$$
Remember that ‘Recall’ is exactly the same as sensitivity. Don’t get confused between these.
You might be wondering: if these are almost the same, then why even study them separately? The main reason is that in the industry, some businesses follow the ‘Sensitivity-Specificity’ view and others follow the ‘Precision-Recall’ view. Hence, it will be helpful for you to know both these standard pairs of metrics.
Now, let’s check the precision and recall in code as well.
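As a minimal sketch of the two formulas above, the snippet below counts TP, FP, FN and TN from a pair of label lists and then computes precision and recall. The labels and the helper names (`confusion_counts`, `precision`, `recall`) are made up for illustration; they are not the data or the code used in the video.

```python
# Hypothetical actual and predicted labels (1 = 'Yes', 0 = 'No'),
# chosen only to illustrate the formulas.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]


def confusion_counts(actual, predicted):
    """Return (tn, fp, fn, tp) for binary labels."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision(tp, fp):
    """TP / (TP + FP): share of predicted 'Yes' cases that are actually 'Yes'."""
    return tp / (tp + fp) if (tp + fp) > 0 else float("nan")


def recall(tp, fn):
    """TP / (TP + FN): share of actual 'Yes' cases that are predicted 'Yes'."""
    return tp / (tp + fn) if (tp + fn) > 0 else float("nan")


tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Precision:", precision(tp, fp))
print("Recall:", recall(tp, fn))
```

If you work with scikit-learn, `precision_score` and `recall_score` from `sklearn.metrics` compute the same two quantities directly from the label arrays.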
As Rahim said, whatever view you select might give you different interpretations for the same model. It is completely up to you which view you choose to take while building a logistic regression model.
Now, recall that Rahim mentioned that, similar to sensitivity and specificity, there is a trade-off between precision and recall as well.
So, similar to the sensitivity-specificity trade-off, you learnt that there is a trade-off between precision and recall as well. The following is the trade-off curve that you plotted:
As you can see, the curve is similar to what you got for sensitivity and specificity, except that the curve for precision is quite jumpy towards the end. This is because the denominator of precision, i.e. (TP+FP), is the number of predicted 1s, which changes with the threshold. The denominator of recall, (TP+FN), is the number of actual 1s and stays constant. At high thresholds, very few cases are predicted as 1, so a single extra true or false positive can swing precision sharply, and you get a very jumpy curve.
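The behaviour described above can be reproduced with a small threshold sweep. The sketch below uses made-up labels and predicted probabilities, not the model from the video, and the helper name `precision_recall_at` is hypothetical. It prints precision, recall and the number of predicted 1s (the denominator of precision) at a few cut-offs.

```python
# Hypothetical actual labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.15, 0.30, 0.35, 0.45, 0.55, 0.60, 0.75, 0.85, 0.95]


def precision_recall_at(threshold, actual, probs):
    """Return (precision, recall, number of predicted 1s) at a given cut-off."""
    tp = fp = fn = 0
    for a, p in zip(actual, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and a == 1:
            tp += 1
        elif predicted == 1 and a == 0:
            fp += 1
        elif predicted == 0 and a == 1:
            fn += 1
    predicted_ones = tp + fp  # denominator of precision: varies with threshold
    actual_ones = tp + fn     # denominator of recall: fixed by the data
    prec = tp / predicted_ones if predicted_ones > 0 else float("nan")
    rec = tp / actual_ones if actual_ones > 0 else float("nan")
    return prec, rec, predicted_ones


for t in [0.1, 0.3, 0.5, 0.7, 0.9]:
    prec, rec, n_pred = precision_recall_at(t, y_true, y_prob)
    print(f"threshold={t:.1f}  precision={prec:.3f}  recall={rec:.3f}  predicted 1s={n_pred}")
```

On this toy data, recall only ever falls or stays flat as the threshold rises, while precision moves up and down because its denominator shrinks from nine predicted 1s to just one. A low threshold gives high recall and lower precision, which matches the correction in the note at the top of this page.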