So far, you have learnt how logistic regression works and how its performance can be evaluated. Now that you have the theoretical knowledge of this technique, let’s move on to its application, which is equally important. Let’s hear from our industry expert, Hindol Basu, about the various applications of logistic regression in different business scenarios across multiple industries.
In general, logistic regression tries to predict the future state of a particular individual or system. You learnt about two types of logistic regression:
- Binary logit
- Multinomial logit
Binary logit involves two levels of the dependent variable. For example, the telecom churn example you learnt in earlier sessions is a binary logistic regression problem, as it classifies customers into two levels: those who churn and those who do not. Multinomial logit, however, involves more than two levels of the dependent variable, such as whether a customer will purchase product A, purchase product B or not purchase anything.
So, the rule of thumb for deciding whether a problem is a binary or a multinomial classification problem is to first look at the dependent variable and count the levels it can take: two levels means binary logit, and more than two means multinomial logit.
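The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name `classification_type` and the sample labels are hypothetical and not part of the course material.

```python
def classification_type(labels):
    """Return 'binary' or 'multinomial' based on the distinct levels
    found in the dependent variable (hypothetical helper)."""
    levels = set(labels)  # distinct levels of the dependent variable
    if len(levels) < 2:
        raise ValueError("the dependent variable needs at least two levels")
    return "binary" if len(levels) == 2 else "multinomial"


# Telecom churn: two levels, so binary logit
print(classification_type(["churn", "non-churn", "churn"]))

# Product choice: three levels, so multinomial logit
print(classification_type(["product A", "product B", "no purchase"]))
```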
Let’s see some more examples and try to understand the importance of logistic regression in detail.
To summarise, logistic regression is widely used across industries, for two main reasons:
- It is very easy to understand and offers an intuitive explanation of the variables
- The log odds of the output (i.e. of the predicted probabilities) have a linear relationship with the variables, which can be very useful for explaining results to managers
Also, recall that Hindol mentioned model scores. In an earlier session, however, you learnt that a logistic regression model gives log odds as output. To understand what scores are, let’s go back to the telecom churn example from earlier sessions.
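The linearity that makes the model easy to explain is between the log odds and the variables. Here is a minimal sketch of that idea for a model with a single variable. The intercept and coefficient are made-up values chosen only for illustration; they do not come from the telecom churn model.

```python
import math

# Hypothetical model parameters, assumed purely for illustration
INTERCEPT = -1.5
COEFFICIENT = 0.8


def log_odds(x):
    """Log odds are a straight-line function of the variable x."""
    return INTERCEPT + COEFFICIENT * x


def probability(x):
    """Convert the log odds to a probability with the sigmoid function."""
    return 1 / (1 + math.exp(-log_odds(x)))


for x in range(4):
    print(x, round(log_odds(x), 3), round(probability(x), 3))
```

Each unit increase in `x` raises the log odds by the same amount (the coefficient), whereas the change in probability differs from step to step. This is why the effect of a variable is easier to state in terms of log odds.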
customerID | Probability | Odds | Log Odds | Score |
8773-HHUOZ | 0.084 | 0.092 | -2.389 | 331 |
8865-TNMNX | 0.257 | 0.346 | -1.062 | 369 |
9867-JCZSP | 0.297 | 0.422 | -0.862 | 375 |
9420-LOJKX | 0.435 | 0.770 | -0.261 | 392 |
6234-RAAPL | 0.439 | 0.783 | -0.245 | 393 |
7760-OYPDY | 0.443 | 0.795 | -0.229 | 393 |
8012-SOUDQ | 0.446 | 0.805 | -0.217 | 394 |
3413-BMNZE | 0.461 | 0.855 | -0.156 | 395 |
6575-SUVOI | 0.688 | 2.205 | 0.791 | 423 |
6388-TABGU | 0.753 | 3.049 | 1.115 | 432 |
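The Odds and Log Odds columns follow directly from the Probability column. Here is a minimal Python sketch of the conversion; the function names are hypothetical, and the log is assumed to be the natural log, which is consistent with the values in the table.

```python
import math


def odds_from_probability(p):
    """Odds = p / (1 - p), defined for probabilities strictly between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1 - p)


def log_odds_from_probability(p):
    """Natural log of the odds."""
    return math.log(odds_from_probability(p))


# First and last rows of the table above
for customer_id, p in [("8773-HHUOZ", 0.084), ("6388-TABGU", 0.753)]:
    print(customer_id,
          round(odds_from_probability(p), 3),
          round(log_odds_from_probability(p), 3))
```

Because the probabilities in the table are rounded to three decimals, values recomputed from them may differ slightly in the last decimal place for some rows.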
You must have noticed the column called Score. It is simply a different way of reporting your findings. Earlier, you saw that log odds make more sense as the output than probabilities because of their linear relationship with the variables. However, log odds take awkward values, such as -0.245 and -0.156, which are not a very elegant form of output.
Hence, instead of reporting the log odds as output, you can report scores.
Score is calculated using the following expression:
Score = 400 + 20 × log(odds) / log(2)
This expression is chosen based on business understanding. With this particular expression, odds of 1 (a probability of 0.5) map to a score of 400, and every doubling of the odds adds 20 points. You could come up with your own expression for the score, one that converts log odds into a more presentable form.
So, in this lecture, you learnt about the uses and importance of logistic regression. Next, we’ll go through some nuances of logistic regression.
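The score expression can be sketched in Python as follows. The function and parameter names are hypothetical; the base of 400 and the 20 points per doubling of the odds are taken from the expression above, and the input is assumed to be the natural log of the odds, as in the Log Odds column of the table.

```python
import math


def score_from_log_odds(log_odds, base=400, points_per_doubling=20):
    """Score = base + points_per_doubling * log(odds) / log(2)."""
    return base + points_per_doubling * log_odds / math.log(2)


# Log odds taken from the first and last rows of the table
print(round(score_from_log_odds(-2.389)))  # customer 8773-HHUOZ
print(round(score_from_log_odds(1.115)))   # customer 6388-TABGU

# Odds of 1 (log odds of 0) sit exactly at the base score
print(score_from_log_odds(0.0))
```

Making the base and the points per doubling parameters reflects the point above: a business could pick different values to suit its own reporting scale.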