In this segment, you will understand the mathematical calculations that the LR models in the One vs Rest classifier execute internally. In the previous segment, you saw the following probability scores for a test sample:
- 0.1352521804233434
- 0.8646560093263703
- 9.181025028619104e-05
Let’s calculate these scores using the Sigmoid formula. Let’s hear more about this from Ajay.
Let’s understand and summarise each part of the code discussed in the previous video, starting with the line given below:

arr = X_test.iloc[0]['Sacled_features']
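As a rough illustration of what this line does, the sketch below builds a made-up X_test with a single row; the DataFrame contents are invented for illustration, and the column name is kept as spelt in the notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical X_test: each row stores its seven scaled feature values
# as one array in the 'Sacled_features' column (name as in the notebook).
X_test = pd.DataFrame({
    "Sacled_features": [np.array([0.2, -1.1, 0.5, 0.0, 1.3, -0.4, 0.8])]
})

# Pick the first test sample's array of seven scaled values.
arr = X_test.iloc[0]["Sacled_features"]
print(arr)
```

Here, X_test.iloc[0] selects the first row, and indexing it with the column name returns the stored array of seven scaled values.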
The variable ‘arr’ contains the scaled values of all seven columns (loan_amnt, int_rate, installment, emp_length, annual_inc, fund_perc, incToloan_perc) for a particular test sample. Then, we created a function named ‘y’ and defined its functionality as shown below:
Let’s understand the code given above. The variable class_cat selects the specific LR model to be used. Since the code indexes the coefficients with ‘class_cat-1’, a value of ‘1’ selects the first LR model (created for the high-risk category). Similarly, values of ‘2’ and ‘3’ select the second and third LR models, created for the low- and medium-risk categories, respectively. As shown in the video, the variable ‘arr’ is passed to the function parameter ‘input_arr’. Hence, input_arr contains the scaled values of each of the seven columns. You already know that we need to multiply each column of an input feature by a weight coefficient, and that is exactly what the code given below does.
Since there are seven columns, we run a for loop over range(7) and multiply each column value, represented as input_arr[i], by its corresponding weight coefficient, represented as Coeff_array[class_cat-1][i]. As stated above, ‘class_cat-1’ selects the specific LR model.
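The loop described above might look like the sketch below. This is a minimal reconstruction, not the notebook’s exact code: Coeff_array is assumed to hold the weights of the three LR models (in scikit-learn, this would be OneVsRes.coef_), and the numbers here are made up:

```python
import numpy as np

# Made-up coefficients standing in for OneVsRes.coef_:
# one row of seven weights per LR model.
Coeff_array = np.array([
    [0.5, -0.2, 0.1, 0.3, -0.4, 0.2, 0.6],   # model 1: high risk
    [-0.1, 0.4, -0.3, 0.2, 0.1, -0.5, 0.3],  # model 2: low risk
    [0.2, 0.1, 0.4, -0.2, 0.3, 0.1, -0.6],   # model 3: medium risk
])

def y(class_cat, input_arr):
    """Weighted sum of the seven scaled features for one LR model."""
    total = 0.0
    for i in range(7):
        total += input_arr[i] * Coeff_array[class_cat - 1][i]
    return total
```

For example, y(1, arr) multiplies each of the seven values in arr by the coefficients of the first (high-risk) model and adds the products together.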
Then, we create another function named ‘z’ as shown below:
In earlier sessions, you learnt that applying the Sigmoid formula to the linear combination of an input x and the β values generates the probability score. The z function performs this step: it takes a particular input and applies the Sigmoid formula to it.
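A minimal sketch of such a z function, using the standard Sigmoid formula 1 / (1 + e^(-x)), is given below; the function name follows the description above, but the exact notebook code is assumed:

```python
import math

def z(x):
    """Sigmoid: maps any real number to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-x))
```

Note that z(0) returns exactly 0.5, large positive inputs give values close to 1, and large negative inputs give values close to 0.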
Then, we called the respective functions created above with specific values passed to the function parameters as shown below:
Note: We need to pass the test sample to all the LR models, each of which generates a probability score, and the class with the highest probability score becomes the classified category for the test sample.
Let’s take ‘z(sum(OneVsRes.intercept_[0],y(1,arr)))’ and break it into smaller parts to understand it better.
- y(1,arr): Here, ‘1’ is passed to the class_cat variable, and inside the function, we subtract 1 from it (class_cat-1); so, the index is ‘0’, which selects the LR model for the high-risk category. The output of y is the weighted sum of the features, similar to the β1 * x term in the diabetes example, but with seven features instead of one. So, the equation is as follows:
β1 * Column 1 + β2 * Column 2 + β3 * Column 3 + β4 * Column 4 + β5 * Column 5 + β6 * Column 6 + β7 * Column 7
- sum(OneVsRes.intercept_[0],y(1,arr)): This adds the intercept value (β0) to the result of the y function. So, the equation is as follows:
β1 * Column 1 + β2 * Column 2 + β3 * Column 3 + β4 * Column 4 + β5 * Column 5 + β6 * Column 6 + β7 * Column 7 + β0
- z(sum(OneVsRes.intercept_[0],y(1,arr))): The z function applies the Sigmoid formula to the expression β1 * Column 1 + β2 * Column 2 + β3 * Column 3 + β4 * Column 4 + β5 * Column 5 + β6 * Column 6 + β7 * Column 7 + β0. This gives the predicted probability score for the high-risk category.
Similarly, the functions z(sum(OneVsRes.intercept_[1],y(2,arr))) and z(sum(OneVsRes.intercept_[2],y(3,arr))) give the probability scores for the low-risk and medium-risk categories, respectively. Then, these three probability scores are normalised to sum up to 1 by executing the code given below:

z_norm = z_non_norm/sum(z_non_norm)
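Putting all the pieces together, the whole calculation can be sketched end to end as below. The coefficients, intercepts, and test sample here are made up for illustration; in the notebook, they come from OneVsRes.coef_, OneVsRes.intercept_, and X_test:

```python
import numpy as np

# Made-up values standing in for OneVsRes.coef_, OneVsRes.intercept_,
# and the seven scaled feature values of one test sample.
Coeff_array = np.array([
    [0.5, -0.2, 0.1, 0.3, -0.4, 0.2, 0.6],   # model 1: high risk
    [-0.1, 0.4, -0.3, 0.2, 0.1, -0.5, 0.3],  # model 2: low risk
    [0.2, 0.1, 0.4, -0.2, 0.3, 0.1, -0.6],   # model 3: medium risk
])
intercepts = np.array([-1.0, 0.5, -2.0])
arr = np.array([0.2, -1.1, 0.5, 0.0, 1.3, -0.4, 0.8])

def z(x):
    """Sigmoid formula."""
    return 1 / (1 + np.exp(-x))

# One Sigmoid score per LR model: intercept (β0) plus the weighted sum.
z_non_norm = np.array([
    z(intercepts[k] + Coeff_array[k] @ arr) for k in range(3)
])

# Normalise so the three scores sum to 1, then pick the highest.
z_norm = z_non_norm / z_non_norm.sum()
labels = ["high risk", "low risk", "medium risk"]
print(z_norm, labels[int(np.argmax(z_norm))])
```

With these invented numbers, the normalised scores sum to 1, and the category with the highest score is the predicted class, mirroring the logic described above.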
When we print the z_norm values, we can see that the values are as follows:
- 0.1352521804233434
- 0.8646560093263703
- 9.181025028619104e-05
These values exactly match the values listed at the start of the segment. Since the second score (0.8646…) is the highest, the test sample is classified as ‘low risk’. In the next segment, you will learn the Python code for the One vs One classifier.