
Comprehension – OOB (Out-of-Bag) Error

In the last segment, you learnt that the OOB (out-of-bag) error is almost as good as the cross-validation error. The final prediction of a random forest is the aggregation of the predictions of its individual decision trees. Remember that each tree in a random forest is trained on a random subset of the training set, called a bootstrapped sample. This means that for each observation, there are several trees whose bootstrapped samples did not include it; for these trees, that observation is unseen data. Let’s understand this better.
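To see why this happens, the short NumPy sketch below draws bootstrapped samples and checks which observations each tree never saw; the sizes (100 observations, 50 trees) are taken from the example that follows, and the variable names are purely illustrative. Because a bootstrapped sample is drawn with replacement, roughly 37% of the observations are left out of any given tree on average.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_trees = 100, 50    # sizes match the example in the next paragraph

# For each tree, draw a bootstrapped sample (sampling with replacement)
# and record which observations were left out of it (out of bag).
oob_mask = np.zeros((n_trees, n_samples), dtype=bool)
for t in range(n_trees):
    in_bag = rng.integers(0, n_samples, size=n_samples)
    oob_mask[t] = ~np.isin(np.arange(n_samples), in_bag)

# On average, each observation is out of bag for roughly 37% of the trees.
print("Average number of OOB trees per observation:", oob_mask.sum(axis=0).mean())
```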

Suppose there are N = 100 observations with M = 15 features, and the outcome variable is a categorical variable Y. Suppose you also build a random forest with 50 trees. The OOB error is calculated as follows:

For each observation Ni, the observation is passed to all the trees that did not have it in their training (bootstrapped) sample. These trees then predict the class of Ni, and the OOB prediction for Ni is decided by a majority vote among them.
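Before working through the concrete examples below, here is a minimal from-scratch sketch of this procedure. It assumes scikit-learn, a synthetic dataset generated with make_classification, and DecisionTreeClassifier as the base learner; the variable names (trees, oob_sets, oob_pred) are illustrative, not part of any standard API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=100, n_features=15, random_state=0)
n_samples, n_trees = X.shape[0], 50

# Train each tree on its own bootstrapped sample and remember which
# observations it never saw (its out-of-bag observations).
trees, oob_sets = [], []
for _ in range(n_trees):
    in_bag = rng.integers(0, n_samples, size=n_samples)   # sample with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[in_bag], y[in_bag]))
    oob_sets.append(set(np.setdiff1d(np.arange(n_samples), in_bag)))

# OOB prediction: for each observation, take a majority vote among the
# predictions of only those trees that did not train on it.
oob_pred = np.full(n_samples, -1)
for i in range(n_samples):
    votes = [int(t.predict(X[i:i + 1])[0]) for t, oob in zip(trees, oob_sets) if i in oob]
    if votes:                                   # at least one tree left it out
        oob_pred[i] = np.bincount(votes).argmax()
```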

Now let’s apply this to N1. Suppose 10 trees did not have N1 in their training sample, so these 10 trees make a prediction for N1. Let’s say four of them predicted 0, and the other six predicted 1 as the output. By majority vote, the final prediction for N1 will be 1.

Next, we move on to N2. Suppose 15 trees did not have N2 in their training sample, so these 15 trees make a prediction for N2. Let’s say 12 predicted 0, and the remaining three trees predicted 1 as the output. The final prediction for N2 will be 0.
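Both majority votes can be reproduced in a few lines with Python’s collections.Counter; the vote counts below are taken directly from the two examples above.

```python
from collections import Counter

votes_n1 = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]       # 10 OOB trees voting on N1
votes_n2 = [0] * 12 + [1] * 3                   # 15 OOB trees voting on N2

print(Counter(votes_n1).most_common(1)[0][0])   # 1 -> final prediction for N1
print(Counter(votes_n2).most_common(1)[0][0])   # 0 -> final prediction for N2
```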

This is done for each observation in the training set. Once the OOB predictions for all the observations are available, the OOB error is the proportion of observations predicted wrongly, i.e., the number of wrong predictions divided by the total number of observations.
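In practice, the library handles all of this bookkeeping for you. The hedged example below uses scikit-learn’s RandomForestClassifier on an illustrative synthetic dataset: setting oob_score=True makes the fitted forest expose the OOB accuracy as oob_score_, so the OOB error described above is simply one minus that value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=15, random_state=0)

# oob_score=True asks the forest to evaluate each observation
# using only the trees that did not see it during training.
rf = RandomForestClassifier(n_estimators=50, oob_score=True, random_state=0)
rf.fit(X, y)

oob_error = 1 - rf.oob_score_   # proportion of observations predicted wrongly
print(f"OOB error: {oob_error:.3f}")
```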
