In the last segment, you learnt that the **OOB (out-of-bag) error** is almost as good as the **cross-validation error**. The final prediction is the **aggregation** of all the predictions of individual decision trees. Remember that each tree in a random forest is trained on a **random subset** of the training set, which is called a **bootstrapped sample**. This means that for each sample (observation), there are several trees that did not include that sample, and for these trees, this sample **is unseen**. Let’s understand this better.
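To see how many observations end up out-of-bag for a single tree, here is a minimal Python sketch (the sample size of 100 and the seed are illustrative assumptions). A bootstrapped sample draws N rows *with replacement*, so on average about (1 − 1/N)^N ≈ 1/e ≈ 37% of the rows are never drawn and stay unseen by that tree:

```python
import random

random.seed(42)  # fixed seed, just for reproducibility

N = 100  # number of observations (illustrative)
indices = range(N)

# A bootstrapped sample: N draws WITH replacement from the N rows.
in_bag = {random.choice(indices) for _ in range(N)}

# Rows never drawn are "out-of-bag" (OOB) for this tree.
out_of_bag = set(indices) - in_bag
print(len(out_of_bag))  # typically close to 100 * (1 - 1/e), i.e. around 37
```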

Suppose there are **N = 100** observations with **M = 15** features, and the outcome variable Y is categorical. Also, suppose you build a random forest with **50 trees**. The OOB error is calculated as follows:

For each observation Ni, Ni is passed to all the trees that did not include it in their bootstrapped sample. These trees then predict the class of Ni, and the final prediction for Ni is decided by a majority vote.

Now let’s apply this to N1. Suppose 10 trees did not have N1 in their training. So these 10 trees make their prediction for N1. Let’s say four trees predicted 0, and the other six predicted 1 as the output. The final prediction for N1 will be 1.

Next, we move on to N2. Suppose 15 trees did not have N2 in their training. So these 15 trees make their prediction for N2. Let’s say 12 predicted 0, and the remaining three trees predicted 1 as the output. The final prediction for N2 will be 0.
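The two majority votes above can be reproduced in a few lines of Python (the helper name `majority_vote` is just for illustration):

```python
from collections import Counter

def majority_vote(votes):
    """Return the class label predicted by the most trees."""
    return Counter(votes).most_common(1)[0][0]

# N1: of the 10 OOB trees, four predict 0 and six predict 1.
print(majority_vote([0] * 4 + [1] * 6))   # 1
# N2: of the 15 OOB trees, twelve predict 0 and three predict 1.
print(majority_vote([0] * 12 + [1] * 3))  # 0
```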

This is done for each observation in the training set. Once the OOB prediction for every observation is in hand, the OOB error is the proportion of observations predicted wrongly, i.e., the number of observations whose OOB prediction does not match the actual label divided by the total number of observations.
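Putting the whole procedure together, here is a sketch of the OOB error computation. Representing each tree as an `(in_bag, predict)` pair and the tiny three-tree forest below are hypothetical simplifications for illustration, not a real random forest implementation:

```python
from collections import Counter

def oob_error(X, y, trees):
    """OOB error: fraction of observations misclassified by the
    majority vote of the trees that did NOT see them in training.

    `trees` is a list of (in_bag, predict) pairs, where `in_bag` is the
    set of row indices in that tree's bootstrapped sample and
    `predict(x)` returns a class label (a hypothetical stand-in
    for a fitted decision tree).
    """
    wrong = counted = 0
    for i, (x, label) in enumerate(zip(X, y)):
        # Only trees whose bootstrapped sample excluded row i may vote.
        votes = [predict(x) for in_bag, predict in trees if i not in in_bag]
        if not votes:            # every tree saw this row; skip it
            continue
        counted += 1
        if Counter(votes).most_common(1)[0][0] != label:
            wrong += 1
    return wrong / counted

# Toy data and a toy three-tree "forest" (purely illustrative).
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 0]
trees = [
    ({0, 1}, lambda x: 1),                     # OOB for rows 2 and 3
    ({2, 3}, lambda x: 0),                     # OOB for rows 0 and 1
    ({1, 3}, lambda x: 1 if x[0] >= 2 else 0), # OOB for rows 0 and 2
]
print(oob_error(X, y, trees))  # 0.25 — only row 3 is misclassified
```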