Let’s recap the steps we’ve taken up to now:
1. Pre-processed the data:
   1. Normalisation
   2. Augmentation
2. Built a network:
   1. Decided on an architecture (ResNet-18)
   2. Ran a data generator
Now, it’s time to train the model. The first step of model training is to conduct an ablation experiment: a quick, small-scale run whose purpose is to check whether your code works as expected. Let’s understand the basic idea of these experiments.
Overfitting on Training Data
The next step is to try to overfit the model on the training data. Why would we intentionally overfit on our own data? Simply put, this tells us whether the network is capable of learning the patterns in the training set.
To summarise, a good test of any model is to check whether it can overfit on the training data (i.e. the training loss consistently decreases across epochs). This technique is especially useful in deep learning because most deep learning models are trained on large datasets, and if a model is unable to overfit a small subset of the data, it is unlikely to learn from the full dataset.
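A minimal version of this overfitting test, under illustrative assumptions (a small MLP rather than ResNet-18, and eight random samples as the tiny dataset): a healthy model should be able to memorise these few points and drive the loss well below its starting value.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical tiny dataset: 8 random samples with binary labels.
# A working model should memorise these almost perfectly.
x = torch.randn(8, 10)
y = torch.randint(0, 2, (8,))

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

losses = []
for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
# If the final loss is not far below the first, something in the
# model or the training code is likely broken.
```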
During training, you may sometimes get NaN as the loss instead of a finite number. This usually means that intermediate values (such as the model's outputs or the gradients) have grown extremely large — the exploding gradient problem. A common cause is a learning rate that is too high; lowering it, or switching to the plain SGD optimiser, often helps. Although the adaptive methods (Adam, RMSProp, etc.) have better training performance, they can generalise worse than SGD. You can also experiment with initialisation techniques such as Xavier initialisation and He initialisation, and change the architecture of the network if the problem persists.
Additional Reading
- This article lists some common ways in which you can prevent your ML model from overfitting.
- This paper compares various optimisers; have a look at its graphs for the key findings.