On the previous page, we trained the model on a small chunk of the dataset and confirmed that it can learn from the data (as indicated by overfitting). After fixing the model and data augmentation, we now need to find a suitable learning rate for the optimiser (here, SGD). Data augmentation is discussed in the next video. Let’s now tune some hyperparameters of the model.
Here, we tuned the learning rate hyperparameter and observed that, of the three values tried (0.1, 0.01 and 0.001), a rate of 0.1 performed best. However, using such a high learning rate for the entire training process is not a good idea, since the loss may start to oscillate around the minimum later in training. So, at the start of training, we use a high learning rate so that the model learns fast, and as we train further and approach the minimum, we decrease the learning rate gradually.
Keras Callbacks
We have also used callbacks to store the loss history at the end of every epoch. Callbacks are a useful way to save results or run other code at specific points in training, such as the start of a batch or the end of an epoch. Please refer to the section Hyperparameter Tuning and answer the following questions.
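As a rough illustration of the callback idea described in this section, a callback that records the loss at the end of every epoch might look like the sketch below. The class name `LossHistory` and the attribute `losses` are illustrative choices, not names taken from the course notebook, and the `try`/`except` fallback is only there so that the sketch can be run without TensorFlow installed.

```python
try:
    # With TensorFlow installed, subclass the real Keras base class.
    from tensorflow.keras.callbacks import Callback
except ImportError:
    # Fallback (assumption, for illustration only): a plain base class so the
    # sketch still runs where TensorFlow is unavailable.
    Callback = object


class LossHistory(Callback):
    """Hypothetical callback that stores the training loss after each epoch."""

    def on_train_begin(self, logs=None):
        # Start with an empty history at the beginning of every training run.
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        # Keras passes the epoch's metrics in `logs`; keep only the loss.
        logs = logs or {}
        self.losses.append(logs.get("loss"))
```

An instance of such a class would be passed to training through the `callbacks` argument of `model.fit`, after which its `losses` list holds one value per epoch.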
Next, let’s see how to implement learning rate decay. Since we want to decay the learning rate at the end of every epoch, using callbacks is a natural decision.
Note: In the following video, at [00:54], Rohit mentions ‘0.5’. This should actually be ‘0.05’.
Here, the learning rate is set to decrease after every epoch because the dataset is small. Typical choices are to multiply the learning rate by 0.5 every 5-10 epochs, or by 0.1 every 20 epochs. Note that these numbers can differ significantly depending on the model and the problem. Models often benefit from dividing the learning rate by a factor of 2-10 once learning stagnates (for example, when the validation loss stops improving).
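The step schedules described above can be sketched as a plain Python function. The helper name `make_step_decay` and its parameters are hypothetical, and the values in the usage lines are just the typical figures quoted above, not the settings used in the video.

```python
def make_step_decay(initial_lr, factor, every):
    """Return a schedule that multiplies the rate by `factor` every `every` epochs."""

    def schedule(epoch, current_lr=None):
        # `epoch` is zero-based. `current_lr` is accepted because Keras passes
        # it to schedule functions, but this particular schedule ignores it.
        return initial_lr * factor ** (epoch // every)

    return schedule


# Halve the learning rate every 5 epochs, starting from 0.1.
halve_every_5 = make_step_decay(initial_lr=0.1, factor=0.5, every=5)
rates = [halve_every_5(epoch) for epoch in range(12)]
```

In Keras, a function like this can be wrapped in the `LearningRateScheduler` callback and passed in the `callbacks` list of `model.fit`. Keras also provides `ReduceLROnPlateau`, which lowers the rate when a monitored quantity such as `val_loss` stops improving.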
You also learnt that a sample of the original image data is picked from the directory, and that Keras's ImageDataGenerator produces an equal number of augmented images by applying random transformations to the originals. The original and augmented images are then combined, giving twice as much data for training. Rohit also pointed out that it is better to use the AUC (area under the ROC curve) rather than accuracy as the metric, especially if the dataset is skewed, as in the case of a medical dataset where only a few of the samples come from patients who actually have the disease.
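To see why accuracy can mislead on a skewed dataset, consider the minimal sketch below. The data are made up for illustration (95 healthy and 5 diseased samples), and the helper `auc_score` is a hypothetical pure-Python computation of the area under the ROC curve by pairwise comparison; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc_score(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up skewed dataset: 95 healthy (0) and 5 diseased (1) samples.
labels = [0] * 95 + [1] * 5
# A useless model that gives every sample the same low score.
constant_scores = [0.1] * 100

acc = accuracy(labels, constant_scores)   # high, despite detecting no disease
auc = auc_score(labels, constant_scores)  # no better than random ranking
```

Here the constant model reaches 95% accuracy simply by predicting the majority class, while its AUC of 0.5 shows that it cannot separate the two classes at all.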
In the next segment, you will learn to train and evaluate the model.