Introduction to Transfer Learning

So far, we have discussed multiple CNN-based networks that were trained on millions of images of various classes. The ImageNet dataset itself has about 1.2 million images of 1000 classes.

However, what these models have ‘learnt’ is not confined to the ImageNet dataset (or to a classification problem). In an earlier session, we discussed that CNNs are basically feature extractors, i.e. the convolutional layers learn a representation of an image, which can then be used for any task such as classification, object detection, etc.

This implies that the models trained on the ImageNet challenge have learnt to extract features from a wide range of images. Can we then transfer this knowledge to solve some other problems as well?

Transfer learning is the practice of reusing the skills learnt from solving one problem to learn to solve a new, related problem. Before diving into how to do transfer learning, let’s first look at some practical reasons for using it.

To summarise, some practical reasons to use transfer learning are as follows:

Data abundance in one task and a data crunch in another, related task.
Enough data available for training, but a lack of computational resources.

As an example of the first case, say you want to build a model (for a driverless car to be driven in India) that classifies ‘objects’ such as a pedestrian, a tree, a traffic signal, etc. Suppose you don’t have enough labelled training data from Indian roads, but you can find a similar dataset from an American city. You can train the model on the American dataset, take the learned weights, and then train further on the smaller Indian dataset.

The second case is more common: say you want to train a model to classify 1000 classes, but don’t have the infrastructure required. You can simply pick up a pre-trained VGG or ResNet and train it a little more on your limited infrastructure. You will implement such a task in Keras shortly.
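As a rough preview of what ‘training it a little more’ can look like, here is a minimal sketch in Keras. It is an illustration under assumptions, not the implementation used later in the course: the helper name build_transfer_model, the choice of VGG16, the pooling-plus-dense head and the learning rate are all picked here for the example. The idea is to keep the convolutional layers (the feature extractor) frozen and train only a small new classifier on top.

```python
from tensorflow import keras


def build_transfer_model(num_classes, input_shape=(224, 224, 3), weights="imagenet"):
    """Hypothetical helper: frozen VGG16 feature extractor plus a new classifier head."""
    # Load the convolutional base of VGG16 without its 1000-class ImageNet classifier.
    # weights="imagenet" downloads the pre-trained weights on first use.
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    # Freeze the base so that its learnt features are reused, not retrained.
    base.trainable = False

    # New head for the new task (assumed design: global pooling + one dense layer).
    # Inputs are expected to be preprocessed with
    # keras.applications.vgg16.preprocess_input before being fed to the model.
    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # assumed value
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Calling build_transfer_model(num_classes=5) would give a model in which only the final dense layer is trainable, so fitting it on a small dataset is far cheaper than training the whole network. The same pattern applies to the driving example above: the frozen base would play the role of the weights learnt on the larger dataset.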

In the next segment, we will see some other use cases where we can use transfer learning.
