
Summary

In this session, you compared the architectures of some popular networks that have achieved state-of-the-art results on ImageNet: AlexNet, VGGNet, GoogLeNet and ResNet.

Until VGGNet, most of the major innovations had appeared in the form of increased depth, smaller filters, etc. In 2014, GoogLeNet introduced an unconventional idea in the form of the Inception module, which performs multiple parallel operations (1 x 1, 3 x 3 and 5 x 5 convolutions, pooling, etc.) on the input. This enabled GoogLeNet to increase both the depth and the ‘width’ of the network (it has 22 layers with multiple Inception modules stacked one over another). In the quest for training deeper networks, ResNet introduced skip connections, which give the network the option of ‘skipping’ the additional layers if they do not learn anything useful, else keeping them.
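To make the idea of parallel branches concrete, here is a minimal sketch of an Inception-style module. It assumes tf.keras (a framework choice not stated in this excerpt), the filter counts are arbitrary, and the 1 x 1 ‘bottleneck’ convolutions that GoogLeNet places before its 3 x 3 and 5 x 5 branches are omitted for brevity:

```python
from tensorflow.keras import layers

def inception_module(x, f1=64, f3=128, f5=32):
    """Run parallel branches on x and concatenate their outputs."""
    # 1 x 1 convolution branch
    b1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)
    # 3 x 3 convolution branch
    b3 = layers.Conv2D(f3, 3, padding='same', activation='relu')(x)
    # 5 x 5 convolution branch
    b5 = layers.Conv2D(f5, 5, padding='same', activation='relu')(x)
    # 3 x 3 max-pooling branch (stride 1 keeps the spatial size)
    bp = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    # stack all branch outputs along the channel axis
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])
```

Because every branch preserves the spatial dimensions (`padding='same'`), the outputs can simply be concatenated channel-wise, which is what lets the module grow the ‘width’ of the network.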
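Likewise, a residual (skip) connection can be sketched as follows, again in tf.keras with an assumed filter count; the addition requires the input and the branch output to have matching shapes:

```python
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Two convolutions plus an identity skip connection."""
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    # The skip connection adds the input back: if the two convolutions
    # learn nothing useful, the block can fall back to (approximately)
    # an identity mapping. (Assumes x already has `filters` channels.)
    y = layers.Add()([x, y])
    return layers.Activation('relu')(y)
```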

Since these models have already been trained on millions of images, and are therefore good at extracting generic features, they are well suited to solving other computer vision problems (with little or no re-training). This is the main idea of transfer learning.

In transfer learning, a pre-trained network can be repurposed for a new task; how much of it needs re-training depends on how much the new task differs from the original one. In two transfer learning experiments, we 1) trained a ResNet-50 by freezing the original weights and adding only a few FC (fully connected) layers, and 2) also re-trained the last few layers of ResNet-50. The latter model gave us a boost in accuracy. Both experiments are sketched below.
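A minimal sketch of the two experiments, assuming tf.keras; the number of added units, the number of unfrozen layers, the learning rate and `num_classes` are illustrative placeholders, not the values used in the session:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

num_classes = 10  # assumed placeholder for the new task

# Experiment 1: freeze the pre-trained convolutional base and train
# only the newly added fully connected layers.
base = ResNet50(weights='imagenet', include_top=False,
                pooling='avg', input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Dense(256, activation='relu'),
    layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy', metrics=['accuracy'])

# Experiment 2: additionally re-train (fine-tune) the last few layers
# of the base network, typically with a much smaller learning rate.
base.trainable = True
for layer in base.layers[:-10]:  # keep all but the last 10 layers frozen
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy', metrics=['accuracy'])
```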

Finally, we compared various popular CNN architectures in terms of metrics (other than accuracy) which are important considerations for deployment, such as inference time and memory requirements. We compared the architectures along metrics such as the number of parameters (proportional to the memory footprint), the number of operations involved in a feed-forward pass (proportional to inference time), accuracy, power consumption, etc. We saw that some of the oldest architectures are extremely accurate but have very high memory footprints, and that there are some clear trade-offs between accuracy and efficiency (computational time and memory).
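As a rough way to reproduce one of these comparisons yourself, parameter counts can be read off the pre-built models in tf.keras (a sketch; AlexNet and GoogLeNet are not shipped with Keras, so available architectures are used instead):

```python
from tensorflow.keras.applications import VGG16, ResNet50, MobileNet

# Parameter count is a rough proxy for the memory footprint of a model.
for net in (VGG16, ResNet50, MobileNet):
    model = net(weights=None)  # build the architecture only, no download
    print(f"{net.__name__}: {model.count_params():,} parameters")
```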

You can download the lecture notes for this module from the link below:

The next session is an optional one where you will be introduced to terms such as ‘style transfer’ and ‘object detection’.
