Until about 2014 (when GoogLeNet was introduced), the most significant improvements in deep learning had come in the form of increased network depth – from AlexNet (8 layers) to GoogLeNet (22 layers). A few other networks with around 30 layers were also introduced around that time.
Driven by the significance of depth, a team of researchers asked the question: Is learning better networks as easy as stacking more layers?
The team experimented with substantially deeper networks (with hundreds of layers) and found some counterintuitive results. In one of the experiments, they found that a 56-layer convolutional net had a higher training (and test) error than a 20-layer net on the CIFAR-10 dataset.
Analyse these results and list at least one or two possible explanations for them.
The team found that these results were not due to overfitting. If overfitting were the cause, the deeper net would have achieved a much lower training error, while its test error would have been high.
What could then explain these results? Let’s find out.
Thus, the key motivation for the ResNet architecture was the empirical observation that adding more layers did not improve the results monotonically. This was counterintuitive because a network with n + 1 layers should be able to learn at least what a network with n layers can learn (the extra layer could simply learn the identity mapping and pass its input through unchanged), plus something more.
The ResNet team (Kaiming He et al.) came up with a novel architecture with skip connections that enabled them to train networks as deep as 152 layers. ResNet achieved groundbreaking results across several competitions – a 3.57% top-5 error rate on ImageNet and first place in several other ILSVRC and COCO object detection competitions.
Let’s look at the basic mechanism that enabled the training of very deep networks: the skip connection, also called the residual connection.
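To make the idea concrete, here is a minimal sketch of a residual block in Keras (assuming TensorFlow 2.x is available; the layer configuration is illustrative, not the exact block specification from the paper):

```python
# A minimal residual block sketch (illustrative, not the paper's exact spec).
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Computes y = F(x) + x, where F is a small stack of conv layers.

    Note: the plain identity shortcut used here assumes the input already
    has `filters` channels; the original ResNet uses projection shortcuts
    when the dimensions change.
    """
    shortcut = x                                      # the identity 'skip' path
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                   # skip connection: add the input back
    return layers.ReLU()(y)
```

The key step is the Add() at the end: because the input x is carried forward unchanged, the stacked layers only need to learn the residual F(x). In the worst case they can learn F(x) ≈ 0, which makes the block behave like an identity mapping, so adding such blocks should not make the network worse.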
Thus, the skip connection mechanism was the key feature of ResNet that enabled the training of very deep networks. Some other key features of ResNet are summarised below (a short code sketch after the list shows how they fit together). You are also encouraged to read the detailed results in the ResNet paper provided at the bottom of this page:
- ILSVRC’15 classification winner (3.57% top-5 error)
- 152 layer model for ImageNet
- Has other variants as well (with 34, 50 and 101 layers)
- Every ‘residual block’ has two 3×3 convolution layers
- No FC layers, except a final 1000-way FC softmax layer for classification
- Global average pooling layer after the last convolution
- Batch Normalization after every convolution layer
- SGD + momentum (0.9)
- No dropout used
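Putting these points together, the sketch below shows how they might translate into code. It reuses the residual_block function from the earlier sketch; the number of blocks, filter counts, input size and learning rate are illustrative assumptions, not the paper’s exact 152-layer configuration.

```python
# Minimal assembly sketch reusing residual_block from the earlier snippet.
# Depth, filter counts and hyperparameters here are assumptions for illustration.
from tensorflow.keras import layers, models, optimizers

inputs = layers.Input(shape=(224, 224, 3))
x = layers.Conv2D(64, 7, strides=2, padding='same')(inputs)   # initial convolution
x = layers.BatchNormalization()(x)                            # BN after every conv layer
x = layers.ReLU()(x)
x = layers.MaxPooling2D(3, strides=2, padding='same')(x)

for _ in range(3):                      # a handful of residual blocks for illustration
    x = residual_block(x, filters=64)   # each block: two 3x3 convs + skip connection

x = layers.GlobalAveragePooling2D()(x)  # global average pooling after the last conv
outputs = layers.Dense(1000, activation='softmax')(x)  # the only FC layer: 1000-way softmax

model = models.Model(inputs, outputs)
model.compile(optimizer=optimizers.SGD(learning_rate=0.1, momentum=0.9),  # SGD + momentum (0.9)
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Note that there is no dropout anywhere in the model, consistent with the list above.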
In the next few segments, you will learn how to use these large pre-trained networks to solve your own deep learning problems using the principles of transfer learning.