
Introduction to CNNs

Let’s dig a little deeper into CNN architectures now. In this segment, we will analyse the architecture of a popular CNN called VGGNet. Observing the VGGNet architecture will give you a high-level overview of the common types of CNN layers before you study each one of them in detail.

To summarise, there are three main concepts you will study in CNNs:

  • Convolution, and why it ‘shrinks’ the size of the input image (a small numerical sketch follows this list)
  • Pooling layers
  • Feature maps
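
To make the ‘shrinking’ effect concrete, here is a minimal Python sketch of the standard output-size formula for a convolution or pooling operation (the function name and the 224-pixel input are purely illustrative, not part of any particular network):

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    # Standard formula for the spatial size of a convolution/pooling output:
    # out = floor((in - kernel + 2 * padding) / stride) + 1
    return (input_size - kernel_size + 2 * padding) // stride + 1

# A 224x224 input convolved with a 3x3 kernel (stride 1, no padding)
# shrinks to 222x222 -- the 'shrinking' effect mentioned above.
print(conv_output_size(224, kernel_size=3))            # 222
# A 2x2 max-pooling layer with stride 2 halves the spatial size: 224 -> 112.
print(conv_output_size(224, kernel_size=2, stride=2))  # 112
```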

The VGGNet architecture is shown below.

The VGGNet was specially designed for the ImageNet challenge, which is a classification task with 1000 categories. Thus, the softmax layer at the end has 1000 units, one per category. The blue layers are the convolutional layers, while the yellow ones are pooling layers. You will study each of them shortly.

Finally, the green layer is a fully connected layer with 4096 neurons, whose output is a vector of size 4096.
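
The sketch below (in PyTorch, assumed here only for illustration) mirrors these layer types in a compressed, VGG-style network: convolutional layers (the blue layers), max-pooling layers (the yellow layers), a 4096-neuron fully connected layer (the green layer) and a 1000-way output fed to a softmax. It is not the full 16-layer VGGNet; the `MiniVGG` name and channel counts are illustrative.

```python
import torch
import torch.nn as nn

class MiniVGG(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        # Convolution + pooling stages (blue and yellow layers)
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 224 -> 112
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 112 -> 56
            nn.AdaptiveAvgPool2d((7, 7)),            # fix the spatial size at 7x7
        )
        # Fully connected layer (green layer) and 1000-way output for softmax
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 4096), nn.ReLU(inplace=True),  # 4096-d feature vector
            nn.Linear(4096, num_classes),                         # logits for softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = MiniVGG()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```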

The most important point to notice is that the network acts as a feature extractor for images. For example, the CNN above extracts a 4096-dimensional feature vector representing each input image. In this case, the feature vector is fed to a softmax layer for classification, but you can use the feature vector for other tasks as well (such as video analysis, object detection and image segmentation).
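
As a rough illustration of this idea, the sketch below uses a pretrained VGG16 from torchvision (assuming a recent torchvision version and a downloaded weight file, neither of which is specified in the original text) and drops the final 1000-way layer, so that each image is mapped to its 4096-dimensional feature vector:

```python
import torch
import torchvision.models as models

# Load a pretrained VGG16 and remove its last classification layer,
# so the network outputs the 4096-d feature vector instead of class logits.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
vgg.classifier[6] = torch.nn.Identity()   # drop the final Linear(4096, 1000)
vgg.eval()

with torch.no_grad():
    batch = torch.randn(8, 3, 224, 224)   # stand-in for 8 preprocessed images/frames
    features = vgg(batch)

print(features.shape)  # torch.Size([8, 4096])
```

These per-image (or per-frame) feature vectors are what a downstream model would consume for tasks such as the video analysis discussed next.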

Next, you will see how one can do video analysis using the feature vector extracted by the network.
