
Putting the Components Together

You have now studied the main components of a typical CNN: convolutions, feature maps, pooling layers, etc. Let's now summarise them and put them together to get an overall picture of a CNN's architecture.

To summarise, a typical CNN layer (or unit) involves the following two components in sequence:

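  1. We start with an input (the original image, in the case of the first unit) and convolve it with multiple filters to get multiple feature maps, one per filter.
  2. A pooling layer then takes a statistical aggregate (such as the maximum or the average) over small patches of each feature map.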
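
The two steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not an efficient or complete implementation: the image size, the number of filters, the filter values and the function names (`convolve_valid`, `max_pool_2x2`, `cnn_unit`) are all assumptions made for the example, the convolution uses no padding and a stride of 1, and max pooling is chosen as the aggregate.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) to get one feature map.

    As in most deep learning libraries, the kernel is not flipped,
    so this is strictly a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

def max_pool_2x2(feature_map):
    """Take the maximum over non-overlapping 2 x 2 patches (stride 2)."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2  # drop a trailing odd row/column, if any
    patches = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return patches.max(axis=(1, 3))

def cnn_unit(image, filters):
    """Step 1: one feature map per filter. Step 2: pool each feature map."""
    feature_maps = [convolve_valid(image, f) for f in filters]
    return np.stack([max_pool_2x2(fm) for fm in feature_maps])

rng = np.random.default_rng(0)
image = rng.random((10, 10))     # hypothetical 10 x 10 greyscale image
filters = rng.random((4, 3, 3))  # four hypothetical 3 x 3 filters
output = cnn_unit(image, filters)
print(output.shape)              # (4, 4, 4): four 8 x 8 maps, each pooled to 4 x 4
```

Each 3 x 3 filter turns the 10 x 10 image into an 8 x 8 feature map, and the 2 x 2 pooling halves that to 4 x 4, so four filters give four pooled outputs.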

Typically, deep CNNs have multiple such CNN units (i.e. feature map-pooling pairs) arranged sequentially. The following lecture will discuss this in detail.

Putting this together, a typical CNN is built up as follows:

  1. The input image is convolved with multiple filters to create multiple feature maps.
  2. Each feature map, of size (c, c), is pooled to generate a (c/2, c/2) output (for a standard 2 x 2 pooling with a stride of 2).
  3. The above pattern is called a CNN layer or unit. Multiple such CNN layers are stacked on top of one another to create deep CNNs.
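
To see how the sizes evolve when several such units are stacked, the shape arithmetic can be traced with a small helper. This is a sketch under stated assumptions: the helper name `unit_output_shape`, the 32 x 32 input and the filter counts are hypothetical, and each convolution is assumed to be padded so that a feature map keeps the (c, c) size of its input, as in the description above.

```python
def unit_output_shape(size, num_filters, pool=2):
    """Output shape (depth, height, width) of one CNN unit.

    Assumes the convolution keeps the spatial size (c, c) and that pooling
    uses a pool x pool window with a stride equal to the window size.
    """
    return (num_filters, size // pool, size // pool)

size = 32                        # hypothetical 32 x 32 input image
filters_per_unit = [16, 32, 64]  # hypothetical filter counts for three stacked units

shapes = []
for num_filters in filters_per_unit:
    depth, size, _ = unit_output_shape(size, num_filters)
    shapes.append((depth, size, size))

print(shapes)  # [(16, 16, 16), (32, 8, 8), (64, 4, 4)]
```

The height and width halve at every unit because of the pooling, while the depth is set by the number of filters chosen for that unit.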

Note that pooling reduces only the height and the width of a feature map, not the depth (i.e. the number of channels). For example, if you have m feature maps each of size (c, c), the pooling operation will produce m outputs each of size (c/2, c/2).
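
A quick NumPy check of this point, using assumed values m = 3 and c = 4 and average pooling as the aggregate (any other aggregate would give the same shapes):

```python
import numpy as np

m, c = 3, 4  # hypothetical: 3 feature maps, each of size 4 x 4
feature_maps = np.arange(m * c * c, dtype=float).reshape(m, c, c)

# Split every map into non-overlapping 2 x 2 patches, then average each patch.
# The first axis (the m feature maps) is never touched.
pooled = feature_maps.reshape(m, c // 2, 2, c // 2, 2).mean(axis=(2, 4))

print(feature_maps.shape)  # (3, 4, 4)
print(pooled.shape)        # (3, 2, 2)
```

The depth stays at 3 before and after pooling; only the height and width shrink from 4 to 2.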
