In our earlier discussion on the experiments by Hubel and Wiesel, we had observed the following statement:
- The strength of the response (of the retinal neurons) is proportional to the summation over the excitatory region.
After extracting features (as feature maps), CNNs typically aggregate these features using the pooling layer. Let’s see how the pooling layer works and how it is useful in extracting higher-level features.
Pooling tries to figure out whether a particular region in the image contains the feature we are interested in. It looks at larger regions of the image (spanning multiple patches) and captures an aggregate statistic (max, average etc.) of each region. This makes the network approximately invariant to small local transformations, such as slight shifts of the feature within a region.
The two most popular aggregate functions used in pooling are ‘max’ and ‘average’. The intuition behind these is as follows:
- Max pooling: If any one of the patches says something strongly about the presence of a certain feature, then the pooling layer counts that feature as ‘detected’.
- Average pooling: If one patch responds very strongly but the others disagree, the pooling layer weighs all the patches equally and reports the overall trend of the region.
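The difference between the two aggregate functions can be seen on a small example. The sketch below implements non-overlapping 2×2 pooling on a hypothetical 4×4 feature map using plain NumPy (the values and the helper `pool2d` are illustrative, not from the lecture):

```python
import numpy as np

# A hypothetical 4x4 feature map (values chosen for illustration)
fmap = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 2],
], dtype=float)

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    h, w = x.shape
    # Group the pixels into (size x size) patches, then aggregate each patch
    patches = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return patches.max(axis=(1, 3))
    return patches.mean(axis=(1, 3))

print(pool2d(fmap, mode="max"))      # [[6. 2.] [2. 7.]]
print(pool2d(fmap, mode="average"))  # [[3.5  1.  ] [1.   4.25]]
```

Notice how max pooling keeps only the strongest response in each 2×2 region (e.g. the 6 in the top-left region), while average pooling blends all four responses.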

Let’s now look at an example of max pooling and understand some potential drawbacks of the pooling operation.
Let’s summarise the example of pooling used in the lecture:
In the above figure, you can observe that only the width and height of the input reduce. Let’s extend this pooling operation to multiple feature maps:
You can observe that pooling operates on each feature map independently. It reduces the size (width and height) of each feature map, but the number of feature maps remains constant.
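This shape behaviour can be checked directly. The sketch below applies 2×2 max pooling to a hypothetical stack of 8 feature maps of size 32×32 (the array sizes and the helper `pool_stack` are illustrative): the spatial dimensions halve, while the number of feature maps stays at 8.

```python
import numpy as np

# A hypothetical stack of 8 feature maps, each 32x32: (channels, height, width)
fmaps = np.random.rand(8, 32, 32)

def pool_stack(x, size=2):
    """Apply non-overlapping max pooling to each feature map independently."""
    c, h, w = x.shape
    # Reshape each map into (size x size) patches and take the max per patch
    return x.reshape(c, h // size, size, w // size, size).max(axis=(2, 4))

out = pool_stack(fmaps)
print(fmaps.shape, "->", out.shape)  # (8, 32, 32) -> (8, 16, 16)
```

Only the height and width shrink; the channel dimension is untouched because pooling never mixes information across feature maps.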
Pooling has the advantage of making the representation more compact by reducing the spatial size (height and width) of the feature maps, thereby reducing the number of parameters to be learnt. On the other hand, it also loses a lot of information, which is often considered a potential disadvantage. Having said that, pooling has empirically proven to improve the performance of most deep CNNs.
Can we design a network without pooling? Capsule networks were designed to address some of these drawbacks of the conventional CNN architecture, notably the loss of spatial information caused by pooling. The paper on Capsule networks is provided below.

In the next segment, we will summarise all the concepts discussed till now.
Additional Reading
- The paper on ‘Capsule Networks’.