In the previous segment, each time we computed the element-wise product of the filter with a patch of the image, we moved the filter by exactly one pixel (both horizontally and vertically). But that is not the only way to do convolutions – you can move the filter by an arbitrary number of pixels at each step. The number of pixels by which the filter moves is called the stride.
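To make the idea concrete, here is a minimal NumPy sketch of a convolution with a configurable stride. The function name `convolve2d` and the toy image and filter are illustrative assumptions, and the sketch assumes the stride fits the image exactly.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, moving `stride` pixels per step (no padding)."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = []
    # Each (top, left) pair is the top-left corner of one filter position.
    for top in range(0, n_h - f_h + 1, stride):
        row = []
        for left in range(0, n_w - f_w + 1, stride):
            patch = image[top:top + f_h, left:left + f_w]
            # Element-wise product of the filter and the patch, then sum.
            row.append(np.sum(patch * kernel))
        out.append(row)
    return np.array(out)

image = np.arange(25).reshape(5, 5)   # toy (5, 5) image
kernel = np.ones((3, 3))              # toy (3, 3) filter

out_s1 = convolve2d(image, kernel, stride=1)
out_s2 = convolve2d(image, kernel, stride=2)
print(out_s1.shape)  # (3, 3)
print(out_s2.shape)  # (2, 2)
```

With a stride of 2, the filter visits every second position, so the output keeps only every second value of the stride-1 output in each direction.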
Let’s study strides in a little more detail. The notion of strides will also introduce us to another important concept – padding.
You saw that there is nothing sacrosanct about the stride length 1. If you think that you do not need many fine-grained features for your task, you can use a higher stride length (2 or more).
You also saw that you cannot convolve all images with just any combination of filter and stride length. For example, you cannot convolve a (4, 4) image with a (3, 3) filter using a stride of 2. Similarly, you cannot convolve a (5, 5) image with a (2, 2) filter and a stride of 2 (try and convince yourself).
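One way to convince yourself is to list the positions where the filter can be placed and check whether the last position reaches the edge of the image. The sketch below does this along one dimension; the helper name `leftover_pixels` is a hypothetical choice made for illustration.

```python
def leftover_pixels(image_size, filter_size, stride):
    """Return how many pixels along one dimension the filter never covers."""
    # Start positions where the whole filter still lies inside the image.
    starts = list(range(0, image_size - filter_size + 1, stride))
    # Right edge (exclusive) reached by the last filter position.
    covered = starts[-1] + filter_size
    return image_size - covered

print(leftover_pixels(4, 3, 2))  # 1 -> a (4, 4) image, (3, 3) filter, stride 2 does not fit
print(leftover_pixels(5, 2, 2))  # 1 -> a (5, 5) image, (2, 2) filter, stride 2 does not fit
print(leftover_pixels(5, 3, 2))  # 0 -> a (5, 5) image, (3, 3) filter, stride 2 fits
```

A non-zero result means that some rows/columns at the edge are never covered by the filter, which is why the combination does not work without padding.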
To solve this problem, you use padding: you add dummy rows/columns around the edges of the image so that the filter and stride fit the padded image.
Padding
The following are the two most common ways to do padding:
- Populating the dummy rows/columns with the pixel values at the edges
- Populating the dummy rows/columns with zeros (zero-padding)
Notation:
Padding of ‘x’ means that ‘x’ rows/columns are added on every side of the image, so the height and the width each grow by 2x.
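Both padding styles can be tried with NumPy's `np.pad`. The sketch below applies a padding of 1 to a toy (3, 3) image; the image values and variable names are assumptions chosen for illustration.

```python
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Zero-padding: the dummy rows/columns are filled with zeros.
zero_padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

# Edge padding: the dummy rows/columns repeat the pixel values at the edges.
edge_padded = np.pad(image, pad_width=1, mode="edge")

print(zero_padded.shape)  # (5, 5)
print(zero_padded[0])     # [0 0 0 0 0]
print(edge_padded[0])     # [1 1 2 3 3]
```

A padding of 1 adds one row/column on each side, so the (3, 3) image becomes (5, 5) in both cases; only the values in the new border differ.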
An alternate (less commonly used) way to do convolution is to shrink the filter size as you hit the edges.
You may have noticed that when you convolve an image without padding (using any filter larger than (1, 1)), the output is smaller than the image (i.e. the output ‘shrinks’). For example, when you convolve a (6, 6) image with a (3, 3) filter and a stride of 1, you get an output of (4, 4).
If you want to maintain the same size, you can use padding. Let’s see how padding maintains the image size.
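As a quick check, the following sketch convolves a (6, 6) image with a (3, 3) filter at a stride of 1, first without padding and then with a zero-padding of 1. The helper `convolve2d_stride1` and the all-ones image and filter are illustrative assumptions, not part of the course material.

```python
import numpy as np

def convolve2d_stride1(image, kernel):
    """Convolve a square filter over the image with stride 1 and no padding."""
    f = kernel.shape[0]
    out_h = image.shape[0] - f + 1
    out_w = image.shape[1] - f + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the filter and the current patch, then sum.
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.ones((6, 6))
kernel = np.ones((3, 3))

unpadded = convolve2d_stride1(image, kernel)
# Zero-padding of 1 turns the (6, 6) image into (8, 8) before convolving.
padded = convolve2d_stride1(np.pad(image, 1, mode="constant"), kernel)

print(unpadded.shape)  # (4, 4)
print(padded.shape)    # (6, 6)
```

With a padding of 1, the output has the same (6, 6) size as the original image, whereas the unpadded output shrinks to (4, 4).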
You saw that doing convolutions without padding reduces the output size. It is important to note that only the width and height decrease (not the depth) when you convolve without padding. The depth of the output depends on the number of filters used – we will discuss this in a later segment.
Why Is Padding Necessary?
You saw that doing convolutions without padding will ‘shrink’ the output. For example, convolving a (6, 6) image with a (3, 3) filter and a stride of 1 gives a (4, 4) output. Further, convolving the (4, 4) output with a (3, 3) filter will give a (2, 2) output. The size has reduced from (6, 6) to (2, 2) in just two convolutions. Large CNNs have tens (or even hundreds) of such convolutional layers (recall VGGNet), so we would incur massive ‘information loss’ as we build deeper networks!
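The shrinking can be traced layer by layer with a few lines of Python. This is a minimal sketch that assumes a stride of 1 and square images and filters; the function name `sizes_through_layers` is hypothetical.

```python
def sizes_through_layers(size, filter_size, num_layers, padding=0):
    """Track the output size (one dimension) across stacked stride-1 convolutions."""
    sizes = [size]
    for _ in range(num_layers):
        padded_size = size + 2 * padding
        if padded_size < filter_size:
            break  # the filter no longer fits, so no further layers are possible
        # With stride 1, the filter fits in (padded_size - filter_size + 1) positions.
        size = padded_size - filter_size + 1
        sizes.append(size)
    return sizes

print(sizes_through_layers(6, 3, 2))             # [6, 4, 2]
print(sizes_through_layers(6, 3, 2, padding=1))  # [6, 6, 6]
```

Without padding, the (6, 6) image is down to (2, 2) after two layers, while a padding of 1 keeps the size at (6, 6) throughout.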
This is one of the main reasons padding is important – it helps maintain the size of the output arrays and avoids information loss. Of course, in many layers you actually want to shrink the output, but in many others you want to maintain its size.
Until now, you have been computing the output size (using the input image size, padding and stride length) manually. In the next segment, you will learn generic formulas which will help reduce some of the manual work that you have been doing.