A Specialised Architecture for Visual Data

Convolutional Neural Networks, or CNNs, are specialised architectures which work particularly well with visual data, i.e. images and videos. They have been largely responsible for revolutionalizing ‘deep learning’ by setting new benchmarks for many image processing tasks that were very recently considered extremely hard.

Let’s start by understanding some common challenges in image processing.

Challenges in image processing

Let’s consider the common task of visual recognition (like identifying a ‘cat’ or a ‘dog’) – trivial as it is for humans, it is still a big challenge for algorithms. Let’s look at some of the challenges:

Viewpoint variation: Different orientations of the image with respect to the camera.
Scale variation: Different sizes of the object with respect to the image size.
Illumination conditions: Illumination effects.
Background clutter: Varying backgrounds.

CNNs – A specialised architecture for visual data

Although the vanilla neural networks (MLPs) can learn extremely complex functions, their architecture does not exploit what we know about how the brain reads and processes images. For this reason, although MLPs are successful in solving many complex problems, they haven’t been able to achieve any major breakthroughs in the image processing domain.

On the other hand, the architecture of CNNs uses many of the working principles of the animal visual system and thus they have been able to achieve extraordinary results in image-related learning tasks.

The ImageNet challenge

CNNs had first demonstrated their extraordinary performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The ILSVRC uses a list of about 1000 image categories or ‘classes’ and has about 1.2 million training images. The original challenge is an image classification task.

Let’s have the professor introduce you to the challenge and the role CNNs have played in it.

In the image given below, you can see the impressive results of CNNs in the ILSVRC where they now outperform humans (having a 5% error rate). The error rate of the ResNet, a recent variant in the CNN family, is close to 3%.

In the next segment, you will study the different ways in which CNNs are used for image-processing tasks.

Report an error