
Reading Digital Images

Before we dig deeper into the architecture of CNNs, let’s understand what images are and how they are fed into CNNs.

You already know that the input to any neural network should be numeric. Fortunately, images are naturally represented as arrays (or matrices) of numbers. Let’s study the typical structure of images.

To summarise:

  • Images are made up of pixels.
  • A number between 0 and 255 represents the colour intensity of each pixel.
  • Each pixel in a colour image is an array representing the intensities of red, green and blue. The red, green and blue layers are called channels.
  • In a grayscale image (a ‘black and white’ image), only one number is required to represent the intensity of white. Thus, grayscale images have only one channel.  

Now that you know that images can be represented as numbers, let’s see an example of how one would read images into Python.
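As a minimal sketch of what reading an image looks like, the snippet below uses the Pillow and NumPy libraries (an assumption; the course notebook may use a different library). To avoid depending on a file on disk, it first builds a small stand-in image in memory. With a real file you would pass its path to Image.open instead.

```python
import io

import numpy as np
from PIL import Image

# Build a tiny 18 x 18 grayscale image in memory so the example needs no file.
# This is a stand-in for a real image; with a file, use Image.open(path).
original = Image.new("L", (18, 18), color=0)  # mode "L" = 8-bit grayscale
original.putpixel((9, 9), 255)                # one white pixel at x=9, y=9

buffer = io.BytesIO()
original.save(buffer, format="PNG")
buffer.seek(0)

# Read the image back and convert it to a NumPy array of numbers.
image = Image.open(buffer)
pixels = np.asarray(image)

print(pixels.shape)                # (height, width)
print(pixels.dtype)                # data type of each pixel value
print(pixels.min(), pixels.max())  # darkest and brightest values present
```

Note that Pillow addresses pixels as (x, y), while the NumPy array is indexed as [row, column], that is, [y, x].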


Let’s summarise the important points. Consider this sample image of a ‘zero’ from the MNIST dataset.

  • The height and width of this image are 18 pixels, so it is stored as an 18 x 18 array
  • Each pixel’s value lies between 0 and 255
  • The pixels having a value close to 255 appear white (since the pixels represent the intensity of white), and those close to 0 appear black
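The points above can be illustrated with a small NumPy sketch. The array here is hand-made for illustration (a crude ring shape, not actual MNIST data), and the 127 threshold used for printing is an arbitrary choice.

```python
import numpy as np

# An 18 x 18 grayscale image: one number per pixel, 0 = black, 255 = white.
image = np.zeros((18, 18), dtype=np.uint8)
image[4:14, 6:12] = 255  # bright rectangle
image[6:12, 8:10] = 0    # dark hole in the middle, giving a crude "zero"

print(image.shape)  # (height, width)

# Print a rough text rendering: '#' for bright pixels, '.' for dark ones.
for row in image:
    print("".join("#" if value > 127 else "." for value in row))
```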

Let’s see this for a colour image.

  • The height and width of the image are 4 pixels.
  • Here, three numbers make each pixel (representing RGB). So, there are 3 channels here.
  • The size of the matrix is thus 4 x 4 x 3
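A colour image of this size can be sketched as a three-dimensional NumPy array. The pixel values below are made up for illustration; the height x width x channels layout is the convention assumed here.

```python
import numpy as np

# A 4 x 4 colour image: height x width x channels (R, G, B).
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[0, 0] = (255, 0, 0)      # top-left pixel: pure red
image[3, 3] = (255, 255, 255)  # bottom-right pixel: white

print(image.shape)  # (height, width, channels)
print(image[0, 0])  # the three numbers that make up one pixel

# Slicing out a single channel gives a 4 x 4 matrix of intensities.
red_channel = image[:, :, 0]
print(red_channel.shape)
```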

Note that a wide range of colours can be made by mixing red, green and blue at different intensities (0-100%). For example, a pure red pixel has 100% intensity of red and 0% intensity of green and blue, so it is represented as (255,0,0). White is the combination of 100% intensity of red, green and blue, so it is represented as (255,255,255).
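The mapping from percentage intensities to pixel values can be sketched in a few lines of Python. The helper name rgb_from_percent is hypothetical, introduced only for this illustration.

```python
def rgb_from_percent(red, green, blue):
    """Convert per-channel intensities in percent (0-100) to 8-bit values.

    Hypothetical helper for illustration; not part of any library.
    """
    return tuple(round(255 * percent / 100) for percent in (red, green, blue))


pure_red = rgb_from_percent(100, 0, 0)
white = rgb_from_percent(100, 100, 100)
black = rgb_from_percent(0, 0, 0)
yellow = rgb_from_percent(100, 100, 0)  # full red plus full green

print(pure_red, white, black, yellow)
```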

Why is the Range of Pixel Values 0-255?

Usually, 8 bits (1 byte) are used to represent each pixel value (one value per channel). Since each bit can be either 0 or 1, 8 bits of information allow for 2^8 = 256 possible values. Therefore, the range of each pixel value is 0-255.
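The arithmetic can be checked directly in Python. This is a small sketch; the variable names are chosen only for illustration.

```python
bits = 8
levels = 2 ** bits       # number of distinct values 8 bits can encode
max_value = levels - 1   # values start at 0, so the largest is levels - 1

print(levels, max_value)

# The smallest and largest values written out as 8 binary digits.
darkest = format(0, "08b")
brightest = format(max_value, "08b")
print(darkest, brightest)
```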

Let’s now quickly summarise what you have learnt about images.

In the next few segments, you will study the architecture of CNNs in detail. 
