Welcome to the second half of the module CNN – Industry Applications.
In the next two sessions, you will learn to detect and identify objects in video. You have already learnt how to build image classifiers using Convolutional Neural Networks. Now you will learn to extend that knowledge to video analysis. This session will demonstrate one of the cutting-edge applications of CNNs.
Objectives of this Session
The primary objective of the first session is to understand basic preprocessing using traditional image processing techniques. Deep learning needs labelled data in sufficient volume, and for many problems you may have neither. So for problems that are not very complex, you can rely on traditional image processing alone, or use deep learning only minimally.
In the previous session, you learnt techniques such as normalisation, data augmentation (using rotation, flipping, etc.) and morphological transformations. You will build on these as you go through this session. Another objective of this session is to understand how to work with video files. At its heart, a video is a sequence of images called frames. Video analysis typically involves the following steps:
- The video is broken down into its individual frames.
- Each frame is analysed as an individual image (this is mostly achieved by a ‘for’ or ‘while’ loop run over the list of frames).
- Often, consecutive frames are ‘compared’ with each other to detect what changed between those frames (a.k.a. movement detection).
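The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's implementation: it assumes OpenCV (`cv2`) and NumPy are installed, and the helper names `read_frames`, `to_gray_float`, `frame_difference` and `motion_masks`, as well as the default threshold of 0.1, are hypothetical choices for this example. Frames are read one at a time with `cv2.VideoCapture.read()`, each frame is normalised as an individual image, and consecutive frames are compared with a simple absolute difference.

```python
import numpy as np


def read_frames(path):
    """Step 1: yield the frames of a video file one at a time (BGR uint8 arrays)."""
    import cv2  # imported here so the NumPy-only helpers below work without OpenCV

    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of the video, or the frame could not be read
                break
            yield frame
    finally:
        cap.release()


def to_gray_float(frame):
    """Step 2: treat a frame as an image; convert to grayscale and normalise to [0, 1]."""
    frame = np.asarray(frame, dtype=np.float32)
    if frame.ndim == 3:
        # Plain channel average as the grayscale conversion (an assumption made for
        # simplicity; cv2.cvtColor with COLOR_BGR2GRAY weights the channels instead).
        frame = frame.mean(axis=2)
    return frame / 255.0


def frame_difference(prev, curr, threshold=0.1):
    """Step 3: boolean mask of pixels whose intensity changed by more than `threshold`."""
    return np.abs(curr - prev) > threshold


def motion_masks(frames, threshold=0.1):
    """Loop over the frames and yield one motion mask per pair of consecutive frames."""
    prev = None
    for frame in frames:
        curr = to_gray_float(frame)
        if prev is not None:
            yield frame_difference(prev, curr, threshold)
        prev = curr
```

With a real file, you would loop over `motion_masks(read_frames("traffic.mp4"))`, where the file name is a placeholder. Because `motion_masks` accepts any iterable of arrays, it can also be tried on synthetic NumPy frames without a video file.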
Since each frame is itself an image, most of these techniques apply equally to still images. Overall, this session has two parts:
- Ingesting and processing videos (something new for you to learn!).
- The deep learning section which involves classifying objects, etc. (something that you’re already familiar with).
The deep learning section follows the same workflow you have seen so far; indeed, most deep learning applications share the same general workflow. What you will pick up in this session is how to apply deep learning to video analysis.
Let’s get to know our expert, Anand Muglikar, and hear some of his experiences.
Problem statement
The problem statement is to detect and identify vehicles in videos. Suppose you want to find out how many vehicles pass through a lane during peak hours in a city. This exercise can serve several purposes:
- The government can use traffic flow data to decide the width of a new road in a nearby area.
- The organisation that is building a highway can decide the toll rate based on the number of vehicles passing along a particular road.
- The government may want to ban certain types of vehicles (such as auto-rickshaws, trucks, etc.) based on how frequently these vehicles appear on a particular road.
Broadly speaking, achieving any of these tasks involves two steps:
- Vehicle detection: Here, you detect the vehicles that are moving on the road.
- Vehicle classification: Here, you classify the detected vehicle into a particular class according to the application you’re working on. For example, if you’re interested in looking at the number of four-wheelers vs the number of two-wheelers, you’d classify each vehicle as a two-wheeler or a four-wheeler. Similarly, you can have classes such as auto-rickshaws, trucks, motorcycles, bicycles, etc. The exact classes need to be defined according to the problem statement.
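The two steps can be wired together roughly as in the sketch below. It is a hypothetical outline, not the course's code: `detect` and `classify` are placeholder callables standing in for a real vehicle detector and a trained CNN classifier, and the function names, the `(x, y, width, height)` box format and the two-wheeler/four-wheeler labels are assumptions for illustration. Each detected box is cropped out of the frame, passed to the classifier, and the resulting labels are tallied per class.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # hypothetical box format: (x, y, width, height) in pixels


def detect_and_classify(
    frame: np.ndarray,
    detect: Callable[[np.ndarray], Sequence[Box]],
    classify: Callable[[np.ndarray], str],
) -> List[Tuple[Box, str]]:
    """Run vehicle detection on one frame, then classify each detected region."""
    results = []
    for (x, y, w, h) in detect(frame):          # step 1: vehicle detection
        crop = frame[y:y + h, x:x + w]         # cut the detected vehicle out of the frame
        results.append(((x, y, w, h), classify(crop)))  # step 2: vehicle classification
    return results


def count_by_class(results: List[Tuple[Box, str]]) -> Counter:
    """Tally the predicted labels for one frame, e.g. two-wheelers vs four-wheelers."""
    return Counter(label for _, label in results)


if __name__ == "__main__":
    # Stub detector and classifier, purely for demonstration (not real models).
    frame = np.zeros((100, 200, 3), dtype=np.uint8)
    fake_detect = lambda f: [(10, 10, 20, 40), (50, 20, 80, 60)]
    fake_classify = lambda crop: "two-wheeler" if crop.shape[1] < 40 else "four-wheeler"
    print(count_by_class(detect_and_classify(frame, fake_detect, fake_classify)))
```

Note that this counts detections in a single frame. A vehicle usually stays in view for many frames, so summing these per-frame counts over a whole video would count the same vehicle repeatedly; counting unique vehicles needs an extra step (such as tracking), which this sketch leaves out.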
We will carry out all the steps involved in vehicle detection using OpenCV, a very popular image processing library. We recommend that you briefly go through the OpenCV homepage (about 5 minutes).
The second step, vehicle classification, is fairly straightforward since we will use a CNN classifier for it. Once you are familiar with the library, move on to the next segment, where we will take a detailed look at the problem statement.