You have gone through the Air Passenger Traffic problem statement and understood that it is a time series forecasting problem. To solve it, there is a set of basic forecasting steps that need to be followed. Let’s quickly understand them from Chiranjoy.
You just learnt about the basic steps involved in any forecasting problem. These were:
- Define the problem
- Collect the data
- Analyze the data
- Build and evaluate the forecast model
The one thing to keep in mind before moving forward is that there are some caveats associated with time series forecasting. These caveats revolve around the steps you learnt about while defining the problem.
- The Granularity Rule: The more aggregated your forecasts, the more accurate your predictions tend to be, simply because aggregated data has less variance and hence less noise. As a thought experiment, suppose you work at ABC, an online entertainment streaming service, and you want to predict the number of views for a newly launched TV show in Mumbai for the next year. Would your predictions be more accurate at the city level or at the area level? Accurately predicting the views from each area might be difficult, but when you sum the predicted views across all areas and present your final prediction at the city level, it might be surprisingly accurate. This is because for some areas you will have predicted fewer views than the actual number, whereas for others you will have predicted more. When you sum all of these up, the over-predictions and under-predictions largely cancel each other out, leaving you with a good prediction. Hence, you should avoid making predictions at very granular levels.
- The Frequency Rule: This rule tells you to keep updating your forecasts regularly to capture any new information that comes in. Let’s continue with the example of ABC, the online entertainment streaming service, where the problem is to predict the number of views for a newly launched TV show in Mumbai for the next year. If you keep the update frequency too low, you might not capture new information accurately. For example, say you update your forecasts every 3 months. Due to the COVID-19 pandemic, residents may be confined to their homes for around 2-3 months, during which the number of views will increase significantly. If you update your forecasts only once every 3 months, you will not capture this increase in views in time, which may cause significant losses and lead to mismanagement.
- The Horizon Rule: When your horizon extends many months into the future, you are more likely to be accurate in the earlier months than in the later ones. Let’s go back to the ABC example once more. Suppose that in December 2019 the streaming service predicted the number of views for the next 6 months. The prediction may have been quite accurate for the first two months, but because of the unforeseen COVID-19 situation, the actual number of views in the following months would have been significantly higher than predicted, since everyone was staying at home. The farther ahead you go into the future, the more uncertain the forecasts become.
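The granularity and horizon rules above can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the function names, the number of areas, the view counts and the noise levels are all made up, area-level forecast errors are assumed to be independent with zero mean, and the actual series in the horizon example is assumed to follow a random walk while the forecast simply holds the last observed value.

```python
import random
import statistics


def granularity_errors(n_areas=20, n_trials=2000, true_views=1000.0,
                       noise_sd=200.0, seed=0):
    """Return (area-level, city-level) mean absolute percentage error.

    Assumption: each area's forecast equals the true value plus
    independent zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    area_errors, city_errors = [], []
    city_actual = n_areas * true_views
    for _ in range(n_trials):
        forecasts = [true_views + rng.gauss(0.0, noise_sd)
                     for _ in range(n_areas)]
        # Area level: average the percentage error of each area's forecast.
        area_errors.append(statistics.mean(
            abs(f - true_views) / true_views for f in forecasts))
        # City level: sum the area forecasts first, then measure the error.
        city_errors.append(abs(sum(forecasts) - city_actual) / city_actual)
    return statistics.mean(area_errors), statistics.mean(city_errors)


def horizon_errors(horizon=6, n_trials=5000, step_sd=50.0, seed=1):
    """Return the mean absolute forecast error for each month ahead.

    Assumption: actual values follow a random walk, and the forecast
    keeps the last observed value for every future month.
    """
    rng = random.Random(seed)
    totals = [0.0] * horizon
    for _ in range(n_trials):
        deviation = 0.0  # gap between the actual value and the forecast
        for h in range(horizon):
            deviation += rng.gauss(0.0, step_sd)
            totals[h] += abs(deviation)
    return [t / n_trials for t in totals]


area_mape, city_mape = granularity_errors()
print(f"area-level MAPE: {area_mape:.3f}")
print(f"city-level MAPE: {city_mape:.3f}")

errors_by_month = horizon_errors()
for month, err in enumerate(errors_by_month, start=1):
    print(f"{month} month(s) ahead: mean absolute error {err:.1f}")
```

Under these assumptions the city-level percentage error comes out well below the area-level one, and the error grows as the forecast looks further ahead. If the area-level errors were correlated (for example, every area under-predicted during a lockdown), the cancellation would be weaker.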
Now that you have understood the steps in defining the problem, let’s apply them to the air passenger traffic problem.
- Quantity: Number of passengers
- Granularity: Flights from city A to city B; i.e., flights for a particular route
- Frequency: Monthly
- Horizon: 1 year (12 months)
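As a rough sketch, this problem definition can be written down as a small configuration object, with the horizon expanded into the months to be forecast. `ForecastSpec` and `forecast_months` are hypothetical names introduced for illustration, and the last observed month used here is an assumption, since the end date of the data is not stated in this section.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ForecastSpec:
    """Hypothetical container for the four parts of the problem definition."""
    quantity: str
    granularity: str
    frequency: str
    horizon_months: int


def forecast_months(last_observed: date, horizon_months: int) -> List[date]:
    """Return the first day of each month in the forecast horizon."""
    months = []
    year, month = last_observed.year, last_observed.month
    for _ in range(horizon_months):
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
        months.append(date(year, month, 1))
    return months


spec = ForecastSpec(
    quantity="number of passengers",
    granularity="route (city A to city B)",
    frequency="monthly",
    horizon_months=12,
)

# Assumed last observed month, chosen only for this example.
targets = forecast_months(date(2020, 12, 1), spec.horizon_months)
print(targets[0], targets[-1], len(targets))  # 2021-01-01 2021-12-01 12
```

Keeping the definition in one place makes it easy to see what changes if, say, the horizon is shortened or the frequency is moved from monthly to weekly.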