In the previous modules on machine learning, you may have come across situations where a model performs well on the training data but not on the test data. You may also have been unsure which model to use for a given problem. For example, given a classification problem, how would you decide which model to go with? Such questions arise regardless of the model, the data or the problem itself, and the aim of this session is to answer them.
So far you have learnt regression and classification models such as linear regression, logistic regression, decision trees and random forests. Apart from these, you may also have heard of other models such as Support Vector Machines (SVM), k-Nearest Neighbors and Naive Bayes. You can read more about these models in the additional reading at the bottom of the page.
Let’s now watch the following video to understand model selection and the considerations to keep in mind when looking for the model that best fits a given problem.
As Prof. Raghavan mentioned, in these sessions you will learn some rules of thumb and general pointers for selecting appropriate models for a given problem. This session is a discussion of such rules of thumb; in the subsequent sessions, you will see them applied in the context of various problems and algorithms. The central issue in all of machine learning is “how do we extrapolate learnings from a finite amount of available data to all possible inputs ‘of the same kind’?” Training data is always finite, yet the model is supposed to learn everything about the task at hand from it and perform well on unseen data.
Recall the bike-sharing model from linear regression, which was trained on a few thousand observations. Before deploying it to make predictions on real, unseen data, how can you be confident that the model is as good as it appears on the training data?
A common misconception is that a model that performs well on the training data will also produce good results on the test data. That is not always the case. Let’s understand this better from the following video.
Suppose you built a model to detect whether an email is spam or ham. How do you ensure that the model will do a good job on unseen data as well? Your client is certainly not going to use it on the data they already gave you to build the model. The real test comes when the model encounters instances it did not learn about from the training data. Even then, it is expected to make correct predictions, and a model that does so is a good model.
Let’s watch the next video to understand how models extract generalisable information from the finite amount of data they are trained on, so that they perform well on unseen data.
A model can be a function, a logical rule or a data structure that takes a set of inputs, processes them and produces an output. To make good predictions, it should be neither too simple nor too complex. You need to strike the right balance between the two to build a model that makes correct predictions even on unseen data.
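To make this trade-off concrete, here is a minimal Python sketch (standard library only) that compares three hypothetical models on toy data generated, by assumption, from y = 2x + 1 plus small random noise: a constant model that is too simple, a lookup table (a data-structure "model") that memorises every training pair and is too complex, and a two-parameter least-squares line. The data, the noise level and the function names fit_constant, fit_lookup and fit_line are illustrative assumptions, not part of the course material.

```python
import random

def make_data(xs, rng):
    # Hypothetical ground truth for illustration: y = 2x + 1 plus uniform noise in [-0.5, 0.5]
    return [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in xs]

def mse(model, data):
    # Mean squared error of a model (any callable x -> prediction) on (x, y) pairs
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_constant(train):
    # Too simple: ignores the input and always predicts the training mean
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_lookup(train):
    # Too complex: memorises every training pair in a dict; for inputs it
    # has never seen, it can only fall back to the training mean
    table = dict(train)
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: table.get(x, mean_y)

def fit_line(train):
    # Balanced: ordinary least-squares line y = a*x + b (two parameters)
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

rng = random.Random(0)
train = make_data([float(i) for i in range(10)], rng)  # seen inputs 0.0 .. 9.0
test = make_data([i + 0.5 for i in range(10)], rng)    # unseen inputs 0.5 .. 9.5

results = {}
for name, fit in [("constant", fit_constant), ("lookup", fit_lookup), ("line", fit_line)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, test))
    print(f"{name:8s} train MSE = {results[name][0]:.3f}  test MSE = {results[name][1]:.3f}")
```

The lookup table reaches zero training error, yet on the unseen inputs it does no better than the constant model, while the line, which is neither too simple nor too complex, has low error on both. Comparing training error with error on held-out data in this way is one simple check of whether a model generalises.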
Occam’s razor is perhaps the most important rule of thumb in machine learning, and incredibly ‘simple’ at the same time: when in doubt, choose the simpler model. The question then is ‘how do we define simplicity?’. Let’s understand this from the following video.
As you saw in the video, the definition of simplicity depends on the type of model under consideration. For a tree model, simplicity means a smaller depth or size; for a linear model, it can be expressed in terms of the number of attributes required to represent the model.
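The following is a minimal sketch of these two notions of simplicity, assuming a toy representation in which a tree is either a leaf label or a tuple (feature, threshold, left, right) and a linear model is a dictionary of coefficients. Depth is counted here as the number of splits on the longest root-to-leaf path. The spam features, thresholds, example emails and coefficient values are all hypothetical. The extra split in the deeper tree sends both branches to "ham", so the two trees make identical predictions and Occam’s razor favours the shallower one.

```python
from typing import Union

# Toy representation (an assumption): a tree is a leaf label (str)
# or a split tuple (feature, threshold, left_subtree, right_subtree)
Tree = Union[str, tuple]

def depth(tree: Tree) -> int:
    # Number of splits on the longest root-to-leaf path
    if isinstance(tree, str):
        return 0
    _, _, left, right = tree
    return 1 + max(depth(left), depth(right))

def size(tree: Tree) -> int:
    # Total number of nodes (splits plus leaves)
    if isinstance(tree, str):
        return 1
    _, _, left, right = tree
    return 1 + size(left) + size(right)

def predict(tree: Tree, row: dict) -> str:
    # Walk down the tree: go left if the feature value is <= threshold
    while not isinstance(tree, str):
        feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree

def n_attributes(coefs: dict, tol: float = 1e-12) -> int:
    # Simplicity of a linear model: how many attributes have a non-zero coefficient
    return sum(1 for c in coefs.values() if abs(c) > tol)

# Hypothetical spam/ham trees; the deeper one has a redundant split
simple_tree = ("num_links", 3, "ham", "spam")
complex_tree = ("num_links", 3, ("num_exclamations", 10, "ham", "ham"), "spam")

rows = [
    {"num_links": 0, "num_exclamations": 2},
    {"num_links": 5, "num_exclamations": 12},
    {"num_links": 2, "num_exclamations": 15},
    {"num_links": 8, "num_exclamations": 0},
]
same = all(predict(simple_tree, r) == predict(complex_tree, r) for r in rows)

# Among models that predict equally well, pick the one with the smallest depth
candidates = [("simple", simple_tree), ("complex", complex_tree)]
chosen = min(candidates, key=lambda c: depth(c[1]))

# Hypothetical linear models over the same attributes
sparse_linear = {"num_links": 0.8, "num_exclamations": 0.0, "length": 0.0}
dense_linear = {"num_links": 0.8, "num_exclamations": 0.1, "length": 0.02}

print("depths:", depth(simple_tree), depth(complex_tree))
print("sizes:", size(simple_tree), size(complex_tree))
print("identical predictions:", same, "| chosen:", chosen[0])
print("attributes used:", n_attributes(sparse_linear), n_attributes(dense_linear))
```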
If you want to explore the terminology associated with model selection, such as model class, learning algorithm, hypothesis and hypothesis class, refer to the additional resources in Session 3 (Additional resources) of this module.
In the next segment, you will study some objective ways to measure model simplicity and, through various examples, understand why simplicity is preferred over sophistication and complexity.