So far, you have learnt about the various principles of model selection and feature engineering. Now, let’s consolidate these learnings and build a machine learning model, using model evaluation, feature selection and hyperparameter tuning techniques to choose the model that produces the best predictions for our business problem.
Before going into the actual model-building exercise, let’s understand the broad-level project pipeline for approaching a given business problem. The project pipeline can be briefly summarized in the following five steps:
- Data Understanding: Here, you need to load the data and understand the features present in it (the first sketch after this list shows a minimal loading and inspection step).
- Exploratory data analysis (EDA): Normally, in this step, you perform univariate and bivariate analyses to check the data distribution, missing values and any outliers present in the data, followed by feature transformations if necessary. You should also check whether the data is skewed and try to mitigate the skewness, as it might cause problems during the model-building phase. Can you think of why skewness can be an issue while modelling? Some of the data points in the tail of a skewed distribution may act as outliers for machine learning models that are sensitive to outliers, and hence may cause a problem. Also, if the values of an independent feature are skewed, then depending on the model, the skewness may violate model assumptions or impair the interpretation of feature importance. The first sketch after this list also includes a skewness check and a log transform.
- Train/Test Split: You are now familiar with the train/test split, which you perform in order to check the performance of your models on unseen data. Here, for validation, you can use the k-fold cross-validation method. You need to choose an appropriate value of k so that the minority class is correctly represented in the test folds (see the cross-validation sketch after this list).
- Feature Engineering: This step helps you extract hidden insights and the underlying structure of the data by modifying, combining, extracting or selecting features that are less prone to overfitting. It helps you choose the features that you will use to train your model (a feature-selection sketch follows this list).
- Model Building and Hyperparameter Tuning: This is the final step, at which you try different models and fine-tune their hyperparameters until you get the desired level of performance, as measured using model evaluation techniques (the last sketch after this list shows a grid search).
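As a minimal sketch of the data understanding and EDA steps, assuming the data sits in a hypothetical CSV file named data.csv, you could load it with pandas, inspect the features and missing values, and reduce strong right-skew in numeric columns with a log transform:

```python
import numpy as np
import pandas as pd

# Load the data and get a first look at the features (file name is illustrative).
df = pd.read_csv("data.csv")
df.info()                    # column names, dtypes, non-null counts
print(df.describe())         # basic distribution of numeric features
print(df.isnull().sum())     # missing values per column

# Check skewness of the numeric features; values far from 0 indicate skew.
numeric_cols = df.select_dtypes(include=np.number).columns
print(df[numeric_cols].skew())

# Mitigate strong right-skew with a log transform (log1p handles zeros safely).
for col in numeric_cols:
    if df[col].skew() > 1 and (df[col] >= 0).all():
        df[col] = np.log1p(df[col])
```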
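For the train/test split and validation step, one possible sketch with scikit-learn is shown below. The dataset is generated with make_classification purely for illustration, and stratified splits and folds are used so that the minority class stays represented in every test fold:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Illustrative imbalanced data; replace with your own feature matrix X and target y.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1],
                           random_state=42)

# Hold out a test set; stratify=y preserves the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Stratified k-fold keeps the minority class represented in every test fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train,
                         cv=cv, scoring="roc_auc")
print("Mean CV AUC:", scores.mean())
```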
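For the feature selection part of feature engineering, here is a sketch using recursive feature elimination (RFE) from scikit-learn; the dataset and the choice of keeping five features are purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Illustrative data with only a few genuinely informative features.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=42)

# Recursive feature elimination: keep the 5 features the model finds most useful.
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)

print("Selected feature indices:",
      [i for i, kept in enumerate(selector.support_) if kept])
X_selected = selector.transform(X)   # reduced feature matrix for model training
```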
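Finally, a hyperparameter-tuning sketch using GridSearchCV; the model, parameter grid and scoring metric are illustrative assumptions and should be adapted to your business problem:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Illustrative imbalanced data; replace with your engineered training set.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1],
                           random_state=42)

# Grid of candidate hyperparameters (values chosen only for illustration).
param_grid = {"n_estimators": [100, 300], "max_depth": [4, 8, None]}

# Evaluate each combination with stratified 5-fold cross-validation.
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
                      scoring="roc_auc")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV AUC:", round(search.best_score_, 3))
```

The grid search refits the model on each fold for every parameter combination, so the reported best score is a cross-validated estimate rather than a score on the held-out test set; the final check should still be done on unseen test data.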