
Summary

Here’s a brief summary of what you learnt in this session:

1. Machine learning models can be classified into the following two categories on the basis of the learning algorithm:

   • Supervised learning method: Past data with labels is available to build the model.
      • Regression: The output variable is continuous in nature.
      • Classification: The output variable is categorical in nature.
   • Unsupervised learning method: Past data with labels is not available.
      • Clustering: There is no predefined notion of labels.

2. In the supervised learning method, the past dataset is divided into two parts:

  • Training data is used by the model to learn during modelling.
  • Testing data is used by the trained model for prediction and model evaluation.
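As a minimal sketch of this split, the snippet below shuffles a synthetic dataset and holds out 30% for testing (the data, the 70/30 ratio, and the NumPy-only approach are illustrative assumptions, not part of the session):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))               # synthetic feature matrix
y = 3.0 * X[:, 0] + rng.normal(size=100)    # continuous (regression) target

# Shuffle row indices, then hold out the last 30% as the test set.
idx = rng.permutation(len(X))
split = int(0.7 * len(X))
train_idx, test_idx = idx[:split], idx[split:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```

The model would be fitted on `X_train`/`y_train` only; `X_test`/`y_test` are kept unseen until evaluation.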

3. Linear regression models can be classified into two types depending upon the number of independent variables:

  • Simple linear regression: This is used when the number of independent variables is 1.
  • Multiple linear regression: This is used when the number of independent variables is more than 1.
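Both cases can be sketched with NumPy's least-squares routines (the synthetic data and true coefficients below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simple linear regression: one independent variable.
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=50)
b1, b0 = np.polyfit(x, y, deg=1)            # fitted slope and intercept

# Multiple linear regression: more than one independent variable.
X = rng.uniform(0, 10, size=(50, 2))
y2 = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=50)
A = np.column_stack([np.ones(len(X)), X])    # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y2, rcond=None)  # coef = [β0, β1, β2]
```

The only structural difference is the number of coefficient columns; the fitting criterion (least squares) is the same.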

4. The equation of the best fit regression line Y = β₀ + β₁X can be found by minimising the cost function (RSS in this case, using the ordinary least squares method), which is done using the following two methods:

  • Differentiation
  • Gradient descent 
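The two minimisation methods can be compared on the same synthetic data: differentiation gives the closed-form OLS coefficients directly, while gradient descent reaches (approximately) the same values iteratively. The data, learning rate, and iteration count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 5, size=200)
y = 4.0 + 2.0 * x + rng.normal(scale=0.3, size=200)

# Method 1: differentiation — set ∂RSS/∂β0 = ∂RSS/∂β1 = 0 and solve.
b1_closed = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_closed = y.mean() - b1_closed * x.mean()

# Method 2: gradient descent on RSS = Σ(y − β0 − β1·x)².
b0, b1 = 0.0, 0.0
lr = 0.01
for _ in range(5000):
    resid = y - (b0 + b1 * x)
    b0 += lr * 2 * resid.mean()           # step along −∂RSS/∂β0 (averaged over n)
    b1 += lr * 2 * (resid * x).mean()     # step along −∂RSS/∂β1 (averaged over n)
```

With a suitable learning rate, both methods converge to the same best-fit line.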

5. The strength of a linear regression model is mainly explained by R², where R² = 1 – (RSS/TSS).

  • RSS: Residual sum of squares.
  • TSS: Total sum of squares.
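Putting the definition into code, R² can be computed from a fitted line as follows (the synthetic data is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 0.8 * x + rng.normal(scale=1.0, size=100)

# Fit the line, then compute R² = 1 − RSS/TSS.
b1, b0 = np.polyfit(x, y, deg=1)
y_pred = b0 + b1 * x
rss = np.sum((y - y_pred) ** 2)        # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
r_squared = 1 - rss / tss
```

TSS measures the variation around the mean of `y`; RSS measures what the regression line fails to explain, so their ratio gives the unexplained fraction.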

6. RSE helps in measuring the lack of fit of a model on a given data. The closeness of the estimated regression coefficients to the true ones can be estimated using RSE. It is related to RSS by the formula RSE = √(RSS/df), where df = n − 2 and n is the number of data points.
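The formula translates directly into code; on data with known noise, the computed RSE should land close to the true noise scale (the data and its noise level of 1.5 below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.5, size=n)

# Fit the line and compute RSE = √(RSS/df) with df = n − 2.
b1, b0 = np.polyfit(x, y, deg=1)
rss = np.sum((y - (b0 + b1 * x)) ** 2)
rse = np.sqrt(rss / (n - 2))
```

The df = n − 2 accounts for the two estimated coefficients (β0 and β1), making RSE an (approximately) unbiased estimate of the noise standard deviation.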
