Linear regression is used in various fields such as real estate, telecom, e-commerce, etc. to build predictive models. Let’s look at one such example from the real estate industry. Here, you will predict the price of a house on the basis of some predictor variables, such as floor area, number of bedrooms, parking space, etc.
Problem Statement:
Consider that a real estate company has the data of real estate prices in Delhi. The company wants to optimise the selling price of the properties, based on important factors such as area, bedrooms, parking, etc.
Essentially, the company wants.
- To identify the variables affecting house prices, e.g., area, number of rooms, bathrooms, etc.
- To create a linear model that quantitatively relates house prices with variables, such as the number of rooms, area, number of bathrooms, etc.
- To know the accuracy of the model, i.e. how well do these variables predict the house prices.
Please download the dataset from below.
Please download the python code from below to practice along.
Now that you’ve read and inspected the data, let’s move on to visualising it. This will help in interpreting the data well and identifying the variables that can turn out to be useful in building the model.
That was all about visualising the numerical variables. You might have noticed that there are a few categorical variables present in the dataset as well. Let’s visualise them too, using boxplots.
Coming up
In the next segment, you will do the data preparation, which is an important step before model building.