Module Summary

Exploratory Data Analysis (EDA) helps a data analyst look beyond the raw data. It is a never-ending process: the more you explore the data, the more insights you draw from it. As a data analyst, you will spend a large share of your time, often cited as almost 80%, understanding data and solving business problems through EDA. If you understand EDA properly, half the battle is won.

Keep in mind that EDA is far more than plain visualisation. It is an end-to-end process for analysing a data set and preparing it for model building.

In this module, you have learnt about the four most crucial steps in any kind of data analysis. These steps include the following:

  • Gather data for analysis: In the data sourcing part, you learnt about the various sources of data. There are two main types of data sources: public data and private data. Private data comes with security and privacy concerns, whereas public data is freely available, generally without restrictions on access or usage. Many websites provide access to public data sets. You also learnt the basics of web scraping, a process for fetching data directly from a web page.
  • Preparation and cleaning of data: In the cleaning process, the main objective is to remove irregularities from a data set. There are many ways to clean data, but the two most important approaches that you learnt in the cleaning step are the treatment of missing values and outlier handling.
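This summary does not say which outlier-handling technique to use, so the following is only a minimal sketch of one common choice, the interquartile range (IQR) rule. The column name `order_values`, its values and the multiplier `k=1.5` are assumptions made for illustration, and only the Python standard library is used so that the sketch runs as-is.

```python
from statistics import quantiles

# Hypothetical toy column; 95 is deliberately far from the other values.
order_values = [10, 12, 11, 13, 12, 95]

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_fences(order_values)

# Values outside the fences are flagged as outliers; the rest are kept.
outliers = [v for v in order_values if v < lower or v > upper]
cleaned = [v for v in order_values if lower <= v <= upper]

print(outliers)  # [95]
print(cleaned)   # [10, 12, 11, 13, 12]
```

Whether a flagged value should be dropped, capped or kept depends on the business context; the rule only points out candidates for review.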

There are several ways to deal with missing values. The first is to remove the rows or columns that contain them; however, make sure that doing so does not discard too much information. The second is to impute the missing values with a statistic of the observed values, such as the mean, median, mode or a quantile. The third is to treat the missing values as a separate category; this is generally the safest of the three, because it neither discards records nor invents values.
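The three options for missing values can be sketched in a few lines. This is an illustration under stated assumptions, not the module's own code: the column `ages`, the helper names `drop_missing`, `impute` and `as_category`, and the use of `None` to mark a missing entry are all hypothetical, and only the Python standard library is used.

```python
from statistics import mean, median

# Hypothetical toy column in which None marks a missing entry.
ages = [25, None, 31, 40, None, 31]

def drop_missing(values):
    """Option 1: remove the records whose value is missing."""
    return [v for v in values if v is not None]

def impute(values, strategy=median):
    """Option 2: fill each gap with a statistic of the observed values."""
    fill = strategy(drop_missing(values))
    return [fill if v is None else v for v in values]

def as_category(values, label="Missing"):
    """Option 3: keep the gap visible as a category of its own."""
    return [label if v is None else v for v in values]

print(drop_missing(ages))              # [25, 31, 40, 31]
print(impute(ages, mean))              # [25, 31.75, 31, 40, 31.75, 31]
print(impute(ages, median))            # [25, 31.0, 31, 40, 31.0, 31]
print(as_category(["A", None, "B"]))   # ['A', 'Missing', 'B']
```

Note that dropping shortens the column from six records to four, which is the loss of information the text warns about, while the other two options keep every record.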

Next, you learnt about the different methods for analysing variables. These methods include the following:

  • Univariate analysis: This involves analysing one variable at a time. Variables can be categorical (ordered or unordered) or numerical. Univariate analysis shows how a single variable is distributed, for example, the count of each category in a categorical variable.
  • Bivariate and multivariate analysis: This involves analysing two or more variables at the same time. These analyses reveal how variables relate to one another, for example, how a numerical variable differs across the categories of another variable.
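The difference between the two kinds of analysis can be seen on a small example. The sketch below assumes a hypothetical data set of (education level, salary) records; the variable names and values are invented for illustration, and only the Python standard library is used.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (education level, monthly salary in thousands).
records = [
    ("Graduate", 40), ("Graduate", 50), ("School", 20),
    ("Postgraduate", 70), ("School", 30), ("Graduate", 60),
]

# Univariate: look at one variable at a time.
education_counts = Counter(level for level, _ in records)  # categorical
average_salary = mean(salary for _, salary in records)     # numerical

# Bivariate: see how salary (numerical) varies across education (categorical).
salaries_by_level = defaultdict(list)
for level, salary in records:
    salaries_by_level[level].append(salary)
mean_salary_by_level = {
    level: mean(values) for level, values in salaries_by_level.items()
}

print(education_counts["Graduate"])  # 3
print(average_salary)                # 45
print(mean_salary_by_level)
```

The univariate results describe each variable in isolation, whereas the grouped means show a relationship between the two variables that neither summary reveals on its own.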
