
Introduction: Inferential Statistics

Welcome to the module on ‘Inferential Statistics’. In the last module on EDA, you learnt how to explore data and derive insights from its exploration.

In this module

Exploratory data analysis helped you understand how to discover patterns in data using various techniques and approaches. As you learnt, EDA is one of the most important parts of the data analysis process. It is also the part on which data analysts spend most of their time.

However, an analysis sometimes requires a very large amount of data, which may take too much time and too many resources to acquire. In such situations, you are forced to work with a smaller sample of the data instead of the entire data set.

Situations like these arise all the time at big companies like Amazon. For example, let’s say the Amazon QC department wants to know what proportion of the products in its warehouses are defective. Instead of going through all of its products (which would be a lot!), the Amazon QC team can just check a small sample of 1,000 products and find the defect rate for this sample (i.e., the proportion of defective products). Then, based on this sample’s defect rate, the team can ‘infer’ what the defect rate is for all the products in the warehouses.
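As a rough illustration of this idea, the Python sketch below draws a random sample of 1,000 products from a simulated warehouse and computes the sample’s defect rate. The warehouse size (100,000 products), the true defect rate (3%) and the function name sample_defect_rate are assumptions made up for this example; they do not come from any real Amazon data.

```python
import random


def sample_defect_rate(population, sample_size, seed=None):
    """Return the proportion of defective items in a simple random sample.

    `population` is a list in which 1 marks a defective product and 0 a good one.
    """
    rng = random.Random(seed)  # seeded generator so the run can be repeated
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sum(sample) / sample_size


# Hypothetical warehouse: 100,000 products, of which 3% are defective.
# Both numbers are invented purely for illustration.
population = [1] * 3_000 + [0] * 97_000

estimate = sample_defect_rate(population, sample_size=1_000, seed=42)
true_rate = sum(population) / len(population)

print(f"Defect rate in the sample of 1,000: {estimate:.3f}")
print(f"Defect rate in the whole warehouse: {true_rate:.3f}")
```

In practice, the true rate is unknown (that is the whole reason for sampling); it is computed here only so that the estimate can be compared with it.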

This process of ‘inferring’ insights from sample data is called ‘inferential statistics’.

Note that even with inferential statistics, you arrive only at an estimate of the population values from the sample data, not the exact values. This is because when you don’t have the complete data, you can only make reasonable estimates about it with a limited level of certainty. When certainty is limited, we talk in terms of probability. The first session of this module therefore explains the basic concepts of probability that are useful and important in inferential statistics.
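To see why a sample gives only an estimate, the minimal Python sketch below repeats the sampling step many times on a simulated warehouse. The population (100,000 products with a 3% defect rate), the sample size of 1,000 and the 200 repetitions are all assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run can be repeated

# Hypothetical warehouse: 1 = defective, 0 = good; 3% defective by construction.
population = [1] * 3_000 + [0] * 97_000
sample_size = 1_000

# Draw 200 independent samples and record each sample's defect rate.
estimates = []
for _ in range(200):
    sample = rng.sample(population, sample_size)
    estimates.append(sum(sample) / sample_size)

average_estimate = sum(estimates) / len(estimates)

print(f"Smallest estimate: {min(estimates):.3f}")
print(f"Largest estimate:  {max(estimates):.3f}")
print(f"Average estimate:  {average_estimate:.3f}")
```

Each sample gives a slightly different defect rate, so no single sample reveals the exact population value, although the estimates cluster around it. Probability is the tool used to describe this kind of sample-to-sample variation.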

In this session

In this session, you will learn the basic concepts of probability and the various rules associated with it. The broad agenda of the session covers the following:

  • Permutation and combination.
  • Definition of probability and its properties.
  • Key terms related to probability.
  • Probability rules (Addition and Multiplication).

Guidelines for in-module questions

The in-video and in-content questions for this module are not graded. Graded questions are given on a separate page labelled ‘Graded Questions’ at the end of each session. The graded questions are marked according to the following guidelines:

                            First Attempt Marks   Second Attempt Marks
Questions with 2 Attempts   10                    5
Questions with 1 Attempt    10                    0

People you will hear from in this module

Subject Matter Expert

Tricha Anjali

Associate Professor, IIIT-B

The International Institute of Information Technology, Bangalore, also known as IIIT-B, is one of India’s foremost graduate schools. Through its Integrated M.Tech., M.Tech., M.S. (Research) and PhD programs in the IT space, it focuses equally on innovation and education.

Amit Gupta

Data Scientist, Microsoft

Amit, with over 11 years of experience, is proficient in business analytics, product management, marketing and sales, and has technical experience in enterprise software and IT services. He is currently exploring the world of quantum computing and is part of the Quantum Study Group at Garage Microsoft (Hyderabad Campus). Microsoft is a multinational technology company that develops, manufactures, licenses, supports and sells computer software, consumer electronics, personal computers and related services.

Reference e-book

Statistical Inference for Data Science by Brian Caffo

Note

In this module, some interactive graphics have been sourced from the Seeing Theory project of Brown University.

Seeing Theory is a project designed and created by Daniel Kunin with support from Brown University’s Royce Fellowship Program. The goal of the project is to make statistics more accessible to a wider range of students through interactive visualisations.
