
CLT Demonstration: I

In the previous lectures, you were formally introduced to sampling distributions and to certain interesting properties of theirs that help us make inferences about population parameters from a given sample. In this segment, you’ll verify those properties through a Python demonstration to solidify your understanding of these concepts. Please download the necessary datasets and Jupyter notebook from the link given below:

Note

This code walkthrough is for demonstration purposes only, and you will not be evaluated on it. The main aim of this demo is to help you understand the difference between a sample, a population, sample means and the sampling distribution, along with their underlying properties.

For this demonstration, you’ll be using a dataset containing information about NBA players.

First, you’ll see how the population mean and the population standard deviation are related to those of the sampling distribution that you generate.

Note

At “1:22” and “4:31”, the instructor uses the code df.Weight.std(), but it should be df.Weight.std(ddof=0), since the default value of ddof in pandas is 1, whereas the population standard deviation divides by n rather than n − 1.
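For reference, here is a minimal sketch of the difference, assuming the NBA dataset has already been loaded into a DataFrame called df with a Weight column (the file name used here is only a placeholder):

```python
import pandas as pd

# Placeholder file name; use the dataset downloaded from the link above
df = pd.read_csv("nba_players.csv")

# Sample standard deviation (pandas default, ddof=1): divides by n - 1
sample_std = df.Weight.std()

# Population standard deviation (ddof=0): divides by n
population_std = df.Weight.std(ddof=0)

print(sample_std, population_std)
```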

As you saw in the video above, the sampling distribution possesses two interesting properties that relate it to the population parameters. Specifically, you verified that:

  • Sampling distribution’s mean (μ_X̄) = Population mean (μ)
  • Sampling distribution’s standard deviation (standard error, σ_X̄) = σ/√n, where σ is the population standard deviation and n is the sample size

For the above example, we computed the mean weight of all the basketball players that were available in the dataset. This value came out to be 220.67. In the context of this experiment, this value is our population mean.
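In pandas, this is a single call; the sketch below reuses the df DataFrame assumed earlier:

```python
# Population mean of player weights (reported as roughly 220.67 in the video)
population_mean = df.Weight.mean()
print(population_mean)
```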

Now to verify the first property, you picked around 1000 random samples of size 30 from the entire dataset and then calculated the mean of each sample. You plotted the distribution of all these sample means. This is your sampling distribution.

When you computed the mean of this sampling distribution (in other words, the mean of all the sample means that you had taken earlier), you observed that this value came out to be 220.69. As you can see, this value is quite close to the original population mean of 220.67.
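A minimal sketch of this step is shown below, reusing the df DataFrame from earlier; the number of samples (1,000) and the sample size (30) come from the demonstration, while the plotting details are only illustrative:

```python
import matplotlib.pyplot as plt

n_samples, sample_size = 1000, 30

# Draw 1,000 random samples of size 30 and record each sample's mean weight
sample_means = [df.Weight.sample(sample_size).mean() for _ in range(n_samples)]

# The distribution of these sample means is the sampling distribution
plt.hist(sample_means, bins=30)
plt.xlabel("Sample mean weight")
plt.show()

# Mean of the sampling distribution: should come out close to the population mean
print(sum(sample_means) / n_samples)
```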

Similarly, when you computed the standard deviation of the sampling distribution, you observed the following relationship:

Standard deviation of the sampling distribution (standard error) = σ/√n
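The same sample means can be used to check this relationship; the sketch below compares the standard error with σ/√n, reusing sample_means and sample_size from the previous snippet:

```python
import numpy as np

# Standard deviation of the sampling distribution (the standard error)
standard_error = np.std(sample_means)

# Theoretical value: population standard deviation divided by the square root of n
theoretical_se = df.Weight.std(ddof=0) / np.sqrt(sample_size)

print(standard_error, theoretical_se)  # the two values should be close
```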

Now that these two properties are verified, we move on to the next property: verifying that the sampling distribution is (approximately) a normal distribution.

In the above example, you already saw that the sampling distribution was nearly normal. However, you may wonder whether this is simply because the original population distribution was itself normal. To verify whether that is indeed the case, take a look at the next lecture.


Thus, no matter what the parent population’s distribution looks like, when you take samples of a reasonably large size, compute their means and plot the resulting sampling distribution, it will be approximately normal. This is one of the most important implications of the Central Limit Theorem. The following image summarises the results obtained in the above video.
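If you want to experiment with this yourself, the sketch below checks the same behaviour with a deliberately skewed (exponential) population generated with NumPy; this is purely illustrative and is not the dataset used in the video:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# A strongly right-skewed population: nothing like a normal curve
population = rng.exponential(scale=10, size=100_000)

# Sampling distribution of the mean: 1,000 samples of size 30
sample_means = [rng.choice(population, size=30).mean() for _ in range(1000)]

# The histogram of sample means is approximately bell-shaped despite the skewed parent
plt.hist(sample_means, bins=30)
plt.title("Sampling distribution from a skewed population")
plt.show()
```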

This concludes the first part of the demonstration. In the second part, you will observe the effect of sample size on the resulting sampling distribution.
