In an EDA exercise, it is important to remember that even when two numerical variables are highly correlated, one does not necessarily cause the other.
Let’s first listen to Rahim in the next video and try to get a holistic picture of correlation among variables using some compelling examples.
The major takeaway from the video is that correlation does not imply causation. In the video, you saw that the number of people who drowned by falling into a pool has no causal connection with movies starring Nicolas Cage. However, if you observe the plot below, you will notice a very high correlation between them, as both curves follow almost the same path.
Similarly, in the example below, it is quite obvious that per capita cheese consumption has no causal relation with the number of people dying from being tangled in bed sheets, although the plot shows a high correlation between them.
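To see how such a correlation can arise without any causal link, here is a minimal sketch in Python. It does not use the data from the plots above: `series_a` and `series_b` are made-up series, generated independently of each other, that merely share an upward trend over time. The helper names `pearson` and `diffs`, the slopes, and the noise levels are all assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def diffs(series):
    """Step-to-step changes; a linear trend becomes a constant offset."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
steps = range(200)

# Two synthetic series generated independently of each other.
# The only thing they have in common is that both rise over time.
series_a = [1.0 * t + rng.gauss(0, 5) for t in steps]
series_b = [0.5 * t + rng.gauss(0, 3) for t in steps]

# Correlation of the raw values, dominated by the shared trend.
corr_levels = pearson(series_a, series_b)

# Correlation of the step-to-step changes, with the trend removed.
corr_changes = pearson(diffs(series_a), diffs(series_b))

print(f"correlation of levels:  {corr_levels:.2f}")
print(f"correlation of changes: {corr_changes:.2f}")
```

In this sketch, the raw values come out strongly correlated simply because both series grow with time, while the step-to-step changes show little correlation. A shared trend is one common reason why two unrelated variables can appear to move together.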
For more such compelling examples, where a high correlation exists without any causal link, you can refer to this link.
You now have a clearer idea of how causation differs from correlation. In the next segment, you will learn about bivariate analysis using numerical and categorical variables.