Now, let’s learn about public data sources and the techniques to extract data from them. Public data is available on many platforms on the internet, including open websites such as government portals and online learning websites, some of which we will discuss here.

Let’s hear from Anand as he talks about the kinds of websites and sources from which we can access public data.

So, now you have an idea of how public platforms can be a good source of data. Let’s quickly discuss some interesting platforms that are helpful for exploring the fields of data analytics and machine learning.

  • Kaggle: It is a subsidiary of Google LLC. It is an online community of data scientists and machine learning engineers, where you can find and publish data sets and build and explore your own solutions in a web-based environment. Kaggle also organises several machine learning competitions online. Here is the link to the Kaggle website.
  • UCI Machine Learning Repository: It is maintained by the University of California, Irvine. As the name suggests, it is a repository of data sets that are openly available. You can find interesting data sets and case studies to explore data analytics and machine learning. Here is the link to the UCI Machine Learning Repository website.

Given below are the links to some public data sets. You may explore these open sources to obtain data.

GitHub: Awesome public data sets, GitHub data sets

Open government data set: Open Government data
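
Many data sets on these platforms can be downloaded as CSV files. As a rough sketch of a first look at such a file, the snippet below reads a CSV and reports its column names, its first few records and its row count. The function name preview_csv and the tiny in-memory sample are hypothetical and used only for illustration; with a real download, you would open the file from disk instead.

```python
import csv
import io


def preview_csv(file_obj, n_rows=3):
    """Return the column names, the first n_rows records and the total row count."""
    reader = csv.DictReader(file_obj)
    rows = list(reader)  # reads the whole file; fine for small data sets
    return reader.fieldnames, rows[:n_rows], len(rows)


# Made-up sample standing in for a downloaded file (not real data).
# For a real file, use: open("your_file.csv", newline="") instead.
sample = io.StringIO(
    "region,year,value\n"
    "North,2020,10\n"
    "South,2020,12\n"
)

columns, first_rows, total = preview_csv(sample)
print(columns)
print(first_rows[0])
print(total)
```

Note that the csv module reads every value as text, so numeric columns need to be converted before you do any calculations on them.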

In the next segment, you will learn about web scraping and understand how you can fetch data from websites using code.


