In this module, you will explore data warehouse software called Apache Hive.
Apache Hive was first developed at Facebook so that people without advanced programming knowledge could query data using an SQL-like language. You can read more about this in the initial research paper written by the Facebook Data Infrastructure team. In this module, you will learn the applications and use cases of Hive, its basic architecture, and how to write Hive queries and conduct data analysis. While we will give you an introduction to Hive architecture, we won’t go too deep into that topic, as we want to focus on how to use Hive to derive insights from data. OK, let’s begin.
The following is an overview of what this module will cover:
- What is Hive?
- Features of Hive
- Hive Architecture
- A comparative analysis of RDBMS and Hive
- Basic Hive queries:
  - Creation of database and tables
  - Types of tables: internal and external tables
  - Complex data types
  - Ordering
  - Indexing
- Advanced Hive queries:
  - Partitioning
  - Bucketing
  - Joins
- Analysing the Amazon Review Dataset using Hive
Guidelines for in-module questions
There will be a separate session for graded questions. The other sessions will contain questions that will not be graded. Each graded question in this module carries 10 marks for a correct response and 0 for an incorrect response. Each graded question allows only one attempt, while non-graded questions may allow one or two attempts, depending on the question type and the number of options.
People you will hear from in this module
Adjunct Faculty
Vishwa Mohan
Senior Software Engineer
Vishwa has about 10 years of experience working with multiple MNCs such as Walmart, PayPal and Oracle. He holds a bachelor’s degree and a master’s degree from IIT BHU, one of the premier institutes of India.
Joydeep Sen Sarma
Creator of Apache Hive
Co-founder and CTO, Qubole
Shakun Gupta
Lead Developer at a Fortune 500 company
Shakun works as a lead developer in the big data division of a leading company in the finance domain. He has a bachelor’s degree from IIT Delhi and over 12 years of experience in the tech industry. In the past, Shakun also started his own company, Slassy.
Kautuk Pandey
Senior Data Engineer
Kautuk is currently working as a senior data engineer. He has over 9 years of experience in the IT industry and has worked for several companies. He has deep knowledge of the various tools and technologies that are in use today.
Presenter
Sandeep Thilakan