Welcome to the first segment on ‘Introduction to Hive’.
Once you have ingested data into the Hadoop Distributed File System (HDFS), it becomes available for processing. You already know that the MapReduce programming model is used to process big data.
But writing MapReduce programs is a difficult and time-consuming task, and specific skill sets are required to write good-quality MapReduce programs. What if you do not know how to write a MapReduce program? Does that mean you cannot work on Hadoop? Not really; there are tools that can help you. Apache Hive was developed so that people could work with big data without having to spend time learning MapReduce.
In this lecture, Joydeep Sen, who co-authored Hive during his time at Facebook, will talk about why Hive was developed.
Hive is data warehouse software that enables you to query and manipulate data using an SQL-like language known as HiveQL. It was developed at Facebook so that people experienced in SQL could query datasets without having to learn new paradigms such as MapReduce or new programming languages.
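To get a feel for how familiar HiveQL looks, here is a minimal sketch; the table name, columns, and query are hypothetical examples, not taken from the lecture:

-- Define a table over data stored in HDFS (hypothetical schema)
CREATE TABLE page_views (user_id STRING, url STRING, view_time TIMESTAMP);

-- A plain SQL-style aggregation: the ten most-viewed URLs.
-- Hive compiles this into MapReduce jobs behind the scenes.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;

Anyone comfortable with SQL can read and write such a query, even though running it may launch one or more MapReduce jobs under the hood.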
In the next segment, Joydeep will explain ‘Industrial Use Case’.
Additional Reading
Whitepaper on Hive by Facebook