Now you know about the evolution of Apache Hive and the driving factors that led to its development. You also learnt about some use cases that Hive solves conveniently.
Next, Joydeep talks about how Qubole's clients use Hive as part of their big data arsenal. Qubole is a cloud-based data processing platform that provides big data processing as a service to its end users.
In this video, you learnt how large companies like Ola and Pinterest use Hive to get valuable insights from the massive amount of data they collect from their customers. These companies use Hive for analytical processing of huge batches of data, such as data collected over a month or a year. Analytical processing includes generating summary reports and analysing historical data to find trends and patterns.
User interfaces such as Qubole and Hue, which sit on top of Hive, make querying quite convenient. You can also present your query results as charts such as bar graphs and pie charts. Hence, Qubole and Hue are preferred for running ad hoc interactive queries on Hive.
Hive is not suited for transactional processing on data because:
- Although Hive does support “insert”, “update” and “delete” operations on a dataset, there are multiple restrictions on using them. Firstly, you need to turn ACID properties on for a table before you can perform ACID operations on it, and once they are turned on for a table, they cannot be turned off again. Many other limitations also exist, related to table properties and file formats. Row-level insert, update and delete operations were not supported at launch; they are available from version 0.14 onwards and are still under development. You can read more about ACID transactions in Hive using this link. You can also go through this link to see the current status of ACID transactions and the future work that is proposed.
- Hive queries are traditionally translated internally into MapReduce programs. As MapReduce is a disk-based data processing framework that is well suited to batch processing, Hive is also considered a technology suited to batch processing of data. In this setup, Hive is essentially an SQL layer on top of MapReduce that hides the complexities of MapReduce from the end user.
- The records in Hive tables are not indexed. Hence, accessing an individual record takes a lot of time, because the data has to be scanned to find it.
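To make the last two points more concrete, here is a minimal Python sketch. It is not how Hive is implemented; it only imitates, in memory, the map, shuffle and reduce steps that a query such as `SELECT city, COUNT(*) FROM rides GROUP BY city` could roughly correspond to, and it shows what a lookup without an index looks like. The `rides` table, its columns and all function names are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical rows of a `rides` table (made-up names and values).
RIDES = [
    {"ride_id": 1, "city": "Bengaluru", "fare": 120.0},
    {"ride_id": 2, "city": "Mumbai", "fare": 250.0},
    {"ride_id": 3, "city": "Bengaluru", "fare": 90.0},
    {"ride_id": 4, "city": "Delhi", "fare": 180.0},
]


def map_phase(rows):
    """Emit one (city, 1) pair per input row, as a mapper would."""
    for row in rows:
        yield row["city"], 1


def shuffle_phase(pairs):
    """Group all values by key, imitating the shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def rides_per_city(rows):
    """Chain the three phases, processing the whole batch of rows."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


def find_ride(rows, ride_id):
    """Without an index, a point lookup scans rows until it finds a match.

    Returns the matching row (or None) and the number of rows scanned.
    """
    scanned = 0
    for row in rows:
        scanned += 1
        if row["ride_id"] == ride_id:
            return row, scanned
    return None, scanned


if __name__ == "__main__":
    print(rides_per_city(RIDES))
    print(find_ride(RIDES, 4))
```

Notice that both the aggregation and the single-record lookup walk through the rows one by one. On a real cluster this happens over files on disk, which is why such workloads suit batch analytics far better than frequent, record-level transactions.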
Additional Reading
- To learn more about the pros and cons of Hive, you can follow the link provided below: Pros and Cons of Hive.