In this segment, you will learn how to set up an Amazon EMR instance for Apache Hive and Hue.
Please note that this segment covers only the EMR configuration that is specific to Hive.
The other steps for setting up an Amazon EMR instance, such as logging in and setting the YARN parameters, remain the same as in the Introduction to Cloud and AWS Setup module. Make sure that you follow those steps as well.
The following video shows how to set up an EMR instance for Apache Hive.
As mentioned in the video, most of the steps are similar to previous EMR setups. The only significant changes are in the software packages chosen during the initial setup of the instance.
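If you prefer to see the software selection as code, the sketch below builds the kind of request that the boto3 `run_job_flow` call accepts, with Hive and Hue listed under `Applications`. It is only an illustration: the function name `hive_cluster_request`, the release label, the instance types, the instance count, and the default role names are all assumptions, and the exact packages used in this course are the ones shown in the video and the setup document.

```python
def hive_cluster_request(name, release_label="emr-5.36.0"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Every value here is an assumed placeholder; replace it with the
    settings given in the course setup document.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # assumed EMR release
        # The software packages selected for the cluster.
        "Applications": [{"Name": "Hive"}, {"Name": "Hue"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",   # assumed instance type
            "InstanceCount": 2,                 # assumed cluster size
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default EC2 role
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


request = hive_cluster_request("hive-demo")
print([app["Name"] for app in request["Applications"]])
```

With boto3 installed and AWS credentials configured, `boto3.client("emr").run_job_flow(**request)` would submit this request. Doing so launches a real, billable cluster, so the sketch stops at building the request.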
The following document contains the steps to set up an EMR instance for Apache Hive:
Once the EMR instance is set up, you can also log in to the Hue interface. The following video shows how to access Hue on your EMR cluster.
The steps to access Hue are also described in the following document.
In this module, you will also work with the Amazon S3 service.
Amazon Simple Storage Service, or Amazon S3 as it is popularly known, is an object storage service provided by Amazon. It is widely used by developers across the industry to store and retrieve data. During this module, we will use S3 frequently to read and write data using Hive queries.
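To give a feel for how Hive reads data from S3, here is a minimal sketch that generates the HiveQL for an external table whose `LOCATION` points to an S3 path. Rows inserted into such a table are written under the same path. The helper name `external_table_ddl`, the table name, the columns, and the bucket path are hypothetical examples and are not taken from the course material.

```python
def external_table_ddl(table, columns, s3_location, delimiter=","):
    """Return a HiveQL statement for a delimited-text external table on S3.

    columns is a list of (column_name, hive_type) pairs.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical table and bucket, for illustration only.
print(external_table_ddl(
    "sales",
    [("id", "INT"), ("item", "STRING")],
    "s3://my-example-bucket/sales/",
))
```

The printed statement can be pasted into the Hive shell or the Hue query editor once the bucket and folder exist.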
Data on S3 is stored in buckets, and each bucket has a globally unique name that identifies it across AWS. You can read instructions on creating a bucket in the accompanying document.
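Before you create a bucket, it can help to check the name you have in mind against the basic S3 naming rules. The sketch below covers a subset of those rules: 3 to 63 characters; lowercase letters, digits, dots, and hyphens only; a letter or digit at both ends; no two adjacent dots; and not in the form of an IP address. The function name `is_valid_bucket_name` is a hypothetical helper. Note that the check cannot tell you whether the name is already taken by another AWS user.

```python
import re

# 3 to 63 characters: lowercase letters, digits, dots, and hyphens,
# starting and ending with a letter or digit.
_ALLOWED = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# Names that look like an IPv4 address (for example 192.168.1.1) are rejected.
_IPV4_LIKE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_valid_bucket_name(name):
    """Check a candidate bucket name against a subset of S3 naming rules."""
    if not _ALLOWED.match(name):
        return False
    if ".." in name:
        return False
    if _IPV4_LIKE.match(name):
        return False
    return True


for candidate in ["my-hive-data", "My_Bucket", "ab", "192.168.1.1"]:
    print(candidate, is_valid_bucket_name(candidate))
```

A name that passes this check can still be rejected by S3 if another account already owns it, so be ready to try a more specific name.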
In the next segment, you will see how you can set up a database on your EMR instance.