Like SQL, in Hive, we can do DDL and DML operations. So, let us get acquainted with the data set first. Here, you will have a movies_rating data set. In this case study, each user has rated movies, and there are a total of five attributes in the data set:
- User_id: The unique id of a particular user.
- Age: The age of a user who has rated a movie.
- Gender: The gender of a user.
- Profession: The users’ profession, i.e., whether they are doctors, artists, engineers, etc.
- Ratings: The total rating given by a particular user to various movies
Each field in the data set is separated by a ‘|‘ from the other, and each line is separated by ‘\n’.
Let’s start with database creation:
You have to copy Movies_Rating_Data data to S3 bucket created by you.
First, let’s download the dataset on the EMR instance.
Example
# Switch to the root user
sudo -i
# Create a directory named movie_ratings
mkdir /root/movie_ratings
# Download the dataset by running the following command
wget ml-cloud-dataset.s3.amazonaws.com/u.userOutput
Now you have to copy movie rating dataset to your S3 bucket. The steps are as follows
- Create an S3 bucket movie-ratings-bucket.
- Create folder tables inside movie-ratings-bucket.
- . Now in EMR instance (at Putty) execute the following command to copy files to your S3 bucket.
Example
aws s3 cp /root/movie_ratings/u.user s3://movies-ratings-bucket/tables/Output
The following document includes the steps to open the Hive console on the EMR instance.
Note:
You may face write access issues while running Hive. To resolve these issues, you need to run the commands at the end of this document once before starting with Hive.
You are expected to code along with the demonstration itself, as this will help you understand the concepts better and you will also get hands-on with querying in Hive.
Now, let us learn about the very first aspect of database creation in the next video.
Let us summarise the above video:
- You learnt that you can create a database with the help of the following query:
Example
create database if not exists demo ;Output
So, this database named ‘demo’ is created in the Hive warehouse itself.
- To print the description of the database, you can write a query with the keyword “extended”:??
Example
describe database extended demo2 ;Output
- If you want to see the database’s headers, you will need to set the headers to true:
Example
set hive.cli.print.header=true ;Output