IKH

Database Creation

Like SQL, in Hive, we can do DDL and DML operations.  So, let us get acquainted with the data set first. Here, you will have a movies_rating data set. In this case study, each user has rated movies, and there are a total of five attributes in the data set: 

  • User_id: The unique id of a particular user.
  • Age: The age of a user who has rated a movie.
  • Gender: The gender of a user.
  • Profession: The users’ profession, i.e., whether they are doctors, artists, engineers, etc.
  • Ratings: The total rating given by a particular user to various movies

Each field in the data set is separated by a ‘|‘ from the other, and each line is separated by ‘\n’. 

Let’s start with database creation:

You have to copy Movies_Rating_Data data to S3 bucket created by you.

First, let’s download the dataset on the EMR instance.

Example

Python
# Switch to the root user
sudo -i 


# Create a directory named movie_ratings  
mkdir /root/movie_ratings


# Download the dataset by running the following command   
wget ml-cloud-dataset.s3.amazonaws.com/u.user

Output

Now you have to copy movie rating dataset to your S3 bucket. The steps are as follows

  • Create an S3 bucket movie-ratings-bucket.
  • Create folder tables inside movie-ratings-bucket.
  • . Now in EMR instance (at Putty) execute the following command to copy files to your S3 bucket.

Example

Python
aws s3 cp /root/movie_ratings/u.user  s3://movies-ratings-bucket/tables/

Output

The following document includes the steps to open the Hive console on the EMR instance.

Note:

You may face write access issues while running Hive. To resolve these issues, you need to run the commands at the end of this document once before starting with Hive.

You are expected to code along with the demonstration itself, as this will help you understand the concepts better and you will also get hands-on with querying in Hive.

Now, let us learn about the very first aspect of database creation in the next video.

Let us summarise the above video:

  • You learnt that you can create a database with the help of the following query:

Example

Python
create database if not exists demo ;

Output

So, this database named ‘demo’ is created in the Hive warehouse itself. 

  • To print the description of the database, you can write a query with the keyword “extended”:??

Example

Python
describe database extended demo2 ;

Output

  • If you want to see the database’s headers, you will need to set the headers to true:

Example

Python
set hive.cli.print.header=true ;

Output

Report an error