
Load Amazon Review Data Set

In previous sessions, you learnt about Hive data models and Hive's internal architecture, and practised writing HQL queries hands-on. In this session, you will apply HQL queries to analyse the Amazon product review dataset.

Let’s start by loading the Amazon dataset into an S3 bucket.

Let’s summarise the commands that you have seen in the video above. 

  • Load the Amazon reviews data into your S3 bucket from the EC2 instance, using the command below:

Example

Shell
aws s3 cp s3://hivedata-bde/Electronics_5.json s3://yourbucketname/yourfoldername/
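The source and destination in the copy command are both S3 URIs of the form `s3://<bucket>/<key>`. The Python sketch below uses only the standard library. It splits such a URI into bucket and key, and it shows which object key the copied file should end up with when the destination ends in `/`. In that case `aws s3 cp` treats the destination as a "folder" prefix and keeps the source file name. The function names are hypothetical helpers for illustration and are not part of the AWS CLI.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The network location is the bucket name; the path, minus its
    # leading slash, is the object key (or key prefix).
    return parsed.netloc, parsed.path.lstrip("/")


def expected_destination_key(src_uri: str, dest_uri: str) -> str:
    """Object key the copy should produce in the destination bucket.

    A destination ending in '/' is treated as a prefix, so the source
    file name is appended; otherwise the destination path is the key.
    """
    _, src_key = split_s3_uri(src_uri)
    _, dest_key = split_s3_uri(dest_uri)
    if dest_uri.endswith("/"):
        return dest_key + src_key.rsplit("/", 1)[-1]
    return dest_key


print(split_s3_uri("s3://hivedata-bde/Electronics_5.json"))
# → ('hivedata-bde', 'Electronics_5.json')
print(expected_destination_key(
    "s3://hivedata-bde/Electronics_5.json",
    "s3://yourbucketname/yourfoldername/",
))
# → yourfoldername/Electronics_5.json
```

Because of the trailing `/` on the destination, the file lands at `yourfoldername/Electronics_5.json` inside your bucket. That is the entry the listing command in the next step should show.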

  • Verify that the dataset has been copied to your bucket:

Example

Shell
aws s3 ls s3://yourbucketname/yourfoldername/
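If you prefer to script this check rather than read the listing by eye, the Python sketch below scans the text printed by `aws s3 ls` for the data file. It assumes the CLI's usual layout: for each object, a date, a time, the size in bytes and then the key name, and for each sub-folder, a line of the form `PRE <name>/`. `object_size` is a hypothetical helper. It returns the object's size rather than a plain yes/no so that an empty (0-byte) copy is easy to spot.

```python
from typing import Optional


def object_size(listing: str, filename: str) -> Optional[int]:
    """Return the size in bytes of `filename` in `aws s3 ls` output, or None.

    Assumes object lines look like '<date> <time> <size> <name>' and that
    sub-folder lines look like 'PRE <prefix>/'.
    """
    for line in listing.splitlines():
        # maxsplit=3 keeps names that contain spaces in one piece
        parts = line.split(maxsplit=3)
        if len(parts) < 4 or parts[0] == "PRE":
            continue  # blank line or a sub-folder entry, not an object
        _date, _time, size, name = parts
        if name == filename and size.isdigit():
            return int(size)
    return None
```

To run the check from a script, you could capture the listing with `subprocess.run(["aws", "s3", "ls", "s3://yourbucketname/yourfoldername/"], capture_output=True, text=True).stdout` and pass that string to `object_size(..., "Electronics_5.json")`. A result of `None` or `0` means the copy needs to be repeated.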

  • Before creating the amz_review database, check whether it is already present.
  • Run the following command to add the SerDe JAR (hive-hcatalog-core), which provides the JSON SerDe used to read the review data:

Example

HiveQL
ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-2.3.6-amzn-2.jar;
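The SerDe is needed because Electronics_5.json is a JSON file, not delimited text. Assume it follows the JSON-lines layout of the public Amazon review files, with one complete JSON object per line. Under that assumption, the Python sketch below imitates what a JSON SerDe does at read time: it parses each line and maps its keys to table columns, giving NULL (here `None`) for keys a record lacks. This is only an illustration. The helper names and the two sample records are hypothetical. The field names `reviewerID`, `asin` and `overall` are assumed from that public release and are not taken from this session.

```python
import json
from typing import Any, Dict, List


def parse_json_lines(text: str) -> List[Dict[str, Any]]:
    """Parse newline-delimited JSON: one complete JSON object per line."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        records.append(record)
    return records


def to_row(record: Dict[str, Any], columns: List[str]) -> List[Any]:
    """Select values by column name; missing keys become None (NULL)."""
    return [record.get(column) for column in columns]


# Hypothetical two-line sample; the second record has no "overall" key.
sample = (
    '{"reviewerID": "R1", "asin": "P1", "overall": 5.0}\n'
    '{"reviewerID": "R2", "asin": "P2"}\n'
)
columns = ["reviewerID", "asin", "overall"]
for record in parse_json_lines(sample):
    print(to_row(record, columns))
# → ['R1', 'P1', 5.0]
# → ['R2', 'P2', None]
```

The key point is that each line must be a complete, self-contained JSON object. A pretty-printed file that spreads one object over several lines would not parse this way.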

  • For the analysis that follows, create a separate database, then list the databases to confirm that it exists:

Example

HiveQL
CREATE DATABASE IF NOT EXISTS amz_review;
SHOW DATABASES;
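To script the "is the database already present?" check, you can inspect the output of `SHOW DATABASES;`, for example as captured from `hive -e "SHOW DATABASES;"`. The Python sketch below assumes plain output with one database name per line, as the Hive CLI prints it; Beeline's default tabular output would need different parsing. It compares names case-insensitively because Hive treats database names that way. `database_exists` is a hypothetical helper, and the captured output in the example is made up.

```python
def database_exists(show_databases_output: str, name: str) -> bool:
    """True if `name` appears in SHOW DATABASES output (one name per line)."""
    # Hive database names are case-insensitive, so normalise to lower case
    names = {
        line.strip().lower()
        for line in show_databases_output.splitlines()
        if line.strip()
    }
    return name.lower() in names


output = "default\namz_review\n"  # hypothetical captured output
if database_exists(output, "amz_review"):
    print("amz_review already exists; skip CREATE DATABASE")
else:
    print("CREATE DATABASE amz_review;")
```

Inside Hive itself, `CREATE DATABASE IF NOT EXISTS` gives the same protection against creating the database twice. An explicit check like this is only useful when a surrounding script needs to branch on the result.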

In the next segment, you will see the steps to create the external table for data analysis.
