In previous sessions, you learnt about Hive data models and internal architecture, and practised HQL hands-on. In this session, you will apply HQL queries to analyse the Amazon product review dataset.
Let’s start by loading the Amazon dataset into an S3 bucket.
Let’s summarise the commands that you have seen in the video above.
- From your EC2 instance, copy the Amazon reviews data into your S3 bucket using the command below.
aws s3 cp s3://hivedata-bde/Electronics_5.json s3://yourbucketname/yourfoldername/
Verify whether the data set has been copied to your bucket.
aws s3 ls s3://yourbucketname/yourfoldername/
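The copy and verification steps above can be combined into one small shell function. This is only a minimal sketch: `BUCKET`, `FOLDER` and the function name `copy_and_verify` are hypothetical placeholders that do not come from the video, and it assumes the AWS CLI is already configured on the EC2 instance.

```shell
# Hypothetical placeholders: replace with your own bucket and folder names.
BUCKET="yourbucketname"
FOLDER="yourfoldername"

SRC="s3://hivedata-bde/Electronics_5.json"
DEST="s3://${BUCKET}/${FOLDER}/"

# Copy the dataset, then list the destination only if the copy succeeded.
copy_and_verify() {
    aws s3 cp "$SRC" "$DEST" && aws s3 ls "$DEST"
}
```

Calling `copy_and_verify` runs both commands in sequence. Because of the `&&`, the listing is skipped when the copy fails, so an empty or stale listing is not mistaken for a successful copy.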
- Before creating the database amz_review, check whether it already exists.
- Run this command to add the SerDe JAR:
add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-2.3.6-amzn-2.jar;
- For further analysis, you have to create a separate database:
create database amz_review;
show databases;
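The "check first, then create" step can also be run non-interactively from the shell with `hive -e`. The sketch below is an assumption-laden illustration rather than the method shown in the video: the function name `setup_db` is hypothetical, and it uses `CREATE DATABASE IF NOT EXISTS` so that the statement is safe to rerun when amz_review is already present.

```shell
# Hypothetical helper: checks for the database, creates it if missing,
# then lists all databases, in a single non-interactive Hive call.
setup_db() {
    hive -e "SHOW DATABASES LIKE 'amz_review';
CREATE DATABASE IF NOT EXISTS amz_review;
SHOW DATABASES;"
}
```

Running `setup_db` twice should leave the same single amz_review database, whereas a plain `create database amz_review;` returns an error on the second run.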
In the next segment, you will see the steps to create the external table for data analysis.