External Table Creation

In the previous segment, you loaded the data into your S3 bucket and created a database in Hive via the Hue interface. In this segment, let’s create the Hive tables for our analysis.

Let’s summarise the commands that you saw in the above video.

  • Create an external table named amz_review_dump in the amz_review database, pointing at the S3 location where the data was loaded:

create external table amz_review.amz_review_dump (json_dump string)  
location 's3a://<your_bucket_name>/tables/';

  • Verify that the data is readable through the table:

select * from amz_review.amz_review_dump limit 10;
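
While the data still sits in a single string column, individual fields can be pulled out of each record with Hive's built-in get_json_object function. The query below is only a minimal sketch: it assumes that each row holds one JSON object and that the keys are spelled asin and overall in your copy of the dataset, which is worth confirming against the output of the previous query.

```sql
-- Sketch: extract two fields from the raw JSON string column.
-- Assumption: the JSON keys are spelled 'asin' and 'overall'.
-- get_json_object returns NULL when the path is not found.
select get_json_object(json_dump, '$.asin')    as asin,
       get_json_object(json_dump, '$.overall') as overall
from amz_review.amz_review_dump
limit 10;
```

If both columns come back as NULL, the key names or the file layout probably differ from what is assumed here.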

  • To parse the JSON records into separate columns, create a second external table over the same S3 location using a JSON SerDe:

create external table amz_review.amz_review_col (
    reviewerid string,
    asin string,
    reviewername string,
    helpful array<int>,
    reviewtext string,
    overall double,
    summary string,
    unixreviewtime bigint)
    row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
    with serdeproperties ('paths'= '')
    location 's3a://<your_bucket_name>/tables/';

  • Verify the data in the table:

select * from amz_review.amz_review_col limit 5;
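
Once the columns are in place, the table definition and the array and timestamp fields can be inspected directly. The following is a sketch under stated assumptions: that helpful holds two integers per review (read here as helpful votes and total votes, which is an assumption about the dataset and not stated in this segment) and that unixreviewtime is in seconds. The aliases helpful_votes, total_votes and review_ts are illustrative names only.

```sql
-- Confirm the table type and the S3 location recorded in the metastore.
describe formatted amz_review.amz_review_col;

-- Sketch: index into the array column and convert the epoch value.
-- Assumptions: helpful has two elements; unixreviewtime is in seconds.
select asin,
       helpful[0]                    as helpful_votes,
       helpful[1]                    as total_votes,
       from_unixtime(unixreviewtime) as review_ts,
       overall
from amz_review.amz_review_col
limit 5;
```

In the describe output, the table type should read EXTERNAL_TABLE, which means that dropping the table removes only the metadata and leaves the files in S3 untouched.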

In the next segment, we will start with a general analysis of the Amazon review dataset.
