IKH

Practice Questions

Now, in this segment let’s look at some of the practice questions on the Hive queries you learnt in this session.

You are provided with the ‘Airlines’ datasets.

In the previous session, you have created the airlines table. Now, you cancreate a table with partitioning defined for the same dataset, and compare the performance of some heavy queries on both of these tables. 

Note

The dataset and the data dictionary can be downloaded from the ‘Practice Questions’ segment of Session-02.

Let’s try and partition the airline data with the ‘Year’ column. 

Important Note:

Before partitioning any table, make sure you run these commands. 

Example

Python
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;

Output

Create an external table with name “airlines_partitioned“.

Example

Python
create external table if not exists airlines_partitioned (
`SNo` int, `Month` int, `DayofMonth` int, `DayOfWeek` int,
`DepTime` int, `CRSDepTime` int, `ArrTime` int, `CRSArrTime` int,
`UniqueCarrier` string, `FlightNum` int, `TailNum` string,
`ActualElapsedTime` int, `CRSElapsedTime` int, `AirTime` int,
`ArrDelay` int, `DepDelay` int, `Origin` string, `Dest` string,
`Distance` int, `TaxiIn` int, `TaxiOut` int, `Cancelled` int,
`CancellationCode` string, `Diverted` int, `CarrierDelay` int,
`WeatherDelay` int, `NASDelay` int, `SecurityDelay` int, 
`LateAircraftDelay` int)
partitioned by (`Year` int)
location '/root/airlines/partition/';

Output

Insert the data in the table.

Example

Python
insert overwrite table airlines_partitioned partition(`Year`)
select `SNo`, `Month`, `DayofMonth`, `DayOfWeek`, `DepTime`, `CRSDepTime`,
`ArrTime`, `CRSArrTime`, `UniqueCarrier`, `FlightNum`, `TailNum`,
`ActualElapsedTime`, `CRSElapsedTime`, `AirTime`, `ArrDelay`,`DepDelay`,
`Origin`, `Dest`, `Distance`, `TaxiIn`, `TaxiOut`, `Cancelled`, `CancellationCode`,
`Diverted`, `CarrierDelay`, `WeatherDelay`, `NASDelay`, `SecurityDelay`,
`LateAircraftDelay`, `Year` 
from airlines;

Output

Based on this ‘airlines_partitioned‘ table, answer the following questions.

You can perform the same queries on the ‘airlines’ table(created in session-02) and see the performance difference. 

For more practice, we highly recommend you to apply EDA technique to analyse the dataset and share your insights on the DF. 

Report an error