IKH

Practice Question

This segment contains some practice questions around Hive queries.

You are given an ‘Airlines’ data set. This data contains information about the flights that have arrived and departed from different airports.

Following is the file containing information about the attributes. You can download it and read the complete explanation of the attributes.

Get data by executing below command on EMR instance.

Example

Python
# Switch to the root user
sudo -i

# Create a new directory named 'airlines'
mkdir airlines

# move inside the directory
cd airlines

# Download the dataset
wget ml-cloud-dataset.s3.amazonaws.com/Airlines_data.txt

Output

Following is the data dictionary for the dataset.

Index NameDescription
0IDSerial Number
1Year2004-2008
2Month1-12
3DayofMonth1-31
4DayOfWeek1 (Monday) – 7
(Sunday)
5DepTimeActual departure time (local,hhmm)
6CRSDep TimeScheduled departure time (local,hhmm)
7ArrTimeActual arrival time (local,hhmm)
8CRSArrTimeScheduled arrival time (local,hhmm)
9UniqueCarrierUnique carrier code
10FlightNumFlight number
11TailNumPlane tail number
12ActualElapsedTimeTime in minutes
13CRSElapsedTimeTime in minutes
14AirTimeTime in minutes
15ArrDelayArrival delay. in minutes
16DepDelay Departure delay, in minutes
17OriginOrigin IATA airport code (International Air Transport Association)
18DestDestination IATA airport code
19DistanceIn miles
20TaxiInTime taken from runway to the terminal after landing , excluding deceleration , in minutes
21TaxiOutTaxi taken from terminal to the runways before take-off, excluding acceleration, in minutes
22CancelledWhether the flight was cancelled
23CancellationCodeReason for cancellation ( A = carrier , B = weather , C = NAS , D= security)
24Diverted1 = yes , 0 = no
25CarrierDelayTime in minutes
26WeatherDelayTime in minutes
27NASDelayTime in minutes
28SecurityDelayTime in minutes
29LateAircraftDelayTime in minutes

Note

Before creating any table, make sure you run this command.

Example

Python
Add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-2.3.6-amzn-2.jar;

Output

Create an external table

Example

Python
CREATE EXTERNAL TABLE airlines(
`SNo` int, `Year` int, `Month` int, `DayofMonth` int, `DayOfWeek` int, `DepTime` int, `CRSDepTime` int,
 `ArrTime` int, `CRSArrTime` int, `UniqueCarrier` string, `FlightNum` int, `TailNum` string, `ActualElapsedTime` int, 
`CRSElapsedTime` int, `AirTime` int, `ArrDelay` int, `DepDelay` int, `Origin` string, `Dest` string, `Distance` int, `TaxiIn` int, 
`TaxiOut` int, `Cancelled` int, `CancellationCode` string, `Diverted` int,`CarrierDelay` int, `WeatherDelay` int,
 `NASDelay` int,`SecurityDelay` int, `LateAircraftDelay` int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

Output

Insert data into table airlines

Example

Python
load data local inpath '/root/airlines/Airlines_data.txt' overwrite into table airlines;

Output

Based on this ‘Airlines’ data set, answer the following questions.

For more practice, we highly recommend you to apply EDA technique to analyse the dataset and share your insights on the DF.

Report an error