So far, you have gained a clear understanding of internal and external tables and learnt how to create them and load data into them.
In this segment, you will learn about a crucial aspect of creating tables and loading data into them. So, let us watch the next video and understand what happens when you create a table in which an attribute has the wrong data type.
So, in the video above, you saw that the data type of gender was defined as integer. When you load the data from the HDFS into the table and then print it using the ‘select’ command, you find that the ‘gender’ column contains only NULL values. This is because you defined gender as an integer, but the values stored for it in the HDFS are strings, not integers. Hive cannot read those strings as integers, so it returns NULL in their place.
This is one of the best examples of the “schema on read” feature of Hive. Hive does not check the data against the schema when you create the table or load the data; it applies the schema only when you read the data. So, you can define any schema while creating a table, but if the schema does not match the data, you will get results like the ones in this case, where the table displays NULL values for the gender column. You therefore have to be very careful while reading data from the HDFS: define the table so that the data types and the order of its columns match the data that is available in the HDFS.
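The behaviour described above can be sketched in HiveQL as follows. This is only an illustration under assumptions: the HDFS directory '/user/hive/demo/users', the pipe delimiter, the three-column layout and the table names are hypothetical and are not taken from the video. Two external tables are defined over the same files, so the only difference between them is the schema that Hive applies at read time.

```sql
-- Hypothetical setup: files under /user/hive/demo/users hold lines such as
--   1|M|technician
-- (directory, delimiter and column layout are assumptions for illustration)

-- Schema with the wrong type: gender is declared INT although the file holds text.
CREATE EXTERNAL TABLE IF NOT EXISTS user_info_wrong (
  user_id         INT,
  gender          INT,
  user_profession STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/user/hive/demo/users';

-- Hive accepts the table definition without complaint. The mismatch appears
-- only when the data is read: text that is not a number comes back as NULL.
SELECT user_id, gender FROM user_info_wrong LIMIT 5;

-- Schema that matches the data: same files, gender declared STRING.
CREATE EXTERNAL TABLE IF NOT EXISTS user_info_right (
  user_id         INT,
  gender          STRING,
  user_profession STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/user/hive/demo/users';

SELECT user_id, gender FROM user_info_right LIMIT 5;
```

Because both tables are external and point to the same location, nothing in the HDFS changes between the two queries; only the schema used for reading differs.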
Now, you will learn about the next operation where you can copy the data of one table into another.
So, in the example above, you saw that there is an existing table, “user_info”, and that a new table, “secondTable”, was created with two attributes: ‘user_id’ as an integer and ‘user_profession’ as a string.
Once you have created the new table, you can use the “insert” command together with a ‘select’ on the existing table to copy data from the existing table into the new one, as shown in the query above.
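A minimal sketch of this copy operation in HiveQL is given below. It assumes that “user_info” has columns named ‘user_id’ and ‘user_profession’; the exact query used in the video may differ.

```sql
-- Create the target table with the two attributes described above.
CREATE TABLE IF NOT EXISTS secondTable (
  user_id         INT,
  user_profession STRING
);

-- Copy the matching columns from the existing table.
-- The column names in user_info are assumed here, not confirmed by the source.
INSERT INTO TABLE secondTable
SELECT user_id, user_profession
FROM user_info;

-- Quick check of the copied rows.
SELECT * FROM secondTable LIMIT 5;
```

The columns in the ‘select’ list are matched to the target table by position, so they should be listed in the same order and with compatible data types as the columns of “secondTable”.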
Now, let us proceed to the next part where you will learn how to alter the position of the columns of an existing table.
So, as you saw in the video above, the gender column was moved from the second position to the last position in the table using the “alter” command.
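The column move can be sketched with Hive's ALTER TABLE ... CHANGE COLUMN statement. The sketch assumes that gender is a STRING column and that the last column of “user_info” is ‘user_profession’; both are assumptions for illustration, so substitute the actual type and last column of your table.

```sql
-- Inspect the current column order.
DESCRIBE user_info;

-- Move gender so that it comes after the (assumed) last column.
-- The column name is repeated because CHANGE COLUMN expects
-- old name, new name and data type, in that order.
ALTER TABLE user_info
CHANGE COLUMN gender gender STRING AFTER user_profession;

-- Confirm the new column order.
DESCRIBE user_info;
```

Note that this statement changes only the table's metadata; it does not rearrange the data already stored in the underlying files, so the new column order should still correspond to the data that will be read.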
With this, you have learnt how to create and manipulate tables, and you now have a clear understanding of internal and external tables. In the next segment, you will learn how to sort data.