IKH

Target Dimension Model

Now, in this segment, you will get to know about the target schema(dimension model) according to which you will have to transform the data set using Spark as well as create and load it accordingly in RedShift.  Simply put this will be the schema according to which you’ll create the tables in Redshift.

In the next video, our expert will walk you through the target dimension model.

If you want to read about the dimensions and facts please refer to the Additional Resources.

As discussed in the video, you will be following the provided target dimension model schema for this ETL project.

For this project, you will need four dimension tables and one fact table. They are as follows.

  • ATM dimension – This dimension will have the data related to the various ATMs present in the dataset along with the ATM number(ATM ID in the original dataset), ATM manufacturer and a reference to the ATM location and is very important for solving analytical queries related where ATM data will be used.
  • Location dimension – This is a very important dimension containing all the location data including location name, street name, street number, zip code and even the latitude and longitude. This information will be very important for solving problems related to the particular location at which a transaction took place and can help banks in things like pinpointing locations where ATMs where demand is higher as compared to other locations. Combined with weather data in the transaction table, this can be used to further do analysis such as how weather affects the demand at ATMs at a particular location.
  • Date dimension – This is another very important dimension which is almost alway present where data such as transactional data is being dealt with. This dimension includes fields such as the full date and time timestamp, year, month, day, hour as well as the weekday for a transaction. This all can help in analysing the transaction behaviour with respect to the time at which the transaction took place and also how the transaction activity varies between weekdays and weekends.
  • Card type dimension – This dimension has the information about the particular card type with which a particular transactions varies with respect to each different card type.
  • Transaction fact – This is the actual fact table for the data set which contains all of the numerical data such as the currency of the transaction, service, transaction amount, message code and text as well as weather info such as description, weather id etc.

The schemas for these tables are provided in the following document.

The analytical queries defined in the next segment will also be based on the Redshift tables.

In the next segment, you will look at the solution approach as well as the tasks and analytical queries, which you will solve in this ETL project.

Additional Resources

  • If you want to get clarity on the concepts of Dimensions & Fact Tables, you can refer to the relevant segments in the Data Warehousing and ETL module. Anyways the Dimensional Model(target schema) to be used in this provided to you.
  • Dimension – This is a Wikipedia page describing the concept of Dimension in a Data Warehouse.
  • Fact Tables –  This is a Wikipedia page describing the concept of Fact Tables in a Data Warehouse.

Report an error