
CTR Dataset

Now that you are familiar with the basic concepts of logistic regression, let’s look at the data set that will be used to implement it.

Following is the data set used for this session.

Before going further, remember to upload the data set to your Amazon S3 bucket.

In the upcoming video, Ajay will take you through the Click-Through Rate (CTR) prediction data set and explain its salient features.

As explained in the video, this data set is about online advertising. CTR is an important metric for evaluating the performance of an Ad, and CTR prediction estimates whether a user will click on the advertisement shown on the website/app. CTR is defined as:

CTR = Clicks/Impressions

In other words, CTR is the ratio of the number of times the Ad was clicked to the number of times it was displayed. Click prediction systems are therefore essential and widely used in sponsored search and real-time bidding.
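As a quick illustration of the formula, the sketch below computes CTR in two ways: from raw counts, and from a column of 0/1 click labels such as the `click` feature. The function name `click_through_rate` and all the numbers are made up for illustration; they are not taken from the data set.

```python
def click_through_rate(clicks, impressions):
    """Return clicks / impressions as a fraction between 0 and 1."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical counts: an Ad displayed 1,000 times and clicked 25 times.
ctr = click_through_rate(25, 1000)
print(f"CTR = {ctr:.3f} ({ctr:.1%})")

# With one row per impression and a 0/1 click label, the number of
# clicks is the sum of the labels and the number of impressions is
# the number of rows. These labels are illustrative only.
clicks_column = [0, 1, 0, 0, 1, 0, 0, 0]
ctr_from_column = click_through_rate(sum(clicks_column), len(clicks_column))
print(f"CTR from click column = {ctr_from_column:.2f}")
```

Because each row is a single impression, the mean of the click column is the same quantity as CTR.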

The features of the current data set are as follows:

  1. id: Ad identifier.
  2. click: Zero for no-click and one for click.
  3. hour: Format is YYMMDDHH. For example, 14091123 means 23:00 on 11 September 2014, Coordinated Universal Time (UTC).
  4. C1: Anonymised categorical variable.
  5. banner_pos: Describes the position of the Ad on the website.
  6. site_id: Unique identifier of different websites.
  7. site_domain: Domain of the given site (for example, News, Sports, Web services, etc.).
  8. site_category: Category of the website.
  9. app_id: Unique identifier of different apps.
  10. app_domain: Domain of the given app.
  11. app_category: Category of the app.
  12. device_id: Unique identifier of the device on which the Ad is displayed.
  13. device_ip: IP address of the device.
  14. device_model: Model of the device being used. 
  15. device_type: Type of device.
  16. device_conn_type: Source of the connection for the device.
  17. C14-C21: Anonymised categorical variables.
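The YYMMDDHH encoding of the `hour` feature can be decoded with Python's standard library. The following is a minimal sketch, assuming the value arrives either as an integer or as a string; the helper name `parse_hour` is hypothetical and not part of the data set or the course material.

```python
from datetime import datetime, timezone


def parse_hour(value):
    """Convert a YYMMDDHH value (int or str) into a UTC datetime."""
    # zfill restores any leading zero lost if the value was read as an integer.
    text = str(value).zfill(8)
    parsed = datetime.strptime(text, "%y%m%d%H")
    # The feature description states the timestamps are in UTC.
    return parsed.replace(tzinfo=timezone.utc)


timestamp = parse_hour(14091123)
print(timestamp.isoformat())
print("Hour of day:", timestamp.hour)
print("Day of week:", timestamp.strftime("%A"))
```

Splitting the timestamp into parts such as hour of day or day of week is one common way to turn this column into features for a classifier.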

Now that you are familiar with the data set, the goal is to perform binary classification and predict whether a user will click on a given Ad. Let’s start with the Exploratory Data Analysis (EDA) of the data set.
