So, creating dummy variables is one way of transforming variables. Let’s now move on to another technique commonly used for transforming variables — **Weight of evidence (WOE) analysis**.

So, to summarise, you learnt **three important** things in this lecture:

- Calculating **WOE values** for fine binning and coarse binning
- The **importance** of WOE for fine binning and coarse binning
- The **usage** of WOE transformation

**WOE** can be calculated using the following equation:

WOE = ln(Good in the bucket / Total Good) − ln(Bad in the bucket / Total Bad)

Or, it can be expressed as:

WOE = ln(Percentage of Good / Percentage of Bad)

Once you’ve calculated the WOE values, it is important to note that they should follow an **increasing or decreasing trend** across bins. If the trend is not **monotonic**, you would need to compress the buckets/bins of that variable (coarse buckets) and then calculate the WOE values again.
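The per-bin WOE calculation and the monotonicity check can be sketched in plain Python. The bin counts below are made-up numbers for illustration, and `woe_per_bin` and `is_monotonic` are hypothetical helper names, not part of any library:

```python
import math

def woe_per_bin(bins):
    """Compute WOE for each bin, given (goods, bads) counts per bin.

    WOE = ln(good in bin / total good) - ln(bad in bin / total bad)
    """
    total_good = sum(g for g, b in bins)
    total_bad = sum(b for g, b in bins)
    return [math.log(g / total_good) - math.log(b / total_bad)
            for g, b in bins]

def is_monotonic(values):
    """True if the sequence is entirely non-decreasing or non-increasing."""
    inc = all(a <= b for a, b in zip(values, values[1:]))
    dec = all(a >= b for a, b in zip(values, values[1:]))
    return inc or dec

# Illustrative fine bins, ordered by the underlying variable: (goods, bads)
bins = [(100, 10), (80, 20), (60, 30), (40, 40)]
woe = woe_per_bin(bins)
# If is_monotonic(woe) is False, merge adjacent bins and recompute.
```

Here the WOE values decrease steadily across the four bins, so no coarse binning would be needed for this variable.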

As mentioned in the lecture, there are two main advantages of WOE.

- WOE reflects group identity: It captures the general trend of the distribution of good and bad customers across bins. For example, the difference between customers with 30% credit card utilisation and those with 45% utilisation is not the same as the difference between customers with 45% utilisation and those with 60% utilisation, and transforming the credit card utilisation variable using WOE captures exactly this.
- WOE helps you treat missing values logically for both types of variables, categorical and continuous. For example, in the credit card case, if you replace the continuous variable credit card utilisation with WOE values, every bucket mentioned above (0%-45%, 45%-60%, etc.) is replaced with a specific value, and the bucket “missing” is no exception: it is also replaced with its own WOE value.
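The second point can be sketched as a lookup built from good/bad counts, with “missing” treated as a bucket of its own. The bucket counts here are made-up illustrative numbers, and `woe_map` is a hypothetical helper name:

```python
import math

def woe_map(counts):
    """Map each bucket (including 'missing') to its WOE value.

    counts: dict bucket -> (goods, bads).
    """
    total_good = sum(g for g, b in counts.values())
    total_bad = sum(b for g, b in counts.values())
    return {bucket: math.log((g / total_good) / (b / total_bad))
            for bucket, (g, b) in counts.items()}

# Hypothetical utilisation buckets; 'missing' is simply one more bucket
counts = {
    "0%-45%":  (500, 20),
    "45%-60%": (300, 40),
    "60%+":    (150, 60),
    "missing": (50, 30),
}
mapping = woe_map(counts)

# Transforming raw observations: each bucket label, including 'missing',
# is replaced by its WOE value.
raw = ["0%-45%", "missing", "60%+"]
transformed = [mapping[v] for v in raw]
```

Because “missing” gets a WOE value like any other bucket, no separate imputation step is needed for this variable.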

Let’s also understand the positives and negatives of WOE transformation from Hindol.

So, basically, the pros and cons of a WOE transformation are similar to dummy variables.

- Pros: The model becomes more stable because small changes in the continuous variables will not impact the input so much.
- Cons: You may end up doing some score clumping.

This is because when you use WOE values in your model, you are doing something similar to creating dummy variables — you are replacing a range of values with an indicative value. The difference is that, instead of replacing it with a simple 1 or 0, which carries no information about the bin itself, you are replacing it with a carefully computed WOE value. Hence, the chances of undesired score clumping are much lower here.

Let’s now move on to IV (Information Value), which is a very important concept.

So, **information value** can be calculated using the following expression:

IV = WOE × (Good in the bucket / Total Good − Bad in the bucket / Total Bad)

Or it can be expressed as:

IV = WOE × (Percentage of good in the bucket − Percentage of bad in the bucket)

It is an important indicator of **predictive power**.

Mainly, it helps you decide how the binning of variables should be done. The binning should be done such that the WOE trend across bins is monotonic — either increasing all the time or decreasing all the time — while, at the same time, the IV (information value) should be high.
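The per-bin IV terms above are conventionally summed across bins to get a single IV for the variable. A minimal sketch, using the same made-up bin counts as before and a hypothetical `information_value` helper:

```python
import math

def information_value(bins):
    """IV = sum over bins of (pct_good - pct_bad) * WOE for that bin."""
    total_good = sum(g for g, b in bins)
    total_bad = sum(b for g, b in bins)
    iv = 0.0
    for g, b in bins:
        pct_good = g / total_good
        pct_bad = b / total_bad
        # (pct_good - pct_bad) and ln(pct_good / pct_bad) always share a
        # sign, so each bin's contribution to IV is non-negative.
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv

bins = [(100, 10), (80, 20), (60, 30), (40, 40)]
iv = information_value(bins)
```

A higher IV indicates stronger separation of goods and bads, so among candidate binnings with monotonic WOE you would prefer the one with the higher IV.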

**Comprehension 1: WOE and Information Value Analysis**

You are required to **download the data set** given below to answer the questions that follow:

In the attached file, there are three sheets. The first sheet contains three variables (Tenure, Second Contract and Churn) from the telecom data. The second sheet contains the distribution of the binned tenure variable. The third sheet contains the distribution of goods and bads for the contract variable.