
Interpreting a Decision Tree

Before jumping into the concepts behind decision trees and constructing one on your own in Python, it is important that you first understand how a decision tree is interpreted; this will give you a better appreciation of the model-building process.

As mentioned earlier, a decision tree is nothing but a tree that asks a series of questions to arrive at a prediction. The problem at hand is to predict whether or not a person has heart disease. Based on the values of various attributes such as gender, age and cholesterol, the decision tree makes a prediction, and its output is a flowchart-like diagram. Let’s hear Rahim describe this output and how to interpret it.

Please find below the data set used to construct the decision tree for heart disease.

As you saw, interpreting a decision tree is nothing but asking a series of questions, similar to what a doctor would do when diagnosing a patient. The first question we asked was, “Is the age less than or equal to 54.5?” Depending on the answer, we moved to the next step, where we asked whether the person is male or female, and so on. At the end of this line of questioning lies an answer: whether or not the person has heart disease.

Now, if you were a doctor, you could ask this series of questions and, depending on the answers, make an educated prediction of whether the patient has the disease. That prediction would be based on your past learning and experience of treating such patients. A decision tree does the same thing: from the training data, it learns how patients with different attribute values fared (this acts as the algorithm’s experience), and based on that experience, it asks a series of questions to predict whether a person has heart disease.
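To make this concrete, here is a minimal sketch of how such a tree can be trained and printed as a series of questions in Python using scikit-learn. The file name heart.csv and the column names age, sex, chol and target are assumptions for illustration; substitute the names used in your own data set.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed file and column names -- replace with those in your data set
df = pd.read_csv("heart.csv")
X = df[["age", "sex", "chol"]]   # predictor attributes
y = df["target"]                 # 1 = Disease, 0 = No Disease

# A shallow tree keeps the flowchart easy to read
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

# Print the tree as a series of nested questions (the flowchart in text form)
print(export_text(tree, feature_names=list(X.columns)))
```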

Let’s now look at the other side of the tree and understand the factors leading to heart disease.

As you saw in this video, a decision tree is easy to interpret, and you can almost always trace the factors that lead to a particular decision. In fact, trees are often underestimated in their ability to relate the predictor variables to the predictions. As a rule of thumb, if interpretability by a layperson is what you are looking for in a model, then decision trees should be at the top of your list.

In a decision tree, you start at the top (the root node) and traverse left or right according to the result of each condition. Each new condition is combined with the previous ones using a logical ‘and’, and you keep traversing until you reach a leaf node. The decision is the value (a class or a quantity) assigned to that leaf node.
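As an illustration of this traversal, the sketch below represents a tiny tree as nested dictionaries and walks from the root down to a leaf. The structure and thresholds are hypothetical and only mirror the shape of the heart disease tree.

```python
# Hypothetical tree: each internal node holds a question, each leaf a decision
tree = {
    "question": ("age", "<=", 54.5),
    "yes": {
        "question": ("sex", "==", "female"),
        "yes": {"decision": "No Disease"},
        "no": {"decision": "Disease"},
    },
    "no": {"decision": "Disease"},
}

def predict(node, patient):
    """Traverse from the root, answering one question per level."""
    while "decision" not in node:
        attr, op, value = node["question"]
        answer = patient[attr] <= value if op == "<=" else patient[attr] == value
        node = node["yes"] if answer else node["no"]
    return node["decision"]

print(predict(tree, {"age": 45, "sex": "female"}))  # No Disease
```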

As depicted in the heart disease example in the image above, the leaf nodes (bottom) are labelled ‘Disease’ (indicating that the person has heart disease) or ‘No Disease’ (which means the person does not have heart disease).

Note that the splits effectively partition the data into different groups, within each of which the chance of heart disease is similar.
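One way to see this partitioning, again assuming the illustrative file and column names from the earlier sketch, is to compare the rate of heart disease on either side of a split:

```python
import pandas as pd

df = pd.read_csv("heart.csv")   # assumed file name
split = df["age"] <= 54.5       # the root split of the example tree

# Proportion of patients with heart disease in each partition
print(df.groupby(split)["target"].mean())
```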

So, in a decision tree, you can traverse a path backwards from a leaf and identify the factors that led to a particular decision.

In the heart disease example, the decision tree predicts that if the ‘age’ of a person is less than or equal to 54.5, the person is female, and her cholesterol level is less than or equal to 300, then the person will not have heart disease, i.e., young females with a cholesterol level <= 300 have a low chance of being diagnosed with heart disease.

Similarly, there are other paths that lead to a leaf labelled Disease or No Disease. In other words, each decision is reached via a path that can be expressed as a series of ‘if’ conditions joined by logical ‘and’s, all of which must be satisfied together. The final decisions are stored as class labels in the leaves.
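For instance, the ‘No Disease’ path described above can be written out as a single rule in which every condition must hold together (the attribute names are again illustrative):

```python
def no_disease_rule(age, sex, chol):
    """The example path expressed as one 'and' of conditions.
    True means the tree predicts 'No Disease'."""
    return age <= 54.5 and sex == "female" and chol <= 300

print(no_disease_rule(age=40, sex="female", chol=250))  # True -> No Disease
```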

Comprehension – Interpreting a Decision Tree

Mithali is an incredible cricketer. She plays cricket only when the pitch is dry and the field is well lit. Based on past observations, you also know that she doesn’t play cricket when either the pitch is wet or the light is dim.
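Her behaviour is itself a small decision tree: two conditions joined by a logical ‘and’. As an illustrative sketch, it could be written as:

```python
def plays_cricket(pitch_is_dry, field_is_well_lit):
    """Mithali plays only when both conditions are satisfied."""
    return pitch_is_dry and field_is_well_lit

print(plays_cricket(pitch_is_dry=True, field_is_well_lit=False))  # False: the light is dim
```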

Now that you have understood how to interpret the end result of a decision tree, let’s learn how to actually construct this tree in the first place.
