Classification Tree

What?

A classification tree, as its name implies, has a root and a series of branches that eventually terminate at leaf nodes. Unlike regular trees, however, it is usually drawn upside down, with the root at the top.

Decision tree for Iris data set.

At the root node are all of the observations in the data set, along with a decision boundary that separates them into two groups.

Decision boundary for petal width at 0.8 units.
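
To make that first split concrete, here is a small sketch, assuming Python with NumPy and scikit-learn installed, that counts how the Iris observations fall on either side of the petal-width boundary of 0.8 from the figure. On the standard Iris data, this single split already isolates all of the setosa observations.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
petal_width = iris.data[:, 3]  # petal width is the fourth feature (cm)

# Apply the single decision boundary from the figure: petal width <= 0.8.
left = iris.target[petal_width <= 0.8]
right = iris.target[petal_width > 0.8]

# Count how many observations of each species land on each side.
for name, group in [("petal width <= 0.8", left), ("petal width > 0.8", right)]:
    counts = np.bincount(group, minlength=3)
    print(name, dict(zip(iris.target_names, counts)))
```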

Each of the two groups is then separated again into two subsets along another dividing point. This process repeats up to a given number of times, called the depth of the tree.

At the bottom of the tree, we can see the result of the series of decisions: every observation that enters at the root passes through a sequence of decision nodes and ends up in exactly one leaf node.
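
As a sketch of that flow, assuming scikit-learn (the depth of 2, the random seed, and the choice of observation are illustrative), we can fit a small tree and trace one observation through its decision nodes to a leaf:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

sample = iris.data[100:101]            # a single observation (one row)
node_path = clf.decision_path(sample)  # sparse indicator of the nodes visited
leaf = clf.apply(sample)[0]            # index of the leaf it ends up in

print("nodes visited:", node_path.indices)
print("leaf node:", leaf)
print("predicted class:", iris.target_names[clf.predict(sample)[0]])
```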

The leaf nodes also tell us how many observations from the training data set would be correctly classified under the given decision boundaries.
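
A sketch of how to read those counts programmatically, again assuming scikit-learn with an illustrative depth of 2: grouping the training observations by the leaf they land in shows the class makeup of each leaf, and the overall training accuracy summarizes how many are classified correctly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Which leaf does each training observation land in?
leaves = clf.apply(iris.data)

# Count the true classes inside each leaf to see how pure it is.
for leaf in np.unique(leaves):
    counts = np.bincount(iris.target[leaves == leaf], minlength=3)
    print(f"leaf {leaf}:", dict(zip(iris.target_names, counts)))

print("training accuracy:", clf.score(iris.data, iris.target))
```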

Decision tree leaf nodes showing accuracy of classification.

To improve accuracy on the training data, we can increase the depth of the decision tree, at the risk of overfitting the training data set.
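
One way to see this trade-off, in a sketch assuming scikit-learn (the 70/30 held-out split and the depth range are illustrative choices), is to compare accuracy on the training data with accuracy on held-out data as the depth grows:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Deeper trees fit the training data better, but test accuracy can stall or drop.
for depth in range(1, 8):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"depth {depth}: train={clf.score(X_train, y_train):.3f} "
          f"test={clf.score(X_test, y_test):.3f}")
```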

Why?

Classification trees give humans an intuitive view into the classification model. They can be followed manually to guide decisions or to understand which factors distinguish the observations, which makes them one of the easiest types of machine learning model to interpret.

How?
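
A minimal end-to-end sketch, assuming Python with scikit-learn installed (the depth of 3 and the example flower measurements are illustrative choices): fit a classification tree to the Iris data, print its decision boundaries in a human-readable text form, and classify a new observation.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Fit a classification tree of limited depth to the Iris observations.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# Print the learned decision boundaries as an indented, human-readable tree.
print(export_text(clf, feature_names=iris.feature_names))

# Classify a new observation: sepal length/width, petal length/width in cm.
print(iris.target_names[clf.predict([[5.1, 3.5, 1.4, 0.2]])[0]])
```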

Try it.
