RANDOM FORESTS AND ENSEMBLE CLASSIFIERS: THE WISDOM OF THE CROWD
In practice, a much more powerful way to use decision trees is as part of an ensemble classifier. The idea of ensemble classifiers is that rather than training a single very complicated (and probably overfit) model, we train a (possibly large) number of simpler classifiers. These simple classifiers are then combined (in a simple way) to produce the final prediction. Ensemble classifiers are typically the top performing classifiers in contemporary machine learning competitions because they can leverage the complementary advantages of many different approaches.
Random forests are a type of ensemble classifier based on decision trees. The idea is to train a large number (e.g., 100) of decision trees (the “forest”) based on random subsets of the features. Because the tree training algorithm (such as CART) doesn’t see all of the dimensions, each tree remains relatively simple (much fewer decision levels are included then if all the dimensions are included). This means that the individual trees are much less prone to overfitting. This strategy is effective at combating overfitting, but each one of the trees achieves much worse classification performance because it is not using all of the dimensions of the data.
This is where the magic of the ensemble classifier comes in: To classify new data, the random forest gives all of the trees a chance to predict the class of the new observation. These predictions are then summarized (usually simply by assigning the new data to the majority class among the predictions). Because all of the trees were trained on random subsets of the features, they are all looking at different parts of the new data, and therefore their predictions are somewhat independent. Allowing all of the classifiers to vote turns out to be a very effective classification strategy. Amazingly, random forests are thought to be so robust to overfitting that the user can increase the number of trees arbitrarily—only computational constraints limit the number of trees that are used in practice.
More generally, as long ensemble classifiers are combining the predictions from simpler classifiers that are somewhat independent, it is thought that including additional classifiers will always improve the accuracy of the final classifier.