The principle of the random forest algorithm is: 1. the random forest algorithm is an improvement on the Bagging algorithm; 2. random forest uses the CART decision tree as its weak learner; 3. the input is a sample set and the number of weak-classifier iterations T; 4. the output is the final strong classifier f(x).
The principle of the random forest algorithm is as follows:
1. Random forest algorithm
The RF (Random Forest) algorithm is an improvement on the Bagging algorithm.
First, RF uses the CART decision tree as its weak learner, which reminds us of the gradient boosting tree (GBDT), since GBDT uses the same base learner.
Second, RF changes how each decision tree is built. In an ordinary decision tree, we select the single best feature among all n features at a node to split the left and right subtrees. RF instead randomly selects a subset of the features at each node, of size nsub (less than n), and then picks the best feature among these nsub candidates to perform the split. This further improves the generalization ability of the model.
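To make the node-level feature subsampling concrete, here is a minimal sketch (not from the article) of choosing a split from a random subset of nsub features. The function name `rf_node_split` and the simple Gini scoring are illustrative assumptions; real CART implementations are more elaborate.

```python
import numpy as np

def rf_node_split(X, y, nsub, rng=np.random.default_rng(0)):
    """Pick the best (feature, threshold) among a random subset of nsub features,
    scored by weighted Gini impurity of the resulting left/right partition."""
    n_features = X.shape[1]
    candidates = rng.choice(n_features, size=min(nsub, n_features), replace=False)
    best = (None, None, np.inf)
    for j in candidates:                      # only nsub features, not all n
        for thr in np.unique(X[:, j]):
            left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
            if len(left) == 0 or len(right) == 0:
                continue
            gini = sum(
                len(part) / len(y) * (1.0 - sum(np.mean(part == c) ** 2 for c in np.unique(y)))
                for part in (left, right)
            )
            if gini < best[2]:
                best = (j, thr, gini)
    return best  # (feature index, threshold, weighted Gini)

# Toy usage: 200 samples, 10 features, label driven by feature 3.
X = np.random.default_rng(1).normal(size=(200, 10))
y = (X[:, 3] > 0).astype(int)
print(rf_node_split(X, y, nsub=4))
```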
Apart from these two points, RF is no different from the ordinary Bagging algorithm. A brief summary of the RF algorithm follows.
The input is the sample set and the number of weak-classifier iterations T.
The output is the final strong classifier f(x).
(1) For t = 1, 2, ..., T:
Bootstrap-sample the training set for the t-th time, drawing m samples in total to obtain a sampling set Dt containing m samples.
Use the sampling set Dt to train the t-th decision tree model Gt(x). When training each node of the tree, randomly select a subset of the features available at that node, and choose the best feature among this random subset to split the node into left and right subtrees.
(2) For classification, the final prediction is the category that receives the most votes among the T weak learners. For regression, the final model output is the arithmetic mean of the predictions of the T weak learners.
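The steps above map directly onto a standard library implementation. Below is a minimal sketch using scikit-learn (assumed available, not part of the article): `n_estimators` plays the role of T, `bootstrap=True` gives the per-tree sampling sets Dt, and `max_features` is the nsub features considered at each node.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,      # T weak learners G_t(x)
    max_features="sqrt",   # nsub: features considered at each node split
    bootstrap=True,        # draw m samples with replacement for each D_t
    random_state=0,
)
rf.fit(X_train, y_train)

# Classification: each tree votes and the majority class wins.
# For regression, RandomForestRegressor averages the T predictions instead.
print("test accuracy:", rf.score(X_test, y_test))
```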
2. Extensions of Random Forest
RF is used not only for classification problems but also for feature transformation, outlier detection, and so on.
2.1 Extra Trees
Extra trees are a variant of RF. The principle is almost exactly the same as RF; the only differences are:
(1) For each decision tree's training set, RF uses bootstrap random sampling to draw the training set, whereas extra trees generally do not use random sampling at all: each decision tree uses the original training set.
(2) After selecting the splitting features, an RF decision tree chooses the optimal split point based on criteria such as the Gini index or mean squared error, just like a traditional decision tree. Extra trees are more radical: they pick a feature value at random as the split point.
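As a small illustration (assuming scikit-learn, not from the article), the two differences show up directly in the library defaults: `ExtraTreesClassifier` defaults to `bootstrap=False`, so each tree sees the whole training set, and its trees draw split thresholds at random rather than optimizing them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# bootstrap=False by default: every tree is trained on the original training set,
# and split thresholds are drawn at random instead of being optimized.
et = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

print("RF training accuracy:", rf.score(X, y))
print("Extra trees training accuracy:", et.score(X, y))
```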
2.2 Totally Random Trees Embedding
Totally Random Trees Embedding (hereinafter TRTE) is an unsupervised data transformation method. It maps a low-dimensional data set into a high-dimensional space so that the mapped data can be used more effectively in classification and regression models. We know that support vector machines use the kernel trick to map low-dimensional data into high dimensions; TRTE provides another way to do this.
In the transformation process, TRTE builds T decision trees to fit the data in a way similar to RF. Once the trees are built, the leaf node that each data point falls into in each of the T trees is determined. For example, suppose we have 3 decision trees, each with 5 leaf nodes. A data point x falls into the 2nd leaf of the first tree, the 3rd leaf of the second tree, and the 5th leaf of the third tree. Then the encoding of x after mapping is (0,1,0,0,0, 0,0,1,0,0, 0,0,0,0,1), a 15-dimensional high-dimensional feature vector. The spaces between the groups of dimensions are only there to emphasize the sub-encoding contributed by each of the three trees.
After mapping to high-dimensional features, you can continue to use various classification and regression algorithms of supervised learning.
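A minimal sketch of this idea using scikit-learn's `RandomTreesEmbedding` (assumed available; the downstream `LogisticRegression` is just one possible supervised learner): each sample is encoded by the one-hot index of the leaf it reaches in each tree, giving a sparse high-dimensional representation like the 15-dimensional example above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 3 trees with at most 5 leaves each, mirroring the example in the text.
trte = RandomTreesEmbedding(n_estimators=3, max_leaf_nodes=5, random_state=0)
X_high = trte.fit_transform(X)          # sparse matrix, one column per leaf
print("embedded shape:", X_high.shape)  # up to 3 trees x 5 leaves = 15 columns

# The high-dimensional encoding can then feed any supervised learner.
clf = LogisticRegression(max_iter=1000).fit(X_high, y)
print("training accuracy:", clf.score(X_high, y))
```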
3. Summary of Random Forest
This concludes the explanation of the RF algorithm's principle. As a highly parallelizable algorithm, RF has great potential in the big data era.
The main advantages of RF are:
1) Training can be highly parallelized, which gives it an advantage in training speed on large samples in the big data era. Personally, I think this is the main advantage.
2) Because only a random subset of features is considered when splitting decision tree nodes, the model can still be trained efficiently when the sample feature dimension is very high.
3) After training, the importance of each feature to the output can be reported (see the sketch after this list).
4) Due to the use of random sampling, the trained model has small variance and strong generalization ability.
5) Compared with Boosting-series algorithms such as AdaBoost and GBDT, RF is relatively simple to implement.
6) Insensitive to missing features.
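Regarding advantage 3), here is a short sketch (assuming scikit-learn, not part of the article) of reading per-feature importances from a trained forest:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ aggregates the impurity decrease each feature
# contributes across all trees in the forest.
for name, score in zip(load_iris().feature_names, rf.feature_importances_):
    print(f"{name}: {score:.3f}")
```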
The main disadvantages of RF are:
1) On certain sample sets with relatively large noise, the RF model is prone to overfitting.
2) Features with many distinct values are likely to have a greater influence on RF's decisions, which can affect the quality of the fitted model.