Self-training is a semi-supervised classification method built on the smoothness and clustering assumptions. It is also known as self-labeling or decision-directed learning.
Self-training is generally a good choice when the labeled dataset already carries most of the information about the data-generating process and the unlabeled samples are only used to fine-tune the model. When these conditions are not met, the results tend to be poor: self-training depends heavily on the quality of the labeled samples.
At each step, self-training labels unlabeled data according to the current decision function and then retrains on its own predictions; that is, the algorithm fits pseudo-labels produced by the supervised model learned in the previous round. A typical iteration looks like this:
1. Split the data into a training set and a test set, train the classifier on the labeled training data, and predict on the unlabeled data points, representing each prediction as a confidence vector.
2. Select the top K predictions with the highest confidence and add them, with their pseudo-labels, to the labeled dataset.
3. Have the classifier predict the class labels of the labeled test instances and evaluate its performance using the chosen metrics.
4. Retrain the classifier on the enlarged labeled dataset.
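The loop above can be sketched in pure Python. The nearest-centroid base learner, the softmax-style confidence score, and the top-K selection size used here are illustrative stand-ins, not part of the original text; any classifier that outputs a confidence per prediction would fit the same skeleton:

```python
import math

def centroid_fit(X, y):
    # Train a tiny nearest-centroid classifier: one mean point per class.
    model = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [sum(coord) / len(pts) for coord in zip(*pts)]
    return model

def centroid_predict(model, x):
    # Return (label, confidence): confidence is a softmax-like score
    # over negative distances to the class centroids.
    dists = {lab: math.dist(x, c) for lab, c in model.items()}
    label = min(dists, key=dists.get)
    scores = {lab: math.exp(-d) for lab, d in dists.items()}
    return label, scores[label] / sum(scores.values())

def self_train(X_lab, y_lab, X_unlab, k=1, max_rounds=10):
    X_lab, y_lab, X_unlab = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not X_unlab:
            break
        model = centroid_fit(X_lab, y_lab)                 # step 1: train
        preds = [centroid_predict(model, x) for x in X_unlab]
        # Step 2: pick the k most confident pseudo-labels.
        top = sorted(range(len(X_unlab)),
                     key=lambda i: preds[i][1], reverse=True)[:k]
        # Step 3: move those points into the labeled set
        # (pop in descending index order to keep indices valid).
        for i in sorted(top, reverse=True):
            X_lab.append(X_unlab.pop(i))
            y_lab.append(preds[i][0])
        # Step 4: retraining happens at the top of the next iteration.
    return centroid_fit(X_lab, y_lab)
```

A short usage example: with two labeled anchor points and four unlabeled points near them, the loop pseudo-labels the unlabeled points one per round and folds them into the centroids.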
Self-training exploits the structure of the labeled data to find a suitable separating hypersurface. The unlabeled samples are then evaluated, and points classified with sufficient confidence are added to the new training set. The algorithm repeats this process until every data point has been classified or no prediction reaches the confidence threshold.
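For practical use, this confidence-thresholded loop is available off the shelf in scikit-learn as `SelfTrainingClassifier`, which wraps any probabilistic base estimator. The sketch below assumes scikit-learn is installed; the synthetic dataset, the 50/50 labeled split, and the 0.8 threshold are arbitrary illustrations:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Two synthetic clusters; hide the labels of half the samples.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
y_partial = y.copy()
y_partial[50:] = -1  # scikit-learn's convention for "unlabeled"

# Pseudo-label any point predicted with probability >= 0.8, then retrain,
# repeating until no unlabeled point clears the threshold.
clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
clf.fit(X, y_partial)
```

After fitting, `clf` predicts over the full dataset like any ordinary classifier, with the pseudo-labeled points folded into its training set.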