Zero-shot learning (ZSL) is a machine learning paradigm in which a pre-trained deep learning model generalizes to new categories of samples. Its core idea is to transfer the knowledge in existing training instances to the classification of test instances. Specifically, zero-shot learning techniques learn intermediate semantic layers and attributes during training and then apply this knowledge at inference time to predict new data. This allows the model to classify samples from categories it has never seen, giving it the ability to recognize unknown categories. Through zero-shot learning, a model can obtain broader generalization from limited training data, improving its adaptability to new real-world problems.
It should be noted that the training and test sets are disjoint in zero-shot learning.
Zero-shot learning is a subfield of transfer learning, used mainly when the feature and label spaces differ entirely. Unlike common homogeneous transfer learning, zero-shot learning is more than fine-tuning a pre-trained model: it requires learning how to handle new problems without any samples of them. The goal of zero-shot learning is to transfer existing knowledge and experience to new areas in order to solve new problems. This kind of heterogeneous transfer learning is very useful when labels are scarce or absent, because prediction and classification can still leverage the existing label information. Zero-shot learning therefore has great potential to play an important role in many real-world applications.
Seen Classes: Data classes used to train deep learning models, such as labeled training data.
Unseen Classes: Data classes to which the trained model must generalize, even though no labeled instances of them appear during training.
Auxiliary information: Since no labeled instances belonging to unseen classes are available, auxiliary information (such as semantic attributes or textual descriptions) is needed to solve the zero-shot learning problem. Such auxiliary information must cover all of the unseen classes.
Zero-shot learning also relies on a labeled training set of seen classes. Seen and unseen classes are related in a high-dimensional vector space called the semantic space, through which knowledge from the seen classes can be transferred to the unseen classes.
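The role of the semantic space can be sketched in a few lines of numpy. The classes and attributes below are purely illustrative: each class is represented by a prototype vector of binary attributes, and an unseen class relates to seen classes through attribute overlap.

```python
import numpy as np

# Hypothetical attribute-based semantic space (all classes and attributes
# here are illustrative examples, not from any real dataset).
#                        stripes  hooves  mane
prototypes = {
    "tiger": np.array([1.0, 0.0, 0.0]),   # seen class
    "horse": np.array([0.0, 1.0, 1.0]),   # seen class
    "zebra": np.array([1.0, 1.0, 1.0]),   # unseen class
}

def cosine(a, b):
    # Cosine similarity between two prototype vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The unseen class "zebra" is related to both seen classes through shared
# attributes, which is what makes knowledge transfer possible.
print(cosine(prototypes["zebra"], prototypes["tiger"]))   # shares stripes
print(cosine(prototypes["zebra"], prototypes["horse"]))   # shares hooves, mane
```

Because prototypes live in a common vector space, relations between classes reduce to ordinary vector arithmetic.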
Zero-shot learning involves two stages of training and inference:
Training: Acquire knowledge about a set of labeled data samples.
Inference: Extend the acquired knowledge, together with the provided auxiliary information, to a new set of classes.
Classifier-based methods
Existing classifier-based methods usually adopt a one-vs-rest strategy to train a multi-class zero-shot classifier: for each unseen class, a binary one-vs-rest classifier is trained. We further divide classifier-based methods into three categories according to how the classifier is constructed.
①Correspondence method
The correspondence method constructs classifiers for unseen classes by exploiting the correspondence between each class's binary one-vs-rest classifier and its class prototype. Each class has exactly one prototype in the semantic space, so the prototype can be seen as the "representation" of the class. Likewise, in the feature space each class has a corresponding binary one-vs-rest classifier, which can also be regarded as a "representation" of the class. Correspondence methods aim to learn the correspondence function between these two "representations".
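A minimal numpy sketch of this idea, with entirely synthetic data: the correspondence function is assumed to be a linear map, fit by least squares from seen-class prototypes to seen-class classifier weights, and then applied to an unseen prototype to obtain a classifier for the unseen class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (synthetic): 3 seen classes, 1 unseen class,
# 5-dim feature space, 3-dim attribute prototypes.
protos_seen = np.array([[1., 0., 0.],
                        [0., 1., 0.],
                        [0., 0., 1.]])
proto_unseen = np.array([1., 1., 0.])

# One-vs-rest classifier weights for the seen classes (here made up;
# in practice learned from labeled seen-class data).
W_seen = rng.normal(size=(3, 5))

# Correspondence function: a linear map M from prototype to classifier
# weights, fit by least squares so that  protos_seen @ M ≈ W_seen.
M, *_ = np.linalg.lstsq(protos_seen, W_seen, rcond=None)

# Construct a classifier for the unseen class from its prototype alone.
w_unseen = proto_unseen @ M
```

Since no labeled data of the unseen class is touched, the classifier comes entirely from the learned correspondence.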
②Relationship method
This method constructs classifiers for unseen classes based on inter-class and intra-class relationships. In the feature space, binary one-vs-rest classifiers for the seen classes can be learned from the available data. At the same time, the relationship between seen and unseen classes can be obtained by computing the relationship between their corresponding prototypes.
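One simple realization of the relationship method, sketched with synthetic numbers: the unseen classifier is a combination of seen-class classifiers, weighted by prototype similarity (dot products, normalized to sum to one).

```python
import numpy as np

# Seen-class one-vs-rest classifier weights (synthetic, 4-dim features).
W_seen = np.array([[1., 0., 0., 0.],
                   [0., 1., 0., 0.]])

# Class prototypes in a 2-dim semantic space.
protos_seen = np.array([[1., 0.],
                        [0., 1.]])
proto_unseen = np.array([0.6, 0.8])

# Relation between the unseen class and each seen class, computed from
# the prototypes (plain dot products, normalized to sum to 1).
rel = protos_seen @ proto_unseen
rel = rel / rel.sum()

# Unseen classifier = relation-weighted combination of seen classifiers.
w_unseen = rel @ W_seen
```

The particular similarity measure and combination rule vary between methods; the weighted sum above is only one simple choice.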
③Combination method
The combination method constructs a classifier for an unseen class by combining classifiers for the basic elements that make up that class.
In the combination method, classes are assumed to be built from a list of "basic elements". Each data point in the seen and unseen classes is a combination of these basic elements. In the semantic space, each dimension represents one basic element, and each class prototype records which basic elements the corresponding class possesses.
Each dimension of a class prototype takes the value 1 or 0, indicating whether the class has the corresponding element. This type of method is therefore mainly suited to binary attribute semantic spaces.
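A small sketch of this attribute-combination idea in Python (in the spirit of direct attribute prediction; all probabilities and classes below are made-up illustrative values): binary attribute classifiers score each basic element, and a class score is the probability that the predicted attributes match the class's 0/1 prototype, assuming independent attributes.

```python
import numpy as np

# Per-attribute probabilities predicted for one test instance by binary
# attribute classifiers (values are illustrative).
p_attr = np.array([0.9, 0.8, 0.1])   # P(attribute present | x)

# Class prototypes: 0/1 vectors saying which basic elements each class has.
prototypes = {
    "zebra": np.array([1, 1, 0]),
    "tiger": np.array([1, 0, 1]),
}

def class_score(p, proto):
    # Probability that the instance matches the class's attribute pattern,
    # under an independence assumption across attributes.
    return float(np.prod(np.where(proto == 1, p, 1.0 - p)))

scores = {c: class_score(p_attr, v) for c, v in prototypes.items()}
best = max(scores, key=scores.get)
```

Note that the unseen class never needs its own training data; only its attribute pattern is required.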
Instance-based methods
Instance-based methods aim to first obtain labeled instances for the unseen classes and then use these instances to train a zero-shot classifier. According to the source of these instances, existing instance-based methods can be divided into three subcategories:
①Projection method
The idea of the projection method is to obtain labeled instances of unseen classes by projecting feature-space instances and semantic-space prototypes into a shared space.
The feature space contains labeled training instances belonging to the seen classes. At the same time, the semantic space contains prototypes of both seen and unseen classes. Feature and semantic spaces are real vector spaces, and instances and prototypes are vectors in them. From this perspective, prototypes can also be viewed as labeled instances, so labeled instances exist in both the feature space and the semantic space.
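A minimal numpy sketch of a projection method, with entirely synthetic data: here the shared space is taken to be the semantic space itself, a linear projection from features to semantics is fit by least squares on seen-class data, and a test instance is classified by the nearest prototype.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: features are a noisy linear image of 2-dim class prototypes.
protos_seen = np.array([[1., 0.],
                        [0., 1.]])
proto_unseen = np.array([1., 1.])
B = rng.normal(size=(2, 5))                    # hidden semantic->feature map

labels = rng.integers(0, 2, size=200)          # seen-class training labels
S = protos_seen[labels]                        # per-instance prototypes
X = S @ B + 0.05 * rng.normal(size=(200, 5))   # seen-class training features

# Learn a projection P: feature space -> semantic space (least squares).
P, *_ = np.linalg.lstsq(X, S, rcond=None)

# Classify a test instance from the unseen class by projecting it and
# taking the nearest prototype (seen or unseen).
x_test = proto_unseen @ B
z = x_test @ P
all_protos = np.vstack([protos_seen, proto_unseen])
pred = int(np.argmin(np.linalg.norm(all_protos - z, axis=1)))
```

Index 2 corresponds to the unseen prototype, so a correct projection maps the unseen-class instance closest to it.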
②Instance borrowing methods
These methods obtain labeled instances for unseen classes by borrowing from the training instances of similar classes. Instance-borrowing methods are based on similarities between classes: with knowledge of which seen classes resemble an unseen class, instances that can stand in for the unseen class can be identified.
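A toy sketch of instance borrowing (all classes, prototypes, and points below are invented for illustration): the seen class whose prototype is most similar to the unseen class donates its training instances, which are then relabeled as the unseen class.

```python
import numpy as np

# Seen-class training instances and labels (synthetic 2-dim features).
X = np.array([[0.9, 0.1], [1.1, -0.1],    # class "horse"
              [0.0, 1.0], [0.1, 0.9]])    # class "fish"
y = np.array(["horse", "horse", "fish", "fish"])

# Semantic prototypes; "zebra" is the unseen class.
protos = {"horse": np.array([1.0, 0.2]),
          "fish":  np.array([0.0, 1.0]),
          "zebra": np.array([0.9, 0.3])}

# Borrow training instances from the seen class most similar to the
# unseen class in semantic space (dot-product similarity here).
sims = {c: float(protos[c] @ protos["zebra"]) for c in ("horse", "fish")}
donor = max(sims, key=sims.get)
borrowed = X[y == donor]   # these instances are relabeled as "zebra"
```

Real methods typically borrow from several neighbors and weight the borrowed instances, but the selection principle is the same.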
③Synthesis method
The synthesis method obtains labeled instances for unseen classes by synthesizing pseudo-instances using various strategies. To synthesize pseudo-instances, the instances of each class are assumed to follow some distribution. First, the distribution parameters of the unseen classes are estimated; then instances of the unseen classes are synthesized from the estimated distribution.
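A minimal synthesis sketch with synthetic numbers: each class is assumed Gaussian in feature space, the unseen-class mean is estimated as a prototype-similarity-weighted combination of seen-class means (one simple strategy among many), and pseudo-instances are then sampled from that Gaussian.

```python
import numpy as np

rng = np.random.default_rng(2)

# Estimated Gaussian parameters of two seen classes (feature space).
means_seen = np.array([[0., 0.],
                       [4., 4.]])
cov = np.eye(2)                         # shared covariance, for simplicity

# Semantic prototypes; the unseen prototype lies between the seen ones.
protos_seen = np.array([[1., 0.],
                        [0., 1.]])
proto_unseen = np.array([0.5, 0.5])

# Estimate the unseen-class mean as a similarity-weighted combination
# of the seen-class means.
w = protos_seen @ proto_unseen
w = w / w.sum()
mean_unseen = w @ means_seen

# Synthesize pseudo-instances for the unseen class; any supervised
# classifier can then be trained on them as if they were real data.
pseudo = rng.multivariate_normal(mean_unseen, cov, size=100)
```

More recent synthesis approaches replace the Gaussian assumption with learned generative models, but the pipeline (estimate, sample, train) is the same.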
Like other concepts, zero-shot learning has its limitations. Here are some of the most common challenges when applying zero-shot learning in practice.
1. Bias
During the training phase, the model can only access the data and labels of seen classes. This causes the model to predict data samples of unseen classes as seen classes during testing. The bias problem becomes more prominent if, during testing, the model is evaluated on samples from both seen and unseen classes.
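One common mitigation is calibrated stacking: subtract a tuned constant from the seen-class scores so that unseen classes get a fair chance. The scores below are made-up illustrative numbers.

```python
# Scores a trained model assigns to one test instance (synthetic values):
# the model is biased toward seen classes, so a seen class wins outright.
scores = {"horse": 2.1, "fish": 1.0,   # seen classes
          "zebra": 2.0}                # unseen class
seen = {"horse", "fish"}

# Calibrated stacking: subtract a constant gamma from seen-class scores
# to counteract the bias (gamma would be chosen on validation data).
gamma = 0.5
calibrated = {c: s - gamma if c in seen else s for c, s in scores.items()}

pred_raw = max(scores, key=scores.get)          # biased toward seen classes
pred_cal = max(calibrated, key=calibrated.get)  # bias-corrected prediction
```

Choosing gamma trades seen-class accuracy for unseen-class accuracy; it is typically tuned on a held-out set containing both kinds of classes.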
2. Domain shift
Zero-shot learning models are developed mainly to extend pre-trained models to new classes as data for those classes gradually become available, so the domain shift problem is common in zero-shot learning. Domain shift occurs when the statistical distributions of the training set and the test set differ significantly.
3. Hubness problem
The hubness problem is related to the curse of dimensionality that affects nearest-neighbor search. In zero-shot learning, hubness arises for two reasons.
Both input features and semantic features live in high-dimensional spaces. When such high-dimensional vectors are projected into a lower-dimensional space, the variance shrinks, causing the mapped points to cluster around a few points ("hubs").
Ridge regression, which is widely used in zero-shot learning, aggravates hubness: it can lead to biased predictions in which only a few classes are predicted no matter what the query is.
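The shrinkage effect of ridge regression can be demonstrated with a small numpy experiment on purely synthetic features and targets: with strong regularization, the predicted semantic vectors have markedly smaller variance than the targets, which pushes them toward the mean and makes a few central prototypes dominate nearest-neighbor search.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic high-dim features and 2-dim semantic targets.
X = rng.normal(size=(200, 50))
S = rng.normal(size=(200, 2))

# Ridge regression with strong regularization (closed form).
lam = 100.0
W = np.linalg.solve(X.T @ X + lam * np.eye(50), X.T @ S)
pred = X @ W

# The predictions are shrunk toward the origin: their spread is visibly
# smaller than that of the targets.
print(pred.std(), S.std())
```

Remedies discussed in the literature include regressing in the reverse direction (semantic to feature space) and normalizing similarity scores before the nearest-neighbor step.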
4. Information loss
When training on seen classes, the model learns only the attributes that are important for distinguishing those seen classes. Latent information may exist in the seen-class data, but it is not learned if it does not contribute significantly to the decision process. That information may nevertheless be important when testing on unseen classes, resulting in information loss.
The above is the detailed content of Analyze the definition and significance of zero-shot learning (ZSL), from the PHP Chinese website.