CMU joins forces with Adobe: GAN models usher in the era of pre-training, requiring only 1% of training samples

After entering the pre-training era, visual recognition models have improved rapidly, but image generation models, such as generative adversarial networks (GANs), seem to have fallen behind.

GAN training is usually done from scratch in an unsupervised manner, which is time-consuming and labor-intensive, and the "knowledge" learned from big data during large-scale pre-training goes unused. Isn't that a big loss?

Moreover, image generation itself must capture and model the complex statistics of real-world visual phenomena; otherwise, the generated images will not conform to the laws of the physical world and will be recognized as "fake" at a glance.

Pre-trained models provide the knowledge, and GANs provide the generative capability; combining the two could be a beautiful thing!

The question is: which pre-trained models should be used, and how should they be combined, to improve a GAN's generative ability?

Recently, researchers from CMU and Adobe published a paper at CVPR 2022 that incorporates pre-trained models into GAN training through "selection".

Paper link: https://arxiv.org/abs/2112.09130

Project link: https://github.com/nupurkmr9/vision-aided-gan

Video link: https://www.youtube.com/watch?v=oHdyJNdQ9E4

GAN training involves a discriminator and a generator: the discriminator learns the statistics that distinguish real samples from generated ones, while the generator's goal is to make its generated images match the real distribution as closely as possible.

Ideally, the discriminator should be able to measure the distribution gap between the generated image and the real image.
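
For reference, here is a minimal sketch of this adversarial game in PyTorch, using the non-saturating logistic loss popularized by StyleGAN2; the discriminator `D` and the image batches are placeholders, not the paper's architectures:

```python
import torch.nn.functional as F

def discriminator_loss(D, real_images, fake_images):
    # D learns statistics that separate real samples from generated ones.
    return (F.softplus(-D(real_images)) +       # push D(real) up
            F.softplus(D(fake_images))).mean()  # push D(fake) down

def generator_loss(D, fake_images):
    # G tries to make its samples look drawn from the real distribution.
    return F.softplus(-D(fake_images)).mean()
```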

But when the amount of data is very limited, directly using a large-scale pre-trained model as the discriminator easily lets it "ruthlessly crush" the generator and then overfit.

In experiments on the FFHQ 1k dataset, even with the latest differentiable data augmentation methods, the discriminator still overfits: its accuracy is very high on the training set but very poor on the validation set.

Additionally, the discriminator may focus on artifacts that are imperceptible to humans but obvious to machines.

To balance the capabilities of the discriminator and the generator, the researchers proposed assembling the representations of a diverse set of pre-trained models into the discriminator.

This method has two advantages:

1. Training a shallow classifier on pre-trained features is a common way to adapt deep networks to small datasets while reducing overfitting.

That is to say, as long as the pre-trained model's parameters are kept fixed and a lightweight classification network is added on top, training remains stable.
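
A minimal sketch of such a discriminator in PyTorch; the backbone, its feature dimension, and the head width are illustrative assumptions rather than the paper's exact design:

```python
import torch.nn as nn

class VisionAidedDiscriminator(nn.Module):
    """Frozen pre-trained backbone + small trainable classification head."""
    def __init__(self, backbone, feat_dim=768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)            # keep pre-trained weights fixed
        self.head = nn.Sequential(             # only this part is trained
            nn.Linear(feat_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        # Parameters are frozen, but gradients still flow through the
        # backbone to the input image, so the generator gets feedback.
        return self.head(self.backbone(x).flatten(1))
```

Freezing with `requires_grad_(False)` rather than wrapping the backbone in `torch.no_grad()` matters here: the generator can only learn if gradients propagate back through the frozen features to its images.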

For example, the Ours curve in the experiment above shows that validation accuracy improves considerably over StyleGAN2-ADA.

2. Recent studies have shown that deep networks can capture meaningful visual concepts, from low-level visual cues (edges and textures) to high-level concepts (objects and object parts).

A discriminator built on these features may align better with human perception.

Combining multiple pre-trained models also encourages the generator to match the real distribution in different, complementary feature spaces.

To select the best pre-trained networks, the researchers first collected multiple state-of-the-art models into a "model bank", including VGG-16 for classification, Swin-T for detection and segmentation, and so on.

Then, based on how linearly separable real and fake images are in each feature space, they proposed an automatic model-search strategy, using label smoothing and differentiable augmentation to further stabilize training and reduce overfitting.

Specifically, the union of the real training samples and the generated images is divided into a training set and a validation set.

For each pre-trained model, a logistic linear discriminator is trained to classify whether a sample is real or generated, and the "negative binary cross-entropy loss" on the validation split measures the distribution gap; the model with the smallest error is returned.

A lower validation error corresponds to higher linear-probing accuracy, indicating that the features are useful for distinguishing real from generated samples, and using them can provide more useful feedback to the generator.
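
A hedged sketch of this search in PyTorch; `extract_features` and `model_bank` are placeholders for however the backbones and their feature extractors are actually wrapped:

```python
import torch
import torch.nn.functional as F

def probe_validation_error(backbone, real_images, fake_images, steps=500):
    # Features for the union of real and generated samples (detached,
    # since only the linear probe is trained here).
    feats = torch.cat([extract_features(backbone, real_images),
                       extract_features(backbone, fake_images)]).detach()
    labels = torch.cat([torch.ones(len(real_images)),
                        torch.zeros(len(fake_images))])
    # Split the union into training and validation parts.
    perm = torch.randperm(len(feats))
    split = int(0.8 * len(feats))
    train_idx, val_idx = perm[:split], perm[split:]
    # Logistic linear probe: real vs. generated.
    w = torch.zeros(feats.shape[1], requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([w, b], lr=0.01)
    for _ in range(steps):
        loss = F.binary_cross_entropy_with_logits(
            feats[train_idx] @ w + b, labels[train_idx])
        opt.zero_grad(); loss.backward(); opt.step()
    # Lower validation error means the features separate real from fake better.
    with torch.no_grad():
        return F.binary_cross_entropy_with_logits(
            feats[val_idx] @ w + b, labels[val_idx]).item()

# best = min(model_bank, key=lambda m: probe_validation_error(m, reals, fakes))
```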

The researchers empirically verified GAN training using 1,000 training samples from the FFHQ and LSUN CAT datasets.

The results show that pre-trained models with higher linear-probing accuracy generally lead to better FID scores when used to train the GAN.

To incorporate feedback from multiple off-the-shelf models, the paper also explores two model selection and integration strategies:

1) K-fixed model selection: select the K best off-the-shelf models at the beginning of training and train until convergence;

2) K-progressive model selection: iteratively select and add the best-performing unused model after a fixed number of iterations.

The experimental results show that, compared with the K-fixed strategy, the progressive approach has lower computational complexity and also helps select pre-trained models that capture different aspects of the data distribution; for example, the first two models chosen by the progressive strategy are usually a pair of self-supervised and supervised models.

The experiments in the paper mainly use the progressive strategy.
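
A compact sketch of that progressive loop; every helper here (`sample_fakes`, `train_stage`, and `probe_validation_error` from above) is hypothetical scaffolding for the idea, not the paper's actual API:

```python
def train_k_progressive(generator, model_bank, K, stage_iters, real_images):
    active = []                                # vision-aided discriminators in use
    for stage in range(K):
        # Probe only the backbones that have not been selected yet.
        remaining = [m for m in model_bank if m not in active]
        fake_images = sample_fakes(generator, n=len(real_images))
        best = min(remaining,
                   key=lambda m: probe_validation_error(m, real_images,
                                                        fake_images))
        active.append(best)                    # add one new discriminator per stage
        # Continue training against the enlarged discriminator ensemble.
        generator = train_stage(generator, active, stage_iters)
    return generator
```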

The final training algorithm first trains a GAN with a standard adversarial loss.

Given this baseline generator, linear probing is then used to search for the best pre-trained model, whose loss is added to the training objective.

In the K-progressive strategy, after training for a fixed number of iterations (proportional to the number of available real training samples), a new vision-aided discriminator is added to the snapshot from the previous stage with the best training-set FID.

During the training process, data augmentation is performed by horizontal flipping, and differentiable augmentation techniques and one-sided label smoothing are used as regularization terms.
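
For concreteness, a sketch of a vision-aided discriminator loss with these regularizers; `augment` stands in for the differentiable augmentation pipeline, and the 0.9 target is an illustrative smoothing value:

```python
import torch
import torch.nn.functional as F

def vision_aided_d_loss(D, real_images, fake_images, augment, smooth=0.9):
    # The same differentiable augmentation is applied to real and fake batches.
    real_logits = D(augment(real_images))
    fake_logits = D(augment(fake_images))
    # One-sided label smoothing: only the real targets are softened.
    real_loss = F.binary_cross_entropy_with_logits(
        real_logits, torch.full_like(real_logits, smooth))
    fake_loss = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))
    return real_loss + fake_loss
```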

It can also be observed that using only off-the-shelf models as discriminators causes training to diverge, whereas combining the original discriminator with the pre-trained models avoids this.
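
In other words, the generator trains against the sum of the original adversarial loss and the vision-aided losses, which one could sketch (reusing `generator_loss` from earlier, with equal weights assumed) as:

```python
def total_generator_loss(original_D, vision_aided_Ds, fake_images):
    # The original discriminator keeps training grounded; the frozen
    # pre-trained discriminators add complementary feedback.
    loss = generator_loss(original_D, fake_images)
    for D in vision_aided_Ds:
        loss = loss + generator_loss(D, fake_images)
    return loss
```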

The final experiments show results as the number of training samples from the FFHQ, LSUN CAT, and LSUN CHURCH datasets varies from 1k to 10k.

In all settings, FID improves significantly, demonstrating the method's effectiveness in limited-data scenarios.

To qualitatively analyze the differences from StyleGAN2-ADA, the authors compared the quality of samples generated by the two methods: the new method improves the quality of the worst samples, especially for FFHQ and LSUN CAT.

As each additional discriminator is gradually added, the linear-probing accuracy on the pre-trained models' features gradually declines; that is to say, the generator becomes stronger.

Overall, with only 10,000 training samples, this method achieves an FID on LSUN CAT comparable to that of StyleGAN2 trained on 1.6 million images.

On the full datasets, the method improves FID by 1.5 to 2x on the LSUN CAT, CHURCH, and HORSE categories.

Author Richard Zhang received his PhD from the University of California, Berkeley, and his undergraduate and master's degrees from Cornell University. His main research interests include computer vision, machine learning, deep learning, graphics, and image processing, and he often collaborates with academic researchers through internships or university partnerships.

Author Jun-Yan Zhu is an assistant professor at the Robotics Institute in Carnegie Mellon University's School of Computer Science, with affiliations in the Computer Science Department and the Machine Learning Department. His main research areas include computer vision, computer graphics, machine learning, and computational photography.

Before joining CMU, he was a research scientist at Adobe Research. He received his bachelor's degree from Tsinghua University and his Ph.D. from the University of California, Berkeley, then worked as a postdoctoral fellow at MIT CSAIL.
