CVPR 2023 paper summary! The hottest areas of CV are multimodal and diffusion models

The annual CVPR conference will be held in Vancouver, Canada, from June 18 to 22.

Every year, thousands of CV researchers and engineers from around the world gather for the conference. This prestigious event dates back to 1983 and represents the pinnacle of computer vision research.

Currently, CVPR's h5-index ranks fourth among all conferences and publications, behind only Nature, Science and the New England Journal of Medicine.


Some time ago, CVPR announced its paper acceptance results. According to statistics on the official website, a total of 9,155 papers were submitted and 2,359 were accepted, for an acceptance rate of 25.8%.

In addition, 12 award candidate papers were announced.


So, what are the highlights of this year’s CVPR? What trends can we see in the CV field from the accepted papers?

Let's take a look.

CVPR at a glance

The startup Voxel51 analyzed the list of all accepted papers.

Let's first look at a word cloud of the paper titles. The size of each word is proportional to its frequency in the dataset.


Brief description

- 2,359 papers accepted (9,155 papers submitted)

- 1,724 arXiv papers

- 68 papers hosted at other addresses

Authors per paper

- A CVPR paper has about 5.4 authors on average

- The paper with the most authors is "Why is the winner the best?", with 125 authors

- There are 13 papers with only one author.

Main Arxiv classification

Among the 1,724 arXiv papers, 1,545 (close to 90%) list cs.CV as their primary category.

cs.LG ranks second with 101 papers. eess.IV (26) and cs.RO (16) also get a share of the pie.

Other categories for CVPR papers include: cs.HC, cs.CV, cs.AR, cs.DC, cs.NE, cs.SD, cs.CL, cs.IT, cs.CR, cs.AI, cs.MM, cs.GR, eess.SP, eess.AS, math.OC, math.NT, physics.data-an and stat.ML.

"Meta" data

- The words "dataset" and "model" appear together in 567 abstracts. "Dataset" appears on its own in 265 abstracts, while "model" appears on its own in 613. Only 16.2% of papers accepted by CVPR contain neither word.

- According to CVPR paper abstracts, the most popular datasets this year are ImageNet (105), COCO (94), KITTI (55) and CIFAR (36).

- 28 papers propose a new "benchmark".
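
The abstract counts above can be reproduced with a simple keyword check. The following is a minimal sketch, not the analysis Voxel51 actually ran: the function name `keyword_overlap`, the substring matching rule, and the toy abstracts are all assumptions for illustration. The last lines check the reported 16.2% under the further assumption that the denominator is the 1,724 arXiv abstracts.

```python
def keyword_overlap(abstracts, first="dataset", second="model"):
    """Count abstracts containing both, exactly one, or neither keyword.

    Matching is a case-insensitive substring test, so "models" and
    "datasets" also count (an assumption, not the article's stated rule).
    """
    counts = {"both": 0, "only_first": 0, "only_second": 0, "neither": 0}
    for text in abstracts:
        lowered = text.lower()
        has_first = first in lowered
        has_second = second in lowered
        if has_first and has_second:
            counts["both"] += 1
        elif has_first:
            counts["only_first"] += 1
        elif has_second:
            counts["only_second"] += 1
        else:
            counts["neither"] += 1
    return counts


# Made-up abstracts, only to exercise the function.
toy_abstracts = [
    "We release a new dataset and a baseline model.",
    "Our model improves depth estimation.",
    "A large-scale dataset of street scenes.",
    "We study camera calibration.",
]
toy_counts = keyword_overlap(toy_abstracts)

# Sanity check of the reported share, assuming 1,724 abstracts in total.
reported_total = 1724
reported_neither = reported_total - (567 + 265 + 613)
neither_share = round(100 * reported_neither / reported_total, 1)
```

Under that assumption, 279 of 1,724 abstracts contain neither word, which matches the 16.2% figure.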

Acronyms abound

It seems like no machine learning project is complete without an acronym. Among the 2,359 papers, 1,487 (63%) have titles containing an acronym or compound word with multiple capital letters.
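
One way to approximate such a count is a regular expression that flags any word containing at least two capital letters. This is a rough sketch under that assumption; the pattern `MULTI_UPPER` and the helper `has_acronym` are hypothetical names, and the exact rule used in the original analysis is not stated.

```python
import re

# A "word" with at least two uppercase letters, e.g. CLAMP, NeRF, DynaMask.
# Hyphens split words, so "Lite-Mono" is not flagged by this rule.
MULTI_UPPER = re.compile(r"\b\w*[A-Z]\w*[A-Z]\w*\b")


def has_acronym(title):
    """Return True if the title contains a multi-capital word."""
    return MULTI_UPPER.search(title) is not None


titles = [
    "CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose",
    "DynaMask: Dynamic Mask Selection for Instance Segmentation",
    "Learning to Bootstrap for Combating Label Noise",
]
flags = [has_acronym(t) for t in titles]

# Share of flagged titles in this tiny sample, as a percentage.
sample_share = 100 * sum(flags) / len(titles)

# The reported figure: 1,487 of 2,359 titles.
reported_share = round(100 * 1487 / 2359)
```

A rule this simple will misclassify some titles, so it should be read as an estimate rather than an exact reproduction of the 63% figure.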

Some of these acronyms are easy to remember and even roll off the tongue:

- CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose

- PATS: Patch Area Transportation with Subdivision for Local Feature Matching

- CIRCLE: Capture In Rich Contextual Environments

Some are much more complex:

- SIEDOB: Semantic Image Editing by Disentangling Object and Background

- FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs

Some seem to have borrowed their acronyms from elsewhere:

- SCOTCH and SODA: A Transformer Video Shadow Detection Framework (Scotch & Soda is a popular Dutch fashion brand)

- EXCALIBUR: Encouraging and Evaluating Embodied Exploration (Excalibur, lol)

What’s the hottest?

In addition to the 2023 paper titles, we crawled the titles of all papers accepted in 2022. From these two lists, we calculated the relative frequency of various keywords to show what is trending up and what is trending down.
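
The year-over-year comparison described above can be sketched as follows. This is an illustrative sketch only: the helper names `keyword_rates` and `percent_change`, the whole-word matching rule, and the made-up titles are assumptions, not the original analysis code.

```python
import re
from collections import Counter


def keyword_rates(titles, keywords):
    """Fraction of titles that contain each (lowercase) keyword as a whole word."""
    counts = Counter()
    for title in titles:
        words = set(re.findall(r"[a-z0-9]+", title.lower()))
        for keyword in keywords:
            if keyword in words:
                counts[keyword] += 1
    return {keyword: counts[keyword] / len(titles) for keyword in keywords}


def percent_change(old_rate, new_rate):
    """Relative change in percent; None when the keyword was absent before."""
    if old_rate == 0:
        return None
    return 100 * (new_rate - old_rate) / old_rate


# Made-up titles standing in for the two crawled lists.
titles_2022 = [
    "A CNN for Depth Estimation",
    "Diffusion Priors for Inpainting",
    "Fast CNN Pruning",
    "Stereo Matching Revisited",
]
titles_2023 = [
    "Diffusion Models for Editing",
    "Video Diffusion at Scale",
    "A CNN Baseline for Tracking",
    "Point Cloud Registration",
]

keywords = ["diffusion", "cnn"]
rates_2022 = keyword_rates(titles_2022, keywords)
rates_2023 = keyword_rates(titles_2023, keywords)
changes = {k: percent_change(rates_2022[k], rates_2023[k]) for k in keywords}
```

Using rates rather than raw counts matters because the number of accepted papers differs from year to year, so a keyword can appear in more titles while its share stays flat.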

Model

In 2023, diffusion models dominate.


Diffusion Model

With the popularity of image generation models such as Stable Diffusion and Midjourney, it is not surprising that diffusion models are a hot trend.

Diffusion models also have applications in denoising, image editing, and style transfer. Add it all up, and it's by far the biggest winner across all categories, up 573% year-over-year.

Radiance Fields

Neural radiance fields (NeRF) are also becoming more and more popular: the word "radiance" increased by 80%, and "NeRF" increased by 39%. NeRF has moved from proof of concept to editing, applications and training process optimization.

Transformers

The declining usage of "Transformer" and "ViT" does not mean that the Transformer model is outdated. Rather, it reflects the dominance of these models in 2022. In 2021, the word "Transformer" appeared in only 37 papers. In 2022, this number soared to 201. Transformers aren't going away anytime soon.

CNN

CNNs used to be the darling of computer vision. In 2023, they seem to have lost their edge: usage dropped by 68%. Many titles that mention CNNs also mention other models. For example, these papers mention both CNN and Transformer:

- Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

- Learned Image Compression with Mixed Transformer-CNN Architectures

Task

Masking tasks, together with masked image modeling, occupy a dominant position at CVPR.


Generation

Traditional discriminative tasks such as detection, classification and segmentation have not fallen out of favor, but their share of CV is shrinking as generative applications advance. The rise of "editing", "synthesis" and "generation" shows this.

Mask

The keyword "mask" increased by 263% year over year. It appears 92 times in papers accepted in 2023, and sometimes twice in a single title.

- SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation

- DynaMask: Dynamic Mask Selection for Instance Segmentation

But the majority (64%) actually refer to "masked" tasks, including 8 "masked image modeling" and 15 "masked autoencoder" tasks. In addition, "masking" appears in 8 titles.

It is also worth noting that 3 paper titles containing the word "mask" actually refer to "mask-free" tasks.

Zero-shot vs. few-shot

With the rise of transfer learning, generative methods, prompting and general-purpose models, "zero-shot" learning is gaining traction. At the same time, "few-shot" learning has declined from last year. In terms of raw numbers, however, "few-shot" (45) still holds a slight edge over "zero-shot" (35), at least for now.

Modality

In 2023, the development of multimodal and cross-modal applications accelerated.


Blurred boundaries

Although the frequency of traditional computer vision keywords such as "image" and "video" remains relatively unchanged, "text"/"language" and "audio" appear more frequently.

Even where the word "multimodal" itself does not appear in a paper's title, it is hard to deny that computer vision is heading towards a multimodal future.

This is especially evident in vision-language tasks, as shown by the sharp rise in "open", "prompt" and "vocabulary".

The most extreme example is the compound term "open vocabulary", which appeared only 3 times in 2022 but 18 times in 2023.


Digging deeper into the keywords in CVPR 2023 paper titles

Point Cloud 9

Three-dimensional computer vision applications are moving from inferring 3D information ("depth" and "stereo") from two-dimensional images to systems that work directly on 3D point cloud data.

Creativity in CV Titles

No comprehensive machine learning coverage in 2023 would be complete without ChatGPT in the mix. We decided to keep things interesting and used ChatGPT to find the most creative titles from CVPR 2023.

For each paper uploaded to arXiv, we scraped the abstract and asked ChatGPT (GPT-3.5 API) to generate a title for the corresponding CVPR paper.

Then, we took the ChatGPT-generated titles and the actual paper titles, used OpenAI's text-embedding-ada-002 model to generate embedding vectors, and calculated the cosine similarity between each ChatGPT-generated title and the corresponding author-written title.

What can this tell us? The closer ChatGPT's title is to the actual paper title, the more predictable the title is. In other words, the further off ChatGPT's prediction is, the more "creative" the authors were in naming the paper.

Embedding and cosine similarity provide us with an interesting, although far from perfect, method of quantification.
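
The ranking step can be sketched in a few lines. This sketch assumes the embedding vectors have already been obtained; the short toy vectors, the dictionary layout and the function names `cosine_similarity` and `rank_by_creativity` are hypothetical stand-ins, and no embedding API is called here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_creativity(papers):
    """Sort papers so the lowest title similarity (most "creative") comes first."""
    return sorted(
        papers,
        key=lambda p: cosine_similarity(p["actual"], p["predicted"]),
    )


# Toy 3-dimensional vectors standing in for real title embeddings.
toy_papers = [
    {"name": "paper_a", "actual": [1.0, 0.0, 0.0], "predicted": [1.0, 0.0, 0.0]},
    {"name": "paper_b", "actual": [1.0, 0.0, 0.0], "predicted": [0.0, 1.0, 0.0]},
    {"name": "paper_c", "actual": [1.0, 1.0, 0.0], "predicted": [1.0, 0.0, 0.0]},
]
ranking = [p["name"] for p in rank_by_creativity(toy_papers)]
```

In this toy example `paper_b` ranks first because its two vectors are orthogonal (similarity 0), while `paper_a` ranks last because its predicted and actual vectors are identical (similarity 1).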

We sorted the papers according to this metric. Without further ado, here are the most creative titles:

Actual title: Tracking Every Thing in the Wild

Predicted title: Disentangling Classification from Tracking: Introducing TETA for Comprehensive Benchmarking of Multi-Category Multiple Object Tracking

Actual title: Learning to Bootstrap for Combating Label Noise

Predicted title: Learnable Loss Objective for Joint Instance and Label Reweighting in Deep Neural Networks

Actual title: Seeing a Rose in Five Thousand Ways

Predicted title: Learning Object Intrinsics from Single Internet Images for Superior Visual Rendering and Synthesis

Actual title: Why is the winner the best?

Predicted title: Analyzing Winning Strategies in International Benchmarking Competitions for Image Analysis: Insights from a Multi-Center Study of IEEE ISBI and MICCAI 2021
