
Tsinghua University Wins Best Paper and Test of Time Awards, Shandong University Receives an Honorable Mention: The SIGIR 2024 Awards Are Announced

Wang Lin
Published: 2024-07-19 00:06:43

Tsinghua University achieved outstanding results.


The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval was held in Washington, D.C., USA from July 14 to 18, 2024. SIGIR is the premier academic conference in the field of information retrieval.

The conference has now announced the Best Paper Award, the Best Paper Runner-Up, the Best Paper Honorable Mention, and the Test of Time Award.

The Best Paper Award went to a team from Tsinghua University, the Gaoling School of Artificial Intelligence at Renmin University of China, and Xiaohongshu. The runner-up went to researchers from the University of Glasgow and the University of Pisa. The Best Paper Honorable Mention was awarded to researchers from Shandong University (Qingdao), Leiden University, and the University of Amsterdam, and the Test of Time Award went to researchers from Tsinghua University and the University of California, Santa Cruz.

Next, let's look at the winning papers in detail.

Best Paper
  • Paper: Scaling Laws For Dense Retrieval
  • Paper authors: Yan Fang, Jingtao Zhan, Qingyao Ai, Jiaxin Mao, Weihang Su, Jia Chen, Yiqun Liu
  • Institutions: Tsinghua University; Gaoling School of Artificial Intelligence, Renmin University of China; Xiaohongshu
  • Paper link: https://dl.acm.org/doi/abs/10.1145/3626772.3657743

About the paper: Researchers have observed scaling laws across a wide range of tasks, especially in language generation. Studies show that the performance of large language models follows predictable patterns across model and dataset sizes, which helps in designing training strategies effectively and efficiently, especially as large-scale training becomes increasingly resource-intensive. In dense retrieval, however, scaling laws have not been fully explored.

This study explores how scaling affects the performance of dense retrieval models. Specifically, the research team implemented dense retrieval models with different numbers of parameters and trained them using different amounts of annotated data. This study uses contrastive entropy as an evaluation metric. Compared with discrete ranking metrics, contrastive entropy is continuous and therefore can accurately reflect the performance of the model.
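A contrastive entropy of the kind described above is commonly formulated as the cross-entropy of picking the relevant document among sampled negatives; the exact definition in the paper may differ. A minimal sketch, with made-up scores:

```python
import math

def contrastive_entropy(pos_score: float, neg_scores: list[float]) -> float:
    """Cross-entropy of selecting the positive document among sampled negatives.
    Lower is better. Unlike discrete ranking metrics (e.g. MRR@10), it changes
    smoothly as the model's scores change."""
    denom = math.exp(pos_score) + sum(math.exp(s) for s in neg_scores)
    return -math.log(math.exp(pos_score) / denom)

# A model that separates the positive from the negatives more confidently
# gets a lower contrastive entropy.
print(contrastive_entropy(3.0, [1.0, 0.5]))  # well separated, ~0.20
print(contrastive_entropy(1.2, [1.0, 0.5]))  # barely separated, ~0.84
```

The continuity is what makes the metric usable for fitting scaling curves: tiny improvements in model quality move it measurably, where a rank-based metric might not change at all.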
Experimental results show that the performance of dense retrieval models follows a precise power-law scaling with respect to model size and the amount of annotated data.
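A power law of this form is typically fitted as a straight line in log-log space. The sketch below uses made-up data points, not numbers from the paper, purely to illustrate the fitting and extrapolation procedure:

```python
import numpy as np

# Hypothetical (model size, contrastive entropy) pairs, for illustration only.
sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])        # parameter counts
losses = np.array([2.10, 1.78, 1.52, 1.29, 1.10])  # contrastive entropy

# A power law L(N) = a * N**b is linear in log-log space:
# log L = log a + b * log N, so a least-squares line fit recovers (b, log a).
b, log_a = np.polyfit(np.log(sizes), np.log(losses), 1)
a = np.exp(log_a)

def predict(n_params: float) -> float:
    """Extrapolated contrastive entropy for a model with n_params parameters."""
    return a * n_params ** b

print(f"fitted exponent b = {b:.3f}")  # negative: bigger model, lower loss
print(f"extrapolated loss at 3e9 params: {predict(3e9):.3f}")
```

Once fitted on small models, such a curve can be extrapolated to predict the loss of a larger, untrained model, which is what makes scaling laws useful for planning training runs.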
In addition, the study shows that scaling laws help optimize the training process, for example by solving resource-allocation problems under budget constraints.
This study greatly contributes to understanding the scaling effects of dense retrieval models and provides meaningful guidance for future research.

Best Paper Runner-Up

This year's ACM SIGIR Best Paper Runner-Up went to "A Reproducibility Study of PLAID", written by Sean MacAvaney of the University of Glasgow and Nicola Tonellotto of the University of Pisa.


Paper address: https://arxiv.org/pdf/2404.14989

Paper abstract: ColBERTv2's PLAID algorithm uses clustered term representations to retrieve and progressively prune documents to obtain a final document score. This paper reproduces PLAID's results and fills gaps left by the original work. By studying the parameters PLAID introduces, the researchers found that its Pareto frontier is formed by the balance among three parameters; deviating from the recommended settings can significantly increase latency without necessarily improving effectiveness.

Based on this finding, the paper compares PLAID to an important baseline missing from the original work: re-ranking the output of a lexical retrieval system. The authors find that applying ColBERTv2 as a re-ranker on top of an initial BM25 result pool provides a better efficiency-effectiveness trade-off in low-latency settings. This work highlights the importance of carefully selecting relevant baselines when evaluating retrieval engine efficiency.
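The retrieve-then-rerank baseline in question has a simple shape: score the whole corpus with a cheap lexical function, keep a small pool, and rescore only that pool with the expensive model. The sketch below uses toy scoring functions standing in for BM25 and ColBERTv2; it illustrates the pipeline, not the actual PLAID or ColBERTv2 code:

```python
from typing import Callable

def two_stage_retrieve(
    query: str,
    corpus: dict[str, str],
    first_stage_score: Callable[[str, str], float],  # cheap, e.g. BM25
    rerank_score: Callable[[str, str], float],       # expensive, e.g. ColBERT-style
    pool_size: int = 100,
    k: int = 10,
) -> list[str]:
    """Retrieve a candidate pool cheaply, then rerank it with a stronger model."""
    # Stage 1: rank the whole corpus with the cheap lexical function.
    pool = sorted(corpus,
                  key=lambda doc_id: first_stage_score(query, corpus[doc_id]),
                  reverse=True)[:pool_size]
    # Stage 2: rescore only the pool with the expensive model.
    return sorted(pool,
                  key=lambda doc_id: rerank_score(query, corpus[doc_id]),
                  reverse=True)[:k]

# Toy stand-ins: term overlap for the lexical stage, length-normalized
# overlap pretending to be a neural reranker.
def toy_lexical(q: str, d: str) -> float:
    return len(set(q.lower().split()) & set(d.lower().split()))

def toy_neural(q: str, d: str) -> float:
    return toy_lexical(q, d) / (1 + len(d.split()))

docs = {"d1": "dense retrieval with neural models",
        "d2": "bm25 is a lexical retrieval baseline",
        "d3": "cooking pasta at home"}
print(two_stage_retrieve("neural retrieval models", docs,
                         toy_lexical, toy_neural, pool_size=2, k=1))
```

The latency argument follows directly from the structure: the expensive model runs on `pool_size` documents instead of the whole corpus, so a small pool keeps latency low at some cost in recall.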

Honorable Mention Award for Best Paper

The Best Paper Honorable Mention at this year's conference went to researchers from Shandong University (Qingdao), Leiden University, and the University of Amsterdam for the paper "Generative Retrieval as Multi-Vector Dense Retrieval".
  • Paper authors: Shiguang Wu, Wenda Wei, Mengqi Zhang, Zhumin Chen, Jun Ma, Zhaochun Ren, Maarten de Rijke, Pengjie Ren
  • Paper address: https://arxiv.org/pdf/2404.00684

Abstract: This paper shows that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance of a document to a query. Specifically, the authors examine the attention layer and prediction head of generative retrieval, revealing that generative retrieval can be understood as a special case of multi-vector dense retrieval: both methods compute relevance as the sum of products of query vectors and document vectors, weighted by an alignment matrix.

The researchers then explored how to apply this framework to generative retrieval, adopting different strategies for computing document token vectors and alignment matrices. Experiments verify these conclusions, showing that both paradigms exhibit commonalities in term matching within their alignment matrices.
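The shared scoring form described above can be written as a sum over all query-token/document-token pairs of an alignment weight times a dot product. The numpy sketch below uses random vectors and hypothetical dimensions; it also shows how a ColBERT-style max-sim score corresponds to one particular choice of alignment matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: m query token vectors, n document token vectors.
m, n, dim = 4, 6, 8
Q = rng.normal(size=(m, dim))  # query token vectors
D = rng.normal(size=(n, dim))  # document token vectors

def relevance(Q: np.ndarray, D: np.ndarray, A: np.ndarray) -> float:
    """Sum over i, j of A[i, j] * (q_i . d_j) -- the shared scoring form."""
    return float(np.sum(A * (Q @ D.T)))

# Multi-vector dense retrieval in the ColBERT max-sim style corresponds to an
# alignment matrix that puts weight 1 on each query token's best document token.
sims = Q @ D.T
A_maxsim = np.zeros_like(sims)
A_maxsim[np.arange(m), sims.argmax(axis=1)] = 1.0

print(relevance(Q, D, A_maxsim))  # equals the max-sim score for this Q, D
```

Under this view, different retrieval paradigms differ only in how they construct the document token vectors and the alignment matrix, which is exactly the axis the paper varies.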

Test of Time Award

This year's ACM SIGIR Test of Time Award went to research on explainable recommendation published at SIGIR 2014, ten years ago: the paper "Explicit Factor Models for Explainable Recommendation based on Phrase-level Sentiment Analysis".
  • Paper authors: Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, Shaoping Ma
  • Institutions: Tsinghua University; University of California, Santa Cruz
  • Paper link: https://www.cs.cmu.edu/~glai1/papers/yongfeng-guokun-sigir14.pdf

This study was the first to define the problem of "explainable recommendation" and proposed a sentiment-analysis method to address this technical challenge; it has played a leading role in the field ever since.

Paper abstract: Collaborative filtering (CF)-based recommendation algorithms, such as latent factor models (LFM), perform well in terms of prediction accuracy. However, their latent characteristics make it difficult to explain recommendation results to users.

Fortunately, as online user reviews continue to grow, the information available for training recommender systems is no longer limited to numerical star ratings or user/item characteristics. By extracting users' explicit opinions on various aspects of a product from reviews, it is possible to gain a more detailed understanding of what users care about, which further reveals the possibility of making explainable recommendations.

This paper proposes the Explicit Factor Model (EFM) to generate explainable recommendations while maintaining high prediction accuracy.

The researchers first extract explicit product features and user opinions through phrase-level sentiment analysis of user reviews, then generate recommendations and non-recommendations based on the specific product features a user is interested in, together with learned latent features. The model also generates intuitive feature-level explanations of why an item is or is not recommended.
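The feature-level matching idea can be sketched in a few lines. This is a drastically simplified illustration with made-up numbers, not the full EFM model (which also combines latent factors and ratings): a user's attention over explicit features is matched against each item's quality on those features, and the top attended features double as the explanation.

```python
import numpy as np

# Hypothetical explicit-feature matrices in the spirit of EFM, mined from reviews:
# X[u, f] = how much user u cares about feature f,
# Y[i, f] = how well item i performs on feature f.
features = ["battery", "screen", "price"]
X = np.array([[0.9, 0.2, 0.7]])        # one user
Y = np.array([[0.8, 0.9, 0.3],         # item 0
              [0.3, 0.4, 0.9]])        # item 1

k = 2  # explain with the user's top-k attended features

def explain(u: int, i: int) -> list[str]:
    """Feature-level explanation: the features user u cares about most,
    with item i's mined quality score on each."""
    top = np.argsort(-X[u])[:k]
    return [f"{features[f]}: item scores {Y[i, f]:.1f}" for f in top]

# Rank items by how well their feature quality matches the user's attention.
scores = X @ Y.T
best = int(scores[0].argmax())
print("recommend item", best, "because", explain(0, best))
```

The same machinery supports non-recommendations: an item scoring poorly on the user's top features can be flagged, with those features as the stated reason.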

Offline experimental results on multiple real-world datasets show that the framework proposed in this study outperforms competing baseline algorithms on both rating prediction and top-K recommendation tasks. Online experiments show that detailed explanations make recommendations and non-recommendations more influential on users’ purchasing behavior.

Young Scholar Award

The ACM SIGIR Young Scholar Award recognizes researchers who have played an important role in information retrieval research, community building, and the promotion of academic equity; it is awarded to young researchers within seven years of receiving their doctorate. Ai Qingyao, an assistant professor in the Department of Computer Science at Tsinghua University, and Wang Xiang, a professor and doctoral supervisor at the School of Cyberspace Security and the School of Big Data, University of Science and Technology of China, won the SIGIR 2024 Young Scholar Award.

Ai Qingyao

Ai Qingyao is an assistant professor in the Department of Computer Science at Tsinghua University. His research focuses on information retrieval, machine learning, and natural language processing, with an emphasis on the research and design of intelligent information retrieval systems, including information representation learning, ranking optimization theory, and applications of large language models in web search, recommendation, and smart justice.

Wang Xiang

Wang Xiang is a professor and doctoral supervisor at the School of Cyberspace Security and the School of Big Data, University of Science and Technology of China. His research interests include information retrieval, data mining, and trustworthy and explainable artificial intelligence, especially recommender systems, graph learning, and social media analysis.

Source: jiqizhixin.com