Xiaohongshu’s “grass planting” mechanism is decrypted for the first time: how large-scale deep learning system technology is applied

王林, 2023-04-11 15:13:03


A new generation of information technology led by AI is driving a new wave of technological change. As one of the fastest-growing mobile Internet platforms in China in recent years, Xiaohongshu has ridden this momentum and built a very large UGC community centered on image-and-text and short-video content. This unique and active community generates massive multi-modal data and user behavior feedback every day, giving rise to new problems that are both valuable and challenging.

Many exciting developments are taking place in large-scale deep learning systems. At the "Xiaohongshu REDtech Youth Technology Salon" on October 15, Cage, Xiaohongshu’s Vice President of Technology, gave a talk titled "Large-Scale Deep Learning System Technology and Its Application in Xiaohongshu" and unveiled the "mystery" of LarC.

Cage: Vice President of Technology at Xiaohongshu. A graduate of Shanghai Jiao Tong University, he previously served as Vice President of Technology at Huanju Times and as Chief Architect of Baidu Fengchao, where he was responsible for the CTR machine learning algorithms behind Baidu search advertising. He was also the China technical lead of the IBM Deep Question Answering (DeepQA) project.

The following content is compiled from Cage’s on-site talk.

1. Xiaohongshu business overview

Ordinary people sharing real life experiences

Xiaohongshu is a booming content community where a large number of people who understand life and love sharing exchange their life experiences and attitudes, and it continues to attract more and more users. Xiaohongshu now has 200 million monthly active users, more than 70% of whom were born in the 1990s. Half of the users come from first- and second-tier cities and half from third- and fourth-tier cities, so the user base is both diverse and young.

"Ordinary people" sharing their "real" "life experiences" is what most distinguishes Xiaohongshu from other content platforms and communities. First, the sharers are "ordinary people". Second, "sincere sharing and friendly interaction" is the Xiaohongshu community convention, and "sincerity" is central to it. What the community shares is closely tied to everyday offline life and consumption, such as hidden-gem bookstores or how to dress, decorate and cook: everyone’s daily "life experience".

We can also measure the growth of the Xiaohongshu community with some numbers. The number of notes published grew rapidly every year from 2018 to 2021. From 2020 to 2021, the number of notes published by Xiaohongshu users increased by more than 150% year-on-year.

Three main businesses: community, commercialization, e-commerce

In such a rapidly developing content community, the three most important businesses are community, commercialization and e-commerce.

First, the content community: ours is a lifestyle content community covering all life categories, made up mainly of UGC.

Second, commercialization: because this kind of "sincere sharing" is so close to daily life and consumption, users place a high degree of trust in community content. When people see good lifestyles, consumer content, services and products, they get "grass planted", that is, they develop a desire to try or buy them. We use this unique "grass planting" business model to deliver both brand and performance conversions for brands.

Third, e-commerce: "After planting grass, can you pull it out?" While consuming content, people also want to buy the items they like naturally and conveniently. This is our efficient closed-loop consumption scenario, that is, the e-commerce part.

2. Xiaohongshu technical challenges

Multimodal technology is one of the most closely watched and fastest-developing directions in the entire AI field. The UGC community and content ecosystem contain a large amount of image, text, video and user behavior information, which yields massive, high-quality multi-modal data and makes the community an excellent practical scenario. Users like good content when they see it, run all kinds of searches, watch videos and so on, and together these actions constitute a large volume of real user feedback.

The number of feedback samples actually generated by user behavior is now in the tens of billions every day.

How can we mine the content users are interested in, and good commercial content, from massive multi-modal data? Many valuable and challenging problems derive from this goal.

Here is how we address these technical problems:

Real-time, per-user personalized recommendation system

When you open Xiaohongshu, the first thing you see is the waterfall feed, or content feed, all of which is selected for each user by the recommendation system. According to statistics, Xiaohongshu generates tens of billions of user actions every day. Xiaohongshu’s technical team trains models on this data with a machine learning framework based on LarC, finds the content users are interested in from the patterns in their behavior, and recommends it to them.

The picture below shows the general structure of the Xiaohongshu recommendation model. It is a multi-task machine learning model that predicts a user’s clicks, dwell time, likes, collects and so on. Given the massive number of parameters arising on the Xiaohongshu platform, Xiaohongshu updates and fetches these parameters through a very large-scale conflict-free parameter server.

Figure: general structure of the Xiaohongshu recommendation model.
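As a rough illustration of what a multi-task model of this kind can look like, the minimal Python sketch below shares one hidden layer across several prediction heads. Everything in it is an assumption made for illustration: the task names, the layer sizes, the random weights and the `MultiTaskModel` class are hypothetical and are not taken from Xiaohongshu’s actual model.

```python
import math
import random

# Hypothetical binary tasks; dwell time is handled as a separate regression head.
PROB_TASKS = ("click", "like", "collect")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    # Smooth, strictly positive output, used here for the dwell-time head.
    return math.log1p(math.exp(x))


class MultiTaskModel:
    """One shared hidden layer feeding one output head per task (illustrative only)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)

        def vec(n):
            return [rng.uniform(-0.1, 0.1) for _ in range(n)]

        # Shared bottom: n_hidden rows, each with n_features weights.
        self.shared = [vec(n_features) for _ in range(n_hidden)]
        # One weight vector per task head.
        self.heads = {t: vec(n_hidden) for t in PROB_TASKS + ("dwell_seconds",)}

    def predict(self, features):
        # Shared representation with a ReLU activation.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features)))
                  for row in self.shared]
        logits = {t: sum(w * h for w, h in zip(head, hidden))
                  for t, head in self.heads.items()}
        out = {t: sigmoid(logits[t]) for t in PROB_TASKS}
        out["dwell_seconds"] = softplus(logits["dwell_seconds"])
        return out


model = MultiTaskModel(n_features=4, n_hidden=3)
scores = model.predict([1.0, 0.0, 0.5, 0.2])
print(scores)
```

The point of sharing the hidden layer is that every task learns from the same user and note features, while each head still produces its own prediction.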

The recommendation system’s online training works as follows. As users browse the feed, the recommendation system captures their views, clicks, likes and other behaviors in real time. A Flink-based real-time computing engine joins these behaviors into samples with high performance. Samples are sent to the model in real time for prediction. At the same time, the samples accumulated over a short period are used for a very brief round of online training that updates the model parameters. The updated parameters are published online immediately to serve the next request. The entire process is kept within minutes.
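The loop described above (join behaviors into samples, train briefly, publish parameters, serve the next request) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the production pipeline: an in-memory join stands in for the Flink job, a dictionary-backed `ParameterServer` class stands in for the real parameter server, and a logistic regression over hypothetical string features stands in for the real model.

```python
import math
from collections import defaultdict


class ParameterServer:
    """In-memory stand-in for a sharded parameter server (pull/push by key)."""

    def __init__(self, n_shards=4):
        self.shards = [defaultdict(float) for _ in range(n_shards)]

    def _shard(self, key):
        return self.shards[hash(key) % len(self.shards)]

    def pull(self, keys):
        return {k: self._shard(k)[k] for k in keys}

    def push(self, grads, lr):
        # Apply a gradient step to each pushed key.
        for k, g in grads.items():
            self._shard(k)[k] -= lr * g


def join_samples(impressions, clicks):
    """Label each impression 1.0 if a click on the same (user, note) was seen."""
    clicked = {(c["user"], c["note"]) for c in clicks}
    return [
        ([f"user={i['user']}", f"note={i['note']}", f"topic={i['topic']}"],
         1.0 if (i["user"], i["note"]) in clicked else 0.0)
        for i in impressions
    ]


def predict(ps, features):
    weights = ps.pull(features)
    return 1.0 / (1.0 + math.exp(-sum(weights.values())))


def train_on_batch(ps, samples, lr=0.5):
    for features, label in samples:
        error = predict(ps, features) - label  # log-loss gradient w.r.t. the logit
        ps.push({f: error for f in features}, lr)


ps = ParameterServer()
impressions = [
    {"user": "u1", "note": "n1", "topic": "cooking"},
    {"user": "u1", "note": "n2", "topic": "fashion"},
]
clicks = [{"user": "u1", "note": "n1"}]

new_request = ["user=u1", "note=n3", "topic=cooking"]
before = predict(ps, new_request)
for _ in range(20):  # each pass plays the role of one short online-training round
    train_on_batch(ps, join_samples(impressions, clicks))
after = predict(ps, new_request)
print(before, after)
```

Because the updated weights live in the parameter server, the next call to `predict` sees them immediately, which is the property the minute-level loop depends on.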

There is also a classic problem in the industry. When browsing recommended content, people often ask: why do I keep being pushed things I have already seen? What if the content I see is not fresh enough?

In recommendation scenarios, focusing on too short a time window causes serious problems of chasing recent behavior and of information cocoons. Xiaohongshu’s technical team looks at users’ diverse long-term and short-term behaviors and has designed different sequence modeling methods for them, which brought significant improvements on multiple dimensions. In addition, for the diversity of recommended content, the team moved from the traditional DPP (determinantal point process) approach to the SSD algorithm, which computes efficiently over a sliding window in the feed recommendation scenario. This turns the ranking of individual items by value into modeling of the whole browsing session. It relies on a twin (Siamese) neural network that learns the similarity of long-tail content.

We published this work at the KDD 2021 conference. It moves from estimating the value of a single item to estimating the value of a sequence, and from the diversity of a single item to the diversity across multiple items. Behind it are the SSD algorithm and the content-similarity assessment provided by this twin neural network.
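To make the sliding-window idea concrete, here is a deliberately simplified greedy re-ranker in Python. It is not the SSD algorithm from the KDD 2021 paper: it only borrows the general idea of scoring each candidate by its relevance plus how much of its embedding is not already explained by the items in the most recent window. The item names, relevance scores, two-dimensional embeddings and the `gamma` weight are all made up for illustration; in the talk’s setting the embeddings would come from the twin network.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors):
    """Gram-Schmidt over the embeddings currently in the window."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            p = dot(r, b)
            r = [x - p * y for x, y in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > 1e-9:
            basis.append([x / norm for x in r])
    return basis


def residual_norm(v, basis):
    """Length of the part of v that the window does not already cover."""
    r = list(v)
    for b in basis:
        p = dot(r, b)
        r = [x - p * y for x, y in zip(r, b)]
    return math.sqrt(dot(r, r))


def rerank(items, window=1, gamma=1.0):
    """items: list of (item_id, relevance, embedding). Returns ordered ids."""
    remaining = list(items)
    selected = []
    while remaining:
        # Only the last `window` picks constrain the next one.
        recent = selected[-window:] if window > 0 else []
        basis = orthonormal_basis([emb for _, _, emb in recent])
        best = max(remaining,
                   key=lambda it: it[1] + gamma * residual_norm(it[2], basis))
        selected.append(best)
        remaining.remove(best)
    return [item_id for item_id, _, _ in selected]


items = [
    ("dress_a", 0.90, [1.0, 0.0]),
    ("dress_b", 0.85, [0.8, 0.6]),   # similar to dress_a
    ("recipe", 0.60, [0.0, 1.0]),    # different topic
]
print(rerank(items, gamma=0.0))  # ['dress_a', 'dress_b', 'recipe']
print(rerank(items, gamma=1.0))  # ['dress_a', 'recipe', 'dress_b']
```

With `gamma=0.0` the order is pure relevance. With `gamma=1.0` the less relevant but dissimilar item moves up, because the second dress adds little that the window does not already contain.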

Multi-modal pan-life search engine

Because the Xiaohongshu community contains a large amount of information that is very useful in real life, many users use Xiaohongshu as a search engine. This brings challenges such as searching across multiple data forms, a severe long-tail phenomenon and intent understanding.

Existing image-text search engines can search for pictures by text, but the method is relatively simple: pictures are usually tagged with text, and the query is then matched against those tags. The next-generation multi-modal pan-life search engine built by the Xiaohongshu team is based on a deep understanding of multi-modal content. It can genuinely search visual content by image as well as by text, and can also personalize search according to the user’s characteristics.

What is a pan-life knowledge search engine? For example, we see a good-looking piece of clothing or pair of shoes on Xiaohongshu and want to search for how to match it and how it looks in different settings. This is a search for life knowledge, and it is also a multi-modal search.

The Xiaohongshu technical team has planned a multi-modal technical architecture, in particular for image search. One of its most critical dependencies is the multi-modal feature module, which relies on large-scale neural networks for representation learning so that the content of a picture, whether clothes, shoes or other products, is represented well. With good representations, identical or similar products can be retrieved effectively from a large amount of multi-modal content. This is one application of our large-scale neural networks in search.
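Retrieval over learned representations usually comes down to nearest-neighbor search in an embedding space. The Python sketch below shows that step only, under stated assumptions: the three-dimensional vectors and note names are invented, and in a real system the embeddings would be produced by the trained neural networks and searched with an approximate nearest-neighbor index rather than a full sort.

```python
import math


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def search(query_vec, index, k=2):
    """Return the k note ids whose embeddings are closest to the query."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [note_id for note_id, _ in ranked[:k]]


# Hypothetical embeddings of notes in a shared image/text space.
index = {
    "note_red_sneakers_photo": [0.9, 0.1, 0.0],
    "note_sneaker_outfit_video": [0.7, 0.6, 0.1],
    "note_pasta_recipe": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a query image showing a pair of sneakers.
query = [1.0, 0.2, 0.0]
print(search(query, index))
# ['note_red_sneakers_photo', 'note_sneaker_outfit_video']
```

Because images and text are assumed to share one embedding space here, the same `search` call serves both an image query and a text query.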

AI generates more native commercial content

Compared with other platforms, Xiaohongshu’s commercial content differs in one major way: it is native. Being native means that, judging by likes, comments and other behaviors, users genuinely appreciate the content and may not feel that it is commercial at all. For merchants on the platform, however, the threshold for producing such commercial content is very high. How to strike a good balance between merchants’ business intent and the user value of the content they produce is a critical issue.

To this end, the Xiaohongshu technical team uses generative technology based on large-scale neural networks to help merchants generate better titles and copy from their content. For example, merchants can choose to express multiple selling points, to highlight a target customer group, or to use their favorite Xiaohongshu style, and the machine automatically suggests titles. After merchants adopted machine-generated titles, business results, clicks and dwell time all improved greatly, and users also like this kind of content, so it strikes a good balance between business and user value.

This is built on large-scale pre-trained models, including industry-leading model architectures such as T5, BERT and GPT, all of which are trained on Xiaohongshu’s multi-modal data. Some of the pre-trained models are used to understand note content, and others are used to guide the generative model in generating titles. This is how the related technologies are applied in the business domain.

Large-scale machine learning platform

All of the machine learning work above is built on LarC, a machine learning platform developed in-house by the Xiaohongshu technical team. It was launched in 2019, and during 2020 and 2021 the related machine learning frameworks and platforms were rolled out to all areas, including search, recommendation and advertising. In 2022, LarC is being developed into a platform.

Today the LarC machine learning platform is fairly complete, covering multiple layers from the underlying infrastructure to the computing framework, resource scheduling, offline applications and online deployment (the parts marked in yellow in the figure have already been implemented).

Figure: capability layers of the LarC machine learning platform.

With the LarC machine learning platform, the Xiaohongshu technical team hopes to help all algorithm engineers process massive data and train large-scale machine learning and deep learning models quickly and efficiently.

3. Summary

Xiaohongshu is a rapidly developing content community. "Ordinary people", "real sharing" and "life experience" are its keywords.

Scenarios like this, with massive multi-modal data and user feedback data, have spawned many cutting-edge technical explorations. The points above are only a selection from a large body of technical work, and there is much more. I hope they give everyone a picture of Xiaohongshu’s technology and of large-scale deep learning.

Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to have it deleted.