In 2023, the deployment of large models hit the accelerator, and text-to-image generation became one of the hottest application directions.
Since the birth of Stable Diffusion, text-to-image large models have kept emerging at home and abroad, and for a while it felt like a "battle of the gods". Each technology iteration brings rapid improvements in generation quality and speed.
Just today, the Tencent Hunyuan large model announced its latest progress: its text-to-image capability is officially launched.
As soon as we tried it out, we saw how well the Hunyuan model understands China's broad and profound food culture. Here we chose "Ants Climbing a Tree", a dish name that trips up many large models, and Hunyuan generated it with ease:
The question is: with so many text-to-image large models already available, what special advantages does the Hunyuan large model offer?
According to the official introduction, current text-to-image large models still face several challenges in algorithms and models, such as insufficient semantic understanding, unreasonable image structure, and insufficient detail and low texture quality.
Tencent began exploring AI-generated images in advertising scenarios long ago and has accumulated considerable experience. The text-to-image capability in this Hunyuan upgrade aims squarely at the three problems of "semantics, content, and texture".
According to reports, compared with other large models, Tencent Hunyuan's text-to-image capability has clear advantages in the realism of portraits and scenes. It also performs well in generating scenes such as Chinese-style landscapes, anime, and games.
Hands-on test: what sets Hunyuan text-to-image apart?
To do text-to-image well, a full understanding of the text is crucial.
In terms of semantic understanding, the Hunyuan text-to-image model uses a fine-grained bilingual Chinese-English model. Modeling both languages together gives it bilingual understanding, and optimization algorithms improve its perception of detail and its generation quality.
Before this, although popular models such as Stable Diffusion supported Chinese to some extent, their core dataset, LAION-5B, consisted mainly of Western content, so they did not understand Chinese language, food, culture, and customs well enough.
The Hunyuan text-to-image model is a native Chinese model. Whether the user enters Chinese poems or idioms, the model can paint from them directly.
In terms of content rationality, Hunyuan text-to-image strengthens the model's perception of two-dimensional spatial position in images and introduces prior information, such as human skeleton and hand structure, into the generation process. This makes the structure of generated images more reasonable and mitigates the implausible bodies and hands that AI often produces.
In terms of picture texture, Hunyuan text-to-image uses a multi-model fusion method to improve the generated texture. After optimization, the effect of the portrait model (hair, wrinkles, etc.) has improved by 30%, and the effect of the scene model (vegetation, ripples, etc.) has improved by 25%.
The technical advantages in these three areas have clearly improved the product experience of Hunyuan's text-to-image capability.
To verify these capabilities, this site prepared a set of test prompts and ran a thorough test of the Hunyuan large model right away.
Since Hunyuan is a native Chinese model, it naturally understands classical Chinese better than similar products. We first had it draw from ancient poems.
We selected a highly evocative line, "When you are drunk, you don't know the sky is in the water, and the boat is full of clear dreams pressing on the river of stars", to see whether the Hunyuan large model could generate an equally picturesque image.
In the poem "Boat at Guazhou", the line "The spring breeze turns green again on the south bank of the river, when will the bright moon shine back on me?" captures the homesickness of countless wanderers. In Hunyuan's result, images such as "spring light", "water bank", and "bright moon" are extracted and combined organically, so that viewers feel as if they are inside the poem:
Then comes the fun "Chinese food painting" session. Let's run the classic "Fish-Flavored Shredded Pork" test:
From the early Chinese food images that drove people crazy to today's pictures that look good enough to eat, we can feel the continuous evolution of text-to-image technology.
Let's take a look at how Hunyuan does on the industry-recognized problem of "realistic portraits":
Midjourney first became popular because of the photo of the couple below, which people could not tell was generated by AI.
Next, let's examine the Hunyuan large model's ability to generate images that pass for real. The prompt used is:
How does the realism strike you? In our opinion, the details mentioned in the prompt are all well rendered.
This is what Tencent emphasizes: the Hunyuan large model improves its perception of detail and its generation quality through optimization algorithms. This ability shows up across many specific scenes.
For example, take an animation scene with the prompt "A deer is running in the forest, causing fallen leaves to fly up, the moon is very bright and big, and birds are flying in the sky, creating a sense of atmosphere, CG style, side view".
Does it look like the scene in the animation you watched when you were a kid?
In addition, text-to-image has huge application potential in animation creation.
The prompt we gave the Hunyuan large model was "Generate 3D, anime style, 1 girl, blond hair, smile, short hair, city background":
What do you think of the result? Could it be used directly as wallpaper?
What self-developed technologies are behind the text-to-image capability?
If a worker wants to do his job well, he must first sharpen his tools, and the same is true for large models.
We learned that, beyond innovative model algorithms, the Tencent Hunyuan large model owes its text-to-image results, which fit Chinese aesthetics and context, to high-quality image-text matching data, a self-developed machine learning framework, and powerful computing infrastructure.
The Tencent Hunyuan large model has formed a fully self-developed technology path, from model algorithms to the machine learning framework to AI infrastructure. This multi-level accumulation reflects the fact that large models must evolve one step at a time, starting from practice and improving through practice.
First let’s look at the data engineering that supports model training.
For any AI, and especially for large models, data is one of the three indispensable elements. The same is true of the text-to-image function of large models. Image and text data, especially image-text matching data, has a decisive impact on the generation effect.
However, not all existing data on the Internet can be used as is. The main problem is that the text description of an image may not be accurate, so most image-text matching data is of relatively poor quality. If such data is used, the generation effect will still fall short of expectations even after very long training, and the stability of generation quality and the efficiency of subsequent iterations will suffer as well.
Therefore, improving the quality of image-text data has become the "first hurdle" in ensuring text-to-image results. At this stage it is often necessary to improve data quality through engineering methods, which supports model training, optimization, and upgrades, and builds a moat for the algorithm model.
To address the problem of image-text matching data, the Tencent Hunyuan text-to-image team takes a three-part approach. First, it refines Chinese prompts at a fine-grained level to improve the correlation between images and texts and maximize data quality. Second, it layers and grades the training data so that the model is optimized step by step, maximizing the effect of the data. Finally, it builds a data flywheel, which is the key to rapid iteration of large models: based on feedback from online users, the team automatically builds training data to speed up model iteration and maximize data efficiency.
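The layering and grading step can be pictured with a minimal sketch. The article does not describe how Tencent scores or tiers its data, so everything here is an assumption for illustration: the `relevance` score, the thresholds, the tier names, and the two-stage schedule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    image_id: str
    caption: str
    relevance: float  # hypothetical image-text relevance score in [0, 1]


def grade(pairs, high_cut=0.8, mid_cut=0.5):
    """Split image-text pairs into three tiers by relevance (thresholds are assumed)."""
    tiers = {"high": [], "medium": [], "low": []}
    for p in pairs:
        if p.relevance >= high_cut:
            tiers["high"].append(p)
        elif p.relevance >= mid_cut:
            tiers["medium"].append(p)
        else:
            tiers["low"].append(p)
    return tiers


def training_schedule(tiers):
    """Assumed staging: train broadly first, then refine on the best tier only.

    The low tier is left out of both stages in this sketch.
    """
    return [
        ("broad", tiers["medium"] + tiers["high"]),
        ("refine", tiers["high"]),
    ]


pairs = [
    Pair("a", "a bowl of noodles with minced pork", 0.9),
    Pair("b", "food on a table", 0.6),
    Pair("c", "click here to buy", 0.2),
]
for stage, data in training_schedule(grade(pairs)):
    print(stage, [p.image_id for p in data])
```

In a data flywheel of the kind described above, user feedback would presumably feed back into scores like `relevance`, but that loop is not shown here.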
With data quality, effect, and efficiency all improved, the foundation for good text-to-image results is in place. The machine learning framework discussed next is equally important.
A powerful machine learning framework or platform will greatly improve the speed and efficiency of developers in building, training and deploying models. Tencent has developed its own Angel machine learning platform for large model training and inference scenarios, which mainly includes AngelPTM for training and AngelHCF for inference.
AngelPTM adopts the ZeRO-Cache optimization strategy, making it a powerful tool for training very large models. It expands the model capacity of a single machine through storage management, improves resource utilization through multi-stream asynchronous execution, and improves memory efficiency through GPU memory management. In addition, 4D parallelism raises the upper limit of available GPU memory, reduces communication pressure across thousands of GPUs, and releases computing potential. An automatic training resumption mechanism provides fault tolerance for failures in thousand-GPU clusters and reduces interruption time. Model training is also monitored in real time, in coordination with the algorithm team, to steer the direction of training.
Currently, based on 4D parallelism and what Tencent describes as the industry's first ZeRO-Cache mechanism, AngelPTM achieves high-speed training of the Hunyuan base model with hundreds of billions of parameters. Training speed is double that of the mainstream open-source framework (DeepSpeed-Chat).
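To see why memory management matters at this scale, here is a rough back-of-the-envelope sketch. It does not describe AngelPTM's internals. It uses the common estimate for mixed-precision Adam training of about 16 bytes of model state per parameter, and it assumes a 100-billion-parameter model and 80 GiB GPUs purely for illustration; the article gives neither figure exactly.

```python
import math

# Common mixed-precision Adam estimate (an assumption, not an AngelPTM figure):
# 2 bytes fp16 weights + 2 bytes fp16 gradients
# + 12 bytes fp32 optimizer state (master weights, momentum, variance).
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_bytes(num_params: int) -> int:
    """Memory for weights, gradients, and optimizer state, excluding activations."""
    return num_params * BYTES_PER_PARAM


def min_gpus_for_states(num_params: int, gpu_mem_gib: int = 80) -> int:
    """Fewest GPUs whose combined memory could hold the model states alone."""
    return math.ceil(model_state_bytes(num_params) / (gpu_mem_gib * 1024**3))


params = 100_000_000_000  # assumed size; the article only says "hundreds of billions"
print(model_state_bytes(params) / 1e12, "TB of model states")
print(min_gpus_for_states(params), "GPUs at minimum, before activations")
```

Under these assumptions the model states alone come to about 1.6 TB, far more than any single GPU holds, which is why strategies that partition or offload state across devices and host memory are needed at all.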
AngelHCF improves the inference performance of large models at five levels: customized and diversified service strategies, parallel strategies, framework acceleration (covering common GPU acceleration methods), model compression (supporting the compression methods commonly used in the industry), and efficient model debugging. Its inference speed is 1.3 times higher than that of the industry's mainstream framework (FasterTransformer).
Tencent said that its Angel machine learning platform has leading performance and can help provide a better infrastructure system and help large models run at high speed. This allows the Hunyuan large model to generate high-quality images while also greatly improving the generation speed.
Even with high-quality data and an efficient machine learning framework, keeping large models running still comes down to computing power. After all, in the era of large models, computing power is king.
Tencent Hunyuan's text-to-image capability depends on the powerful computing infrastructure provided by Tencent Cloud. In April 2023, Tencent Cloud released a new generation of its HCC high-performance computing cluster, which uses the latest generation of self-developed Xinghai servers and a self-developed network and storage architecture to achieve 3.2T ultra-high interconnect bandwidth, TB-level throughput, and ten-million-level IOPS. The computing performance of the new-generation cluster is 3 times higher than the previous generation and more than 12 times higher than traditional computing cluster solutions.
Alongside stronger hardware, the upper-layer software capabilities must keep pace. The new-generation HCC cluster integrates Tencent Cloud's self-developed TACO training acceleration engine, with extensive system-level optimizations at the network protocol, communication strategy, AI framework, and model compilation levels. This comprehensive training acceleration solution not only lowers the threshold for AI optimization and improves AI training performance for customers, but also greatly reduces the cost of training tuning and computing power.
It seems that the three major factors that constrain large models (algorithms, data, and computing power) are no longer a problem for the Tencent Hunyuan large model. Naturally, the quality of its text-to-image results is ensured as well.
Results that pass for real: text-to-image capability is already embedded in Tencent advertising scenarios
The text-to-image capability of the Hunyuan large model that we saw today was not achieved overnight; it is the result of real, step-by-step evolution.
At the 2023 Tencent Global Digital Ecosystem Conference held last month, Tencent's Hunyuan large model was officially unveiled. Jiang Jie, vice president of Tencent Group, said at the time that Hunyuan is always on the road: Tencent will keep evolving Hunyuan's capabilities and hopes to bring everyone a surprise every month.
Currently, 180 of Tencent's internal businesses are connected to the Hunyuan large model, including Tencent Meeting, Tencent Docs, WeCom, Tencent Advertising, and WeChat Search. At the same time, customers from industries such as retail, education, finance, medical care, media, transportation, and government affairs call the Tencent Hunyuan API through Tencent Cloud. Application areas include intelligent question answering, content creation, data analysis, code assistance, and other scenarios.
The newly opened text-to-image capability is the biggest surprise the Hunyuan model has brought so far, demonstrating its leading capabilities in automatic image generation. Of course, Hunyuan text-to-image is still evolving, and more text-to-image and related functions will be developed in the future. They are worth looking forward to.
Currently, Hunyuan's text-to-image capability is embedded in Tencent's advertising scenarios, for example to generate product advertisements or advertising images. In multiple rounds of evaluation within the advertising business, the rate of cases judged excellent and the advertiser adoption rate for Tencent Hunyuan text-to-image reached 86% and 26% respectively, both higher than similar models.
Let's first look at the following example, in which the Hunyuan large model is asked to generate a hotel room. Judging from the results, Hunyuan's text-to-image output is clearly better after the upgrade: the design and texture are greatly improved, and the details are richer. Even compared with Midjourney, the results are comparable.
Character generation shows a similar improvement. After the upgrade, the portraits generated by Hunyuan are more realistic in details such as facial skin tone and wrinkles.
Beyond advertising, Tencent is also exploring other scenarios for text-to-image, such as generating game elements and game characters in game scenarios, generating illustrations and accompanying pictures for novels in content scenarios, and opening Hunyuan capabilities to customers in different industries in cloud business scenarios.
No matter how powerful the model is, it must be used by more people and continue to receive feedback, so that it can make further progress.
It is foreseeable that Hunyuan's text-to-image capability will spread rapidly across Tencent products in the future, and users will experience more of the charm of AIGC.