Zhiyuan updates its large-model rankings: Doubao ranks first among Chinese models in the 'objective evaluation'

The FlagEval evaluation platform has released its latest rankings. In the "objective evaluation" published in mid-June, GPT-4 ranked first among closed-source large models and Doubao-Pro (the Doubao large model) ranked second, which also makes it the highest-scoring Chinese large model. It was followed by ERNIE 4.0, Baichuan3, Moonshot-v1, and others. In the subjective (open-answer) evaluation, Doubao-Pro also ranked second, scoring higher than GPT-4o and GPT-4.

Picture: Doubao-Pro ranked second overall in the FlagEval objective evaluation (June 2024)

The FlagEval large-model evaluation platform was built jointly by the Zhiyuan Research Institute and several university teams. It uses the developmental ladder of human cognitive ability as its reference and maps the cognitive level a large model can reach against it. FlagEval has constructed a large number of original, non-public evaluation sets to ensure the quality and fairness of its assessments. Since its launch in June 2023, FlagEval has completed more than 1,000 evaluations covering large models from around the world.

Doubao-Pro is a large language model developed in-house by ByteDance and officially released on May 15. In this edition of FlagEval's rankings, the Doubao large model appeared in a public evaluation for the first time and took second place. The model has strong sequence generation and natural language understanding capabilities and can be applied widely to dialogue generation, text summarization, machine translation, and other tasks.

The Doubao model performed well in mathematical ability, knowledge application, task solving, and other capabilities in both the objective and the subjective evaluation. Its knowledge application and mathematical ability scores ranked first in the objective evaluation and in the top three in the subjective evaluation, and its task-solving score ranked in the top three in the objective evaluation.

Mathematical ability is an important dimension for judging whether a large model is "smart". Earlier, the Natural Language Processing Laboratory of Fudan University evaluated 13 mainstream large-model products on the mathematics questions of the 2024 College Entrance Examination. Doubao obtained the highest score on the New Curriculum Standard Paper II mathematics exam, with an accuracy of 74.66% on objective questions, outperforming GPT-4o and many Chinese large-model products.

Picture source: FudanNLPLab Public Account

The Doubao large model is one of the most heavily used large models in China, with one of the widest ranges of application scenarios, and it processes hundreds of billions of tokens per day on average. The AI conversation assistant of the same name, "Doubao", ranks first in downloads among AIGC applications in the Apple App Store and the major Android app stores. The Doubao large model is currently opening its services to the enterprise market through ByteDance and has established partnerships with smart-device manufacturers such as OPPO, Honor, Xiaomi, Samsung, and Asus.
