
The world's most powerful open-source MoE model is here, with Chinese capabilities comparable to GPT-4 and a price of nearly one percent of GPT-4-Turbo's

PHPz
Release: 2024-05-07 16:13:20

Imagine an artificial intelligence model that not only surpasses conventional approaches in capability but also delivers greater efficiency at lower cost. This is not science fiction: DeepSeek-V2[1], the world's most powerful open-source MoE model, is here.

DeepSeek-V2 is a powerful Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It has 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 delivers stronger performance while cutting training costs by 42.5%, reducing the KV cache by 93.3%, and boosting maximum generation throughput to 5.76 times.
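The gap between 236B total parameters and 21B activated parameters comes from MoE routing: for each token, a router selects only a few experts out of many, so most parameters sit idle on any given forward pass. The following is a minimal, illustrative sketch of top-k expert routing in NumPy; all sizes and names here are hypothetical toy values, not DeepSeek-V2's actual architecture or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

E, k, d = 8, 2, 16                         # experts, experts activated per token, hidden size
tokens = rng.standard_normal((4, d))       # a batch of 4 token embeddings (toy data)
router = rng.standard_normal((d, E))       # router weights: scores each expert per token
experts_w = rng.standard_normal((E, d, d)) # one toy MLP weight matrix per expert

def expert_mlp(x, e):
    # A stand-in for an expert's feed-forward network.
    return np.tanh(x @ experts_w[e])

def moe_forward(x):
    logits = x @ router                            # (tokens, E) routing scores
    topk = np.argsort(logits, axis=-1)[:, -k:]     # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        weights = np.exp(logits[t, topk[t]])
        weights /= weights.sum()                   # softmax over only the selected experts
        for w, e in zip(weights, topk[t]):
            out[t] += w * expert_mlp(x[t], e)      # only k of E experts run per token
    return out, topk

out, topk = moe_forward(tokens)
print(topk.shape)  # prints (4, 2): each token is routed to exactly k experts
```

With these toy numbers, each token touches only 2 of 8 experts, mirroring (in miniature) how a 236B-parameter MoE model can activate only 21B parameters per token.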

DeepSeek is a company exploring the nature of artificial general intelligence (AGI) and is committed to integrating research, engineering and business.

The comprehensive capabilities of DeepSeek-V2

In the current mainstream list of large models, DeepSeek-V2 performs well:

  • Chinese comprehensive ability (AlignBench): the strongest among open-source models, in the same tier as closed-source models such as GPT-4-Turbo and Wenxin 4.0
  • English comprehensive ability (MT-Bench): in the top tier, on par with the strongest open-source model LLaMA3-70B and surpassing the strongest open-source MoE model Mixtral 8x22B
  • Ranks at the forefront in knowledge, mathematics, reasoning, programming, and other benchmarks
  • Supports a 128K context window

New model structure

As the potential of AI is continually unlocked, we cannot help but ask: what is the key driver of progress in intelligence? DeepSeek-V2 offers an answer: the combination of innovative architecture and cost-effectiveness.

"DeepSeek-V2 is an improved version. With 236B total parameters and 21B activated, it reaches the capability of a 70B–110B dense model, while its memory consumption is only 1/5 to 1/100 of models at the same level. On an 8-card H800 machine, it can process over 100,000 input tokens per second and generate over 50,000 output tokens per second. This is not only a technological leap, but also a revolution in cost control."
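The memory saving quoted above is driven largely by the smaller KV cache. The intuition can be shown with back-of-the-envelope arithmetic: a standard multi-head attention layer caches full per-head keys and values for every token, while a compressed-latent design caches one small vector per token per layer. All numbers below are assumed, hypothetical values for illustration, not DeepSeek-V2's actual configuration.

```python
# Back-of-the-envelope KV-cache comparison (hypothetical sizes, fp16 = 2 bytes/element).
layers, n_heads, head_dim = 60, 128, 128   # assumed dense-attention configuration
latent_dim = 1024                          # assumed compressed latent vector per token

bytes_per_elem = 2
# Standard cache: keys + values (factor of 2) for every head, every layer.
mha_kv_per_token = layers * 2 * n_heads * head_dim * bytes_per_elem
# Compressed cache: one latent vector per layer.
latent_per_token = layers * latent_dim * bytes_per_elem

saving = 1 - latent_per_token / mha_kv_per_token
print(f"standard KV cache per token: {mha_kv_per_token / 1e6:.2f} MB")
print(f"latent cache per token:      {latent_per_token / 1e6:.3f} MB")
print(f"saving:                      {saving:.1%}")
```

With these assumed numbers the saving lands around 97%, the same order of magnitude as the 93.3% KV-cache reduction reported for DeepSeek-V2; the exact figure depends entirely on the real model dimensions.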

Amid the rapid development of AI technology, the emergence of DeepSeek-V2 represents not only a technological breakthrough but also heralds the popularization of intelligent applications: it lowers the threshold for AI, allowing more companies and individuals to enjoy the benefits of efficient intelligent services.

Chinese capability vs. price

In terms of Chinese capability, DeepSeek-V2 leads the world in the AlignBench ranking while providing a very competitive API price.

Both the model and the paper are open source

DeepSeek-V2 is not just a model; it is a key to a smarter world. It opens a new chapter in AI applications with lower cost and higher performance. Open-sourcing DeepSeek-V2 is the best proof of this belief: it will inspire more people's innovative spirit and jointly advance the future of human intelligence.

  • Model weights: https://huggingface.co/deepseek-ai
  • Open source address: https://github.com/deepseek-ai/DeepSeek-V2

As AI continues to evolve, how do you think DeepSeek-V2 will change our world? Let's wait and see. If you are interested, you can visit chat.deepseek.com to experience first-hand the technological changes DeepSeek-V2 brings.

References

[1] DeepSeek-V2: https://m.sbmmt.com/link/b2651c9921723afdfd04ed61ec302a6b

Source: 51cto.com