An enterprise-grade SOTA large model: what signals does Anthropic's Claude 3 release send?
Author | Wan Chen
Editor | Jingyu
As a startup founded by the former head of GPT-3 R&D at OpenAI, Anthropic is seen as the company best placed to compete with OpenAI.
Anthropic released the Claude 3 family of large models on Monday local time, claiming that its most powerful model outperforms OpenAI's GPT-4 and Google's Gemini 1.0 Ultra across a range of benchmarks.
However, handling more complex reasoning tasks, being more intelligent, and responding faster, the all-round capabilities that place it among the top three large models, are only the basics for Claude 3.
Anthropic is committed to becoming the best partner for corporate customers.
This shows first in the fact that Claude 3 is a family of models: Haiku, Sonnet, and Opus. Enterprise customers can choose the version whose performance and cost best fit their own scenarios.
Second, Anthropic stresses that its models are the safest. Anthropic President Daniela Amodei said that a technique called "Constitutional AI" was used in training Claude 3 to make it safer, more trustworthy, and more reliable.
Fu Yao, a doctoral student working on large models and reasoning at the University of Edinburgh, pointed out after reading the Claude 3 technical report that the model performs well on complex-reasoning benchmarks, especially in finance and medicine. As a B2B company, Anthropic has chosen to focus its optimization on the areas with the greatest profit potential.
Anthropic has now made two models of the Claude 3 family (Opus and Sonnet) available in 159 countries, and the smallest and fastest version, Haiku, is about to launch. Anthropic also offers its models through the cloud platforms of Amazon and Google, which have invested US$4 billion and US$2 billion in Anthropic, respectively.
Among them, Claude 3 Opus is the most intelligent model in this group of models, especially in processing highly complex tasks. Opus outperforms its peers in most common benchmarks, including Undergraduate Level Expert Knowledge (MMLU), Graduate Level Expert Reasoning (GPQA), Basic Mathematics (GSM8K), and more. It shows near-human-level understanding and fluency on complex tasks. It is currently Anthropic's most cutting-edge exploration of general intelligence, "demonstrating the outer limits of generative artificial intelligence."
02
Iteration targeting enterprise customers
In terms of accuracy, Anthropic uses a large set of complex factual questions that target known weaknesses in current models, and classifies each answer as correct, incorrect (a hallucination), or an admission of uncertainty. When it is unsure, a Claude 3 model says that it does not know the answer rather than providing incorrect information. The strongest version, Claude 3 Opus, doubled the accuracy (the share of correct answers) on challenging open-ended questions compared with Claude 2.1, while also reducing the share of incorrect answers.
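The three-way grading described above can be illustrated with a small tally. This is only a minimal sketch: the label names, the answer_rates helper, and the sample data are all hypothetical and are not taken from Anthropic's evaluation or technical report.

```python
from collections import Counter

# Hypothetical label names for illustration; not Anthropic's grading scheme.
LABELS = ("correct", "incorrect", "unsure")


def answer_rates(graded):
    """Return the share of each label in a list of graded answers."""
    if not graded:
        raise ValueError("graded must not be empty")
    unknown = set(graded) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(graded)
    total = len(graded)
    return {label: counts[label] / total for label in LABELS}


# Made-up data: 10 graded answers from an imaginary model.
example = ["correct"] * 6 + ["unsure"] * 3 + ["incorrect"] * 1
rates = answer_rates(example)
print(rates)
```

Tracking "unsure" separately is what lets an evaluation reward a model for declining to answer instead of hallucinating: the correct share can rise while the incorrect share falls.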
At the same time, thanks to improved contextual understanding, the Claude 3 family refuses fewer user requests than previous versions did.
Beyond more accurate replies, Anthropic said it will bring a "citation" feature to Claude 3, which lets the model point to precise sentences in reference material to back up its answers.
At launch, the Claude 3 models offer a context window of 200K tokens. All three models will later be able to accept inputs of more than 1 million tokens, a capability that will be offered to select customers who need enhanced processing power. In its technical report, Anthropic briefly describes Claude 3's context-window capabilities, including how effectively it handles long-context prompts and its recall ability.
It is worth noting that, as a multimodal model, Claude 3 accepts images as input but cannot generate images. Co-founder Daniela Amodei said this is because "we found that businesses have much less need for images."
Claude 3 was released after the controversy over images generated by Google Gemini. Claude, which targets enterprise customers, must likewise control and balance issues such as value bias introduced by AI.
On this point, Dario Amodei emphasized how difficult it is to control artificial intelligence models, calling it an "inexact science." He said the company has a team dedicated to assessing and mitigating the various risks the models pose.
Co-founder Daniela Amodei also admitted that completely unbiased AI may not be achievable with current methods. "Creating a completely neutral generative AI tool is nearly impossible, not only technically, but also because not everyone agrees on what neutrality is," she said.
This article comes from the WeChat public account: Geek Park (ID: geekpark), author: Wan Chen