How strong is Google Gemini? Carnegie Mellon University has conducted a professional, objective third-party comparison.
To ensure fairness, all models use the same prompts and generation parameters, and the team provides reproducible code and fully transparent results.
Unlike Google's official launch, the study does not compare CoT@32 results against 5-shot results.
The result in one sentence: Gemini Pro is close to but slightly behind GPT-3.5 Turbo, and GPT-4 is still far ahead.
The in-depth analysis also turned up some strange characteristics of Gemini, such as a tendency to choose D on multiple-choice questions.
Many researchers said that testing Gemini in such detail just a few days after its release is a remarkable achievement.
The study compares the models on 6 different tasks and selects corresponding datasets for each task.
The results show that chain-of-thought prompting does not necessarily improve performance on this type of task.
In the MMLU dataset, all questions are multiple-choice. Further analysis of the results revealed a strange phenomenon: Gemini prefers option D, while the GPT models' answers are distributed much more evenly across the four options. The team suggested this may be because Gemini has not undergone extensive instruction tuning on multiple-choice questions.
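A bias toward one option, like the preference for D described above, can be checked by counting how often each letter is predicted. The following is a minimal sketch, not the CMU team's analysis code; the function name `choice_distribution` and the sample predictions are made up for illustration.

```python
from collections import Counter


def choice_distribution(predictions, options="ABCD"):
    """Return the fraction of predictions that fall on each option letter."""
    # Ignore anything that is not a valid option (e.g. refusals or malformed output).
    counts = Counter(p for p in predictions if p in options)
    total = sum(counts.values())
    if total == 0:
        return {opt: 0.0 for opt in options}
    return {opt: counts[opt] / total for opt in options}


# Hypothetical predictions for illustration only; not data from the study.
sample_predictions = ["D", "A", "D", "C", "D", "B", "D", "D"]
distribution = choice_distribution(sample_predictions)
print(distribution)
```

A balanced model would put roughly a quarter of its answers on each letter, assuming the correct answers themselves are evenly distributed.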
In addition, Gemini's safety filtering is very strict. On ethics-related questions, it answered only 85% of the questions, and on questions related to human sexuality, it answered only 28%.
Gemini Pro outperformed GPT-3.5 in security studies and high school microeconomics, but the gap is small, and the team said it could not find anything special about these subjects.
## Reasoning: Not good at long questions
GPT-4 Turbo shows almost no performance drop, especially on long problems, which indicates a strong ability to understand complex problems.
One problem type involves people exchanging items and ultimately requires the AI to determine which items each person owns.
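To make the item-exchange problem concrete, here is a small sketch of the bookkeeping such a question asks for. The names, items, and swap sequence are invented for illustration and are not taken from the benchmark.

```python
def track_swaps(initial, swaps):
    """Apply pairwise swaps in order and return who holds what at the end."""
    holdings = dict(initial)  # copy so the caller's mapping is not modified
    for a, b in swaps:
        holdings[a], holdings[b] = holdings[b], holdings[a]
    return holdings


# Hypothetical scenario: three people each start with one ball, then trade.
initial = {"Alice": "red ball", "Bob": "blue ball", "Claire": "green ball"}
swaps = [("Alice", "Bob"), ("Bob", "Claire"), ("Alice", "Claire")]
final = track_swaps(initial, swaps)
print(final)
```

A program only needs to update a dictionary at each step, whereas a language model has to carry the same state through a long passage of text, which is why question length matters here.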
Tasks Gemini excels at include world sports knowledge, manipulating symbol stacks, sorting words alphabetically, and parsing tables.
## Mathematics: Surpassing in complex tasks
When the question itself is very long, the performance of Gemini Pro and GPT-3.5 declines together, and only GPT-4 maintains a consistent level.
When the chain of thought is at its longest, Gemini surpasses GPT-3.5.
## Code: Good at matplotlib
On code questions, Gemini does not perform well when the reference answer is long.
The GPT series is stronger in most categories but performs poorly on matplotlib.
## Translation: As long as it answers, the quality is high
In the translation task, Gemini refused to answer 12 types of questions, but whenever it did answer, the translation quality was excellent, and its overall performance exceeded GPT-4.
The languages Gemini refused to translate mainly involve Latin and Arabic.
## Web navigation: Good at cross-site surfing
WebArena simulates an Internet environment for AI, including e-commerce, social forums, GitLab collaborative development, content management systems, and online maps. The AI needs to find information in this environment or complete tasks across sites.
Gemini performs worse overall than GPT-3.5 Turbo, but slightly better on tasks that span multiple sites.
## Netizen: But it's free
Finally, CMU associate professor Graham Neubig acknowledged some limitations of the study.
The founder of Mistral AI has provided the team with access to the official version, which he believes will bring better results.
Although Gemini Pro is not as good as GPT-3.5, its advantage is that it is free to use as long as usage does not exceed 60 calls per minute.
As a result, many individual developers have switched camps.
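For developers who want to stay within a limit of 60 calls per minute, one option is to throttle requests on the client side. The following is a minimal sliding-window sketch. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and it does not call any real API; a request function would be invoked after `acquire()` returns.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until a call is allowed under a max-calls-per-period budget."""

    def __init__(self, max_calls=60, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._calls = deque()  # timestamps of calls inside the current window

    def acquire(self):
        while True:
            now = self._clock()
            # Forget calls that have aged out of the window.
            while self._calls and now - self._calls[0] >= self._period:
                self._calls.popleft()
            if len(self._calls) < self._max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window, then retry.
            self._sleep(self._period - (now - self._calls[0]))


limiter = SlidingWindowLimiter()  # assumed budget: 60 calls per 60 seconds
limiter.acquire()                 # returns immediately; the window is empty
```

The limiter only tracks calls made by one process, so several processes sharing one key would each need a smaller budget.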
Gemini's highest-end version, Ultra, has not yet been released, and the CMU team plans to continue this research once it is. Do you think Gemini Ultra can reach the level of GPT-4?
Paper: https://arxiv.org/abs/2312.11444
Reference link:
[1] https://twitter.com/gneubig/status/1737108977954251216