A follow-up to the incident of the Stanford team plagiarizing a large model from Tsinghua University:
The Llama3-V team has admitted the plagiarism, and the two Stanford undergraduates involved have cut ties with the third author.
The latest apology tweets were posted by Siddharth Sharma and Aksh Garg.
Not among them is Mustafa Aljadery (referred to below as Lao Mu) of the University of Southern California, who is accused of being the one chiefly at fault and who has been unreachable since yesterday:
We had hoped that Lao Mu would make the first statement, but we have been unable to reach him since yesterday.
Siddharth, I (Aksh), and Lao Mu released Llama3-V together, and Lao Mu wrote the code for the project.
Siddharth's and my role was to help him promote the model on Medium and Twitter. I looked through recent papers to verify the novelty of the work, but we were never told about, and did not discover, the earlier work from Wall-Facing Intelligence.
As for Lao Mu himself, who is accused of running away, his X homepage is currently protected and locked; you can follow it only by request:
Overall, this apology tweet reads very differently from yesterday's tweet, which was hastily deleted shortly after being posted: the emphasis now is on apologizing, and on pushing the blame further onto Lao Mu.
After all, even Christopher Manning, director of the Stanford Artificial Intelligence Laboratory, ended up weighing in with a complaint:
This is a textbook example of failing to own up to one's mistakes!
He felt that after the incident the team dodged the key points, hiding behind excuses such as "similar architecture" and "MiniCPM was simply implemented before ours", and refused to admit it was plagiarism.
But the new apology statement did not quiet netizens' doubts, and the latest revelations suggest these guys are repeat offenders: a textbook they wrote earlier is also alleged to have been plagiarized.
As for the original author team, Wall-Facing Intelligence, besides CEO Li Dahai's response yesterday ("in a way, it is also a form of recognition from an international team"), chief scientist Liu Zhiyuan has come forward to answer questions personally on Zhihu:
I am now fairly convinced that Llama3-V is simply a re-shelled version of our MiniCPM-Llama3-V 2.5.
The rapid development of artificial intelligence is inseparable from the open-source sharing of algorithms, data, and models around the world, which lets everyone keep standing on the shoulders of SOTA and moving forward. Our open-source MiniCPM-Llama3-V 2.5 itself uses the latest Llama3 as its language-model base. The cornerstone of open-source sharing is compliance with open-source licenses, trust in other contributors, and respect and tribute to the achievements of predecessors. The Llama3-V team has undoubtedly severely damaged this, and they deleted the repository from Hugging Face after being questioned. Two of the three team members are only undergraduates at Stanford University; they still have a long road ahead of them, and if they can correct their mistakes, that would be real progress.
Let's briefly recap the drama first.
In one sentence: netizens discovered that Llama3-V, the Stanford team's multimodal large model that had recently gone viral in the open-source community, has an architecture and code almost identical to those of China's MiniCPM-Llama3-V 2.5, and they cited plenty of evidence pointing to plagiarism.
As the incident unfolded, the Stanford team deleted the repository and went silent, while the Wall-Facing Intelligence team launched an investigation of its own.
Liu Zhiyuan, chief scientist of Wall-Facing Intelligence and tenured associate professor at Tsinghua University, gave a key reason for judging Llama3-V to be a shell of MiniCPM-Llama3-V 2.5: its ability to recognize the Tsinghua Bamboo Slips.
This is the "Easter egg" capability of MiniCPM-Llama3-V 2.5. They were trained using a data set scanned and annotated from Tsinghua University Jane. has not been made public. The performance of Llama3-V is exactly the same as that of MiniCPM-Llama3-V 2.5. Not only the correct questions are the same, but also the errors are the same.
Today, based on the first wave of evidence, other netizens have uncovered new clues.
After some digging, someone found that, for almost every layer, the difference between Llama3-V's weights and MiniCPM's follows a Gaussian distribution with mean 0 and standard deviation 1.4e-3.
The speculation, then, is that Llama3-V simply added low-variance noise directly to MiniCPM's weights.
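For reference, here is a minimal sketch of how such a check could be reproduced with PyTorch and transformers, assuming both checkpoints are available (the paths below are placeholders, and the two models are assumed to share parameter names):

```python
import torch
from transformers import AutoModel

# Placeholder paths / repo ids for the two checkpoints being compared.
ref = AutoModel.from_pretrained("path/to/MiniCPM-Llama3-V-2_5",
                                trust_remote_code=True, torch_dtype=torch.float16)
sus = AutoModel.from_pretrained("path/to/Llama3-V",
                                trust_remote_code=True, torch_dtype=torch.float16)

ref_sd, sus_sd = ref.state_dict(), sus.state_dict()
for name, w_ref in ref_sd.items():
    if name not in sus_sd or sus_sd[name].shape != w_ref.shape:
        continue  # only compare layers with matching names and shapes
    diff = (sus_sd[name].float() - w_ref.float()).flatten()
    # If one model is just the other plus low-variance noise, each layer's
    # difference should look Gaussian with mean ~0 and a tiny std (~1.4e-3).
    print(f"{name}: mean={diff.mean().item():+.2e}, std={diff.std().item():.2e}")
```

Loading two 8B-parameter models this way takes tens of gigabytes of memory; streaming the weight shards directly would be lighter, but the idea is the same.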
In addition, it was revealed that Lao Mu, the one who ran off, had previously written a book on computer network design, and that it too was allegedly copied.
Pick a chapter from the book at random and run it through a plagiarism detector, and it lights up with a pile of red flags:
Also, according to netizens, Siddharth's name is listed among the book's authors as well.
Some netizens caution that whether the book was really plagiarized remains to be verified; in any case, the book's page now returns a 404.
Back to this round of plagiarism: Siddharth's and Aksh's apology statements also explained why they promoted the project with Lao Mu in the first place. They were initially amazed by the multimodal model, and especially liked the architecture extension based on Idefics, SigLip, and UHD that Lao Mu described.
In fact, though, netizens had already found that Llama3-V's concrete implementation differs from LLaVA-UHD in many respects, such as the spatial schema, while being surprisingly consistent with MiniCPM-Llama3-V 2.5.
According to its homepage, MiniCPM-Llama3-V 2.5 is the latest open-source model in Wall-Facing Intelligence's MiniCPM-V series, built on SigLip-400M and Llama3-8B-Instruct with 8B parameters in total.
In terms of performance, MiniCPM-Llama3-V 2.5 scores 65.1 on average on OpenCompass, surpassing proprietary models such as GPT-4V-1106, Gemini Pro, Claude 3, and Qwen-VL-Max, and significantly outperforming other multimodal language models built on Llama 3.
It also has strong OCR capabilities, scoring 700+ on OCRBench and surpassing GPT-4o, GPT-4V-0409, Qwen-VL-Max, and Gemini Pro.
Thanks to the latest RLAIF-V method, MiniCPM-Llama3-V 2.5 has a hallucination rate of 10.3% on Object HalBench, lower than GPT-4V-1106's 13.6%.
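As a side note, here is a minimal sketch of loading the open-source checkpoint with Hugging Face transformers (the repo id is the one published by OpenBMB; custom remote code is required, and the exact inference interface may differ between versions):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Repo id as published by OpenBMB on Hugging Face; trust_remote_code is needed
# because the model ships its own modeling code.
model_id = "openbmb/MiniCPM-Llama3-V-2_5"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,  # roughly 16 GB of GPU memory for the 8B weights
).eval().cuda()
```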
Quick as the blame-shifting was, netizens were just as quick to spot the highlights in Aksh's and Siddharth's apology statements:
You two did none of the work, yet count as project authors just for helping with promotion?
When it launched, it was billed as a project by all three of you, but when something went wrong, the blame got dumped on one person?
And if Lao Mu wrote all the code by himself, what exactly did you two do, just post?
Some netizens raised an even more pointed question, which sparked further heated discussion:
Has the open-source community been ignoring large-model results from China?
Lucas Beyer, a Google DeepMind researcher and ViT author, noted that what Llama3-V copied is an open-source model that cost less than $500 yet holds its own against Gemini and GPT-4:
Yet compared with Llama3-V, MiniCPM has received far less attention, including from me.
The main reason seems to be that such a model comes from a Chinese laboratory rather than an Ivy League school.
Omar Sanseviero, head of platform and community at Hugging Face, put it more bluntly:
The community has been sleeping on the work of China's machine learning ecosystem. They are doing amazing things, with interesting large language models, large vision models, audio and diffusion models.
Including Qwen, Yi, DeepSeek, Yuan, WizardLM, ChatGLM, CogVLM, Baichuan, InternLM, OpenBMB, Skywork, ChatTTS, Ernie, HunyuanDiT, and more.
Many netizens agreed: "They've released the best open-source VLM available right now."
Judging by the more objective large-model arena leaderboards, that claim holds up.
In the vision model arena, where models face off one on one, Yi-VL-Plus from 01.AI ranks fifth, ahead of Google's Gemini Pro Vision, and CogVLM, a collaboration between Zhipu AI and Tsinghua University, also makes the top ten.
In addition, DeepSeek, Tongyi Qianwen, and the MiniCPM series of multimodal models plagiarized this time also perform well.
On the more widely recognized LMSYS Chatbot Arena Leaderboard, large models from China likewise keep setting new records for the "strongest open source" title.
As Professor Liu Zhiyuan put it:
Looking horizontally, we obviously still have a significant gap with top international work such as Sora and GPT-4o; looking vertically, though, we have grown rapidly from nobodies more than a decade ago into key drivers of AI technology innovation.
The drama is big and the onlookers are many, but perhaps more importantly, some prejudices are starting to crumble. What do you think?
MiniCPM original paper: https://arxiv.org/abs/2404.06395