Has GPT-4, the ceiling of large models, become... stupid?
At first only a few users raised the question; then a large number of netizens said they had noticed the same thing and posted plenty of evidence.
Some people reported burning through GPT-4's entire quota of 25 messages per 3 hours in one sitting without solving their coding problem.
Left with no other choice, they switched to GPT-3.5, which solved it instead.
To summarize the feedback, the most prominent complaint is that the quality of both its text and its code output has declined.
This has led many people to wonder: is OpenAI cutting corners to save costs?
Two months ago GPT-4 was the world's greatest writing assistant, and a few weeks ago it started to fall into mediocrity. I suspect they cut back on the computing power or made it less intelligent.
This inevitably reminds people of Microsoft's new Bing, which "peaked at launch" and was later "lobotomized" into something noticeably worse.
After netizens shared their experiences with each other, it became everyone's consensus that "it started to get worse a few weeks ago."
A storm of public opinion also formed in technical communities such as Hacker News, Reddit and Twitter.
Now OpenAI's own people can no longer sit still.
OpenAI developer advocate Logan Kilpatrick responded to a netizen's question:
The API will not change without us notifying you. The models there are static.
A worried netizen pressed for confirmation, "So that means GPT-4 has been static since it was released on March 14, right?", and got a positive answer from Logan.
Another comment, "I've noticed inconsistent performance on some prompts; is that just down to the inherent instability of large models?", likewise received a "Yes".
But so far, two questions asking whether the web version of GPT-4 has been downgraded have gone unanswered, even though Logan has posted other content in the meantime.
So what exactly is going on? We decided to try it ourselves.
Since netizens mostly complained that GPT-4's coding ability had gotten worse, we ran a simple experiment.
At the end of March, we experimented with having GPT-4 "refine elixirs" (train a model): write a multi-layer perceptron in Python that implements an XOR gate.
△ShareGPT screenshot, the interface is slightly different
After we asked GPT-4 to drop the framework and use only numpy, its first attempt produced the wrong result.
It took two rounds of modification to get the correct answer: the first change adjusted the number of hidden neurons, and the second swapped the activation function from sigmoid to tanh.
On June 2, we had GPT-4 attempt the same task again, this time with Chinese prompts.
This time GPT-4 skipped the framework from the start, but the code it gave was still wrong.
Only one modification was needed to reach the correct result, and the fix it chose was different: it simply increased the number of training epochs and the learning rate.
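For readers who want a sense of what this task involves, here is a minimal sketch of the kind of network in question: a 2-4-1 multi-layer perceptron in pure numpy, with a tanh hidden layer and a sigmoid output, trained on the XOR truth table. The hyperparameters (hidden size, learning rate, epoch count) are our own illustrative choices, not GPT-4's actual output.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR truth table: inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights: 2 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))

lr = 0.5        # larger learning rate, as in the June run
epochs = 5000   # more training epochs, as in the June run

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(epochs):
    # Forward pass
    h = np.tanh(X @ W1 + b1)       # hidden layer, tanh activation
    out = sigmoid(h @ W2 + b2)     # output layer, sigmoid activation

    # Backward pass for a squared-error loss
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h ** 2)   # derivative of tanh

    # Gradient descent updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 3).ravel())  # should approach [0, 1, 1, 0]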
The response speed did feel faster. Due to limited time we only ran this one experiment, and given the inherent randomness of AI, it cannot rule out what netizens have observed.
In fact, some people had given such feedback as early as April 19.
Searching the official OpenAI Discord channel, we found that starting in late April, sporadic users had been reporting that GPT-4 was getting worse.
However, this feedback did not trigger large-scale discussion, nor did it receive an official response.
On May 31, large numbers of netizens on Hacker News and Twitter began discussing the issue on the same day, which became the key node of the whole incident.
A Hacker News user pointed out that GPT-4 was stronger back when its avatar was black; the current purple-avatar version tends to drop a few lines when modifying code.
The person who raised the issue earliest on Twitter was Matt Shumer, CEO of HyperWrite (a writing tool built on the GPT API).
This tweet resonated with many netizens, and OpenAI employees replied to it.
However, those replies did not satisfy everyone; instead, the discussion spread wider and wider.
For example, a Reddit post mentioned that GPT-4, which used to answer code questions just fine, now cannot even tell which part is code and which part is the question.
After being challenged by other netizens, the post's author walked through how the problem arose and attached the chat log with GPT.
As for OpenAI's claim that the model has not been changed since March, there is indeed no public record saying otherwise.
ChatGPT's update log mentions updates to the model itself on January 9, January 30, and February 13, covering improvements in factual accuracy and mathematical ability.
But since GPT-4's release on March 14, no model updates have been mentioned; there have only been web app adjustments and additions such as browsing mode, plug-in mode, and the Apple app.
Assuming, as OpenAI says, that the GPT-4 model itself has not changed, why do so many people feel its performance has deteriorated?
Many people have offered their own guesses.
The first possible reason is psychological.
François Chollet, the creator of Keras, said it is not that GPT's performance has deteriorated, but that everyone has moved past the initial period of amazement and now holds it to higher expectations.
Some Hacker News users held the same view, adding that people's focus has shifted and they have become more sensitive to GPT's mistakes.
Setting aside differences in subjective perception, some also suspect that the API version and the web version are not necessarily the same, though there is no solid evidence.
Another guess is that when plug-ins are enabled, the extra prompt text the plug-in injects may effectively contaminate the question being asked.
△The extra prompt text added by the WebPilot plug-in
This netizen said that, in his view, GPT-4's decline began right after the plug-in feature entered public beta.
Some also asked OpenAI employees: even if the model itself has not changed, could the inference parameters have changed?
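Even with an identical model checkpoint, inference-time settings such as temperature and top_p noticeably change how consistent and how detailed the answers feel. Below is a minimal sketch of where these parameters live in an API call, using the pre-1.0 interface of the openai Python package that was current at the time; the prompt and parameter values are purely illustrative, and nothing here is a claim about what OpenAI actually changed.

```python
import openai  # pre-1.0 interface of the openai package

openai.api_key = "YOUR_API_KEY"  # placeholder

def ask(prompt: str, temperature: float, top_p: float) -> str:
    # Same model, same prompt; only the sampling parameters differ.
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # higher -> more random sampling
        top_p=top_p,              # nucleus sampling cutoff
        max_tokens=300,
    )
    return resp["choices"][0]["message"]["content"]

# Deterministic-leaning vs. more exploratory settings can yield
# noticeably different answers to the same question.
print(ask("Write a numpy MLP that learns XOR.", temperature=0.0, top_p=1.0))
print(ask("Write a numpy MLP that learns XOR.", temperature=1.0, top_p=0.9))
```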
Qubits also happened to coax out the fact that ChatGPT's system prompt on iOS is not the same as on the web version.
△This may not succeed every time; there is a high chance it will refuse to answer
So if you start a conversation on the web version and then, without noticing, continue it in the iOS app, you may find GPT-4's answers becoming simpler.
In short, it is still an unsolved mystery whether GPT-4 has become dumber since its release.
But one thing is certain:
The GPT-4 that everyone got to play with starting March 14 was, from the very beginning, not as strong as the one in the paper.
Microsoft Research's 150-plus-page paper "Sparks of Artificial General Intelligence: Early experiments with GPT-4" states clearly:
They obtained access before GPT-4's development was finished and tested it over an extended period.
Later, netizens were unable to reproduce many of the impressive examples in the paper using the public version of GPT-4.
A view currently circulating in academia holds that the subsequent RLHF training made GPT-4 better aligned with humans, that is, more obedient to instructions and more consistent with human values, but also degraded its own abilities, such as reasoning.
One of the paper's authors, Microsoft scientist Zhang Yi, also said in episode S7E11 of the Chinese podcast "What's Next|Technology Knows Early":
That version of the model was even stronger than the GPT-4 everyone can use now, much stronger.
For example, the Microsoft team mentions in the paper that at regular intervals they had GPT-4 draw a unicorn using TikZ in LaTeX, in order to track how its capabilities changed over time.
The final result shown in the paper is already quite complete.
But the first author of the paper, Sebastien Bubeck, later revealed more information when he gave a speech at MIT.
Later, as OpenAI turned more of its attention to safety, subsequent versions became worse and worse at this task.
Training methods that align a model with humans without lowering the ceiling of its own capabilities have become a research direction for many teams, but the work is still in its infancy.
In addition to professional research teams, netizens who care about AI are also using their own methods to track changes in AI capabilities.
Someone has GPT-4 draw a unicorn once a day and publishes the record on a website.
Since April 12, the drawings have yet to show even the rough shape of a unicorn.
That said, the site's author notes that he has GPT-4 draw in SVG format rather than the TikZ format used in the paper, which does affect the results.
And the April drawings look about as bad as the current ones, with no obvious regression.
Finally, over to you: are you a GPT-4 user? Have you felt its abilities decline in recent weeks? Feel free to share in the comments.
Bubeck's speech: //m.sbmmt.com/link/a8a5d22acb383aae55937a6936e120b0
Zhang Yi's interview: //m.sbmmt.com/link/764f9642ebf04622c53ebc366a68c0a7
One GPT-4 unicorn every day: //m.sbmmt.com/link/7610db9e380ba9775b3c215346184a87
Reference link:
[1]//m.sbmmt.com/link/cd3e48b4bce1f295bd8ed1eb90eb0d85
[2]//m.sbmmt.com/link/fc2dc7d20994a777cfd5e6de734fe254
[3]//m.sbmmt.com/link/4dcfbc057e2ae8589f9bbd98b591c50a
[4]//m.sbmmt.com/link/0007cda84fafdcf42f96c4f4adb7f8ce
[5]//m.sbmmt.com/link/cd163419a5f4df0ba7e252841f95fcc1
[6]//m.sbmmt.com/link/afb0b97df87090596ae7c503f60bb23f
[7]//m.sbmmt.com/link/ef8f94395be9fd78b7d0aecf7864a03
[8]//m.sbmmt.com/link/30082754836bf11b2c31a0fd3cb4b091
[9]//m.sbmmt.com/link/14553eed6ae802daf3f8e8c10b1961f0