This article is reprinted with permission from the AI new-media outlet Qubits (public account ID: QbitAI). Please contact the source for permission to reprint.
A Chinese researcher has unlocked a new way of doing scientific research:
Tell the AI your research goal, "feed" it the data set, and you're done.
This is the latest work by Zhong Ruiqi, a doctoral student at Berkeley, and his collaborators. The tedious process of "collecting evidence" from massive data sets is handed over entirely to GPT-3.
They also found that doing research with AI this way is not only highly efficient, but can also turn up surprises that humans had not thought of.
So why did the team decide to do research this way?
They had found that mining large corpora in depth can indeed yield useful results, but when humans do it, the process is far too time-consuming and laborious.
So they decided to hand this tedious process over to GPT-3, and named the task "D5":
Goal Driven Discovery of Distributional Differences via Language Descriptions.
That is, discovering differences between distributions, driven by a goal and expressed through language descriptions.
The "D5" task takes just two steps.
In the team's example, the researcher first gives the AI two corpora, Corpus A and Corpus B.
Then the researcher states the research goal to the AI, for instance "I want to know about drug A side effects".
Once the AI receives the task, it immediately begins its analysis and finally reaches a conclusion:
Samples in Corpus A mention patients' "paranoia" more often.
Now imagine human researchers doing this work: just reading through Corpora A and B would take a great deal of time, not to mention the comparative analysis that follows.
The D5 task runs this smoothly because the team did a lot of work behind the scenes.
For example, they built the OpenD5 meta-dataset, which contains 675 open-ended problems that fit the D5 task, covering fields such as business, social sciences, humanities, health, and machine learning.
Each open-ended problem comes with a corpus pair (Corpus A and Corpus B), with an average of 17,000 samples.
The team also uses 50% of each corpus as the research split, and holds out the other 50% for validation.
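The 50/50 split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_corpus`, the fixed random seed, and the toy samples are assumptions, not details from the research.

```python
import random

def split_corpus(samples, seed=0):
    """Shuffle a corpus and split it into a research half and a validation half.

    Hypothetical helper: the article only says each corpus is split 50/50.
    """
    shuffled = list(samples)               # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Toy stand-in for a real corpus (made-up data).
corpus_a = [f"report A{i}" for i in range(10)]
research_a, validation_a = split_corpus(corpus_a)
```

Hypotheses would be drawn only from the research half, so that the validation half stays untouched until testing.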
On this basis, they built a "D5 system". It works much like a human drawing findings from data, in two stages: first creatively proposing hypotheses, then rigorously testing those hypotheses on the data set.
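To make the second stage concrete, here is a minimal sketch of testing one hypothesis on held-out data. It assumes a hypothesis can be checked per sample with a simple predicate and scored by how much more often it holds in Corpus A than in Corpus B. The keyword predicate, the function names, and the toy samples are illustrative assumptions, not the team's actual validator.

```python
def fraction_satisfying(samples, predicate):
    """Fraction of samples for which the hypothesis predicate holds."""
    return sum(1 for s in samples if predicate(s)) / len(samples)

def validate(predicate, held_out_a, held_out_b):
    """Score a hypothesis as (rate in Corpus A) - (rate in Corpus B).

    A positive score means the hypothesis holds more often in Corpus A.
    """
    return (fraction_satisfying(held_out_a, predicate)
            - fraction_satisfying(held_out_b, predicate))

# Made-up held-out samples, for illustration only.
held_out_a = ["felt paranoia all night", "mild headache",
              "paranoia and nausea", "slept fine"]
held_out_b = ["mild headache", "slept fine", "dry mouth", "some paranoia"]

# Assumption: a keyword match stands in for judging "mentions paranoia".
def mentions_paranoia(sample):
    return "paranoia" in sample.lower()

score = validate(mentions_paranoia, held_out_a, held_out_b)  # 2/4 - 1/4
```

A real system would need a far more capable judge than a keyword match, but the comparison between the two held-out halves would follow the same shape.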
Following this idea, the researchers then ran an experiment using GPT-3.
They first showed GPT-3 the research goal and some samples from each corpus, then asked it to come up with a list of hypotheses.
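As a rough illustration of this step, the prompt could be assembled as below. The wording of the prompt, the function name `build_proposer_prompt`, the number of samples shown, and the toy samples are all assumptions. The article does not give the actual prompt, and no model call is made here.

```python
import random

def build_proposer_prompt(goal, corpus_a, corpus_b, k=3, seed=0):
    """Assemble a text prompt from a research goal and a few samples per corpus.

    Hypothetical format: shows up to k random samples from each corpus,
    then the goal, then a request for a list of hypotheses.
    """
    rng = random.Random(seed)
    shown_a = rng.sample(corpus_a, min(k, len(corpus_a)))
    shown_b = rng.sample(corpus_b, min(k, len(corpus_b)))
    lines = ["Samples from Corpus A:"]
    lines += [f"- {s}" for s in shown_a]
    lines.append("Samples from Corpus B:")
    lines += [f"- {s}" for s in shown_b]
    lines.append(f"Research goal: {goal}")
    lines.append("List hypotheses about how Corpus A differs from Corpus B:")
    return "\n".join(lines)

# Made-up samples, for illustration only.
prompt = build_proposer_prompt(
    "I want to know about drug A side effects",
    ["felt paranoid", "slept badly", "no issues", "racing thoughts"],
    ["mild headache", "no issues", "dry mouth", "felt fine"],
)
```

The resulting string would then be sent to a language model, whose reply is parsed into candidate hypotheses for testing.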
The experiment found that, given the goal description, GPT-3 proposes hypotheses that are more relevant, novel, and meaningful.
It is precisely because the OpenD5 data set covers so many fields that Zhong says their D5 system has a wide range of applications.
But he was also frank about the system's flaws.
For example, if a corpus contains a lot of slang, colloquialisms, or emotionally loaded words, the "discoveries" the AI produces will be biased.
In short, the AI may misread the vocabulary, or the description of a specific situation, and analyze it incorrectly.
In addition, he said that more flexible corpora and a more scalable system are the focus of their future research.
Still, Zhong appears very excited about this research. After all, it brings him one step closer to his dream of "doing scientific research with AI".