Imagine you are at a vibrant cocktail party filled with lively conversations and the clink of glasses.
You are a relaxed observer, happily tucked away in a corner. Yet even without being at the center of the party, you can easily figure out the social relationships between people, understand what is going on, and even decipher overt and covert social messages by reading verbal and nonverbal cues.
What if an LLM could reproduce this level of social skill? That is the question Koko Mind sets out to explore.
Just open a video, and the model starts analyzing the characters' expressions and drawing conclusions about their emotions.
You can then ask questions in the prompt box on the right to have the AI further analyze the social undercurrents in the video.
(To be honest, this is difficult even for some people.)
Koko Mind contains 150 complex multi-party social interactions, each with free-text questions and answers.
To ensure data diversity and scalability and avoid data contamination, all social interactions, questions and answers are generated by GPT-4 and subsequently verified by human experts.
The analysis data is based on three different sources:
The proportions of the three data sources are as follows:
[Figure: proportions of the three data sources]
For each social interaction, the researchers ask various questions to explore the following aspects closely related to social understanding.
Following the AlpacaEval approach, the researchers evaluated different models using text-davinci-003 as the reference.
They also evaluated a variant in which the nonverbal cues in brackets (e.g., nervously drinking coffee) were removed from the context.
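The two evaluation steps described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' actual code: it assumes the nonverbal cues appear in round brackets inside the dialogue text, and it assumes a judge has already labeled each comparison as preferring either the evaluated model or the reference. The function names and the labels "model" and "reference" are hypothetical.

```python
import re

# Assumption: nonverbal cues are written in round brackets inside the dialogue,
# e.g. "Emma: (nervously drinking coffee) I'm fine."
_CUE_PATTERN = re.compile(r"\s*\([^()]*\)")


def strip_nonverbal_cues(context: str) -> str:
    """Remove bracketed nonverbal cues and tidy any leftover spacing."""
    without_cues = _CUE_PATTERN.sub("", context)
    return re.sub(r"[ \t]{2,}", " ", without_cues).strip()


def win_rate(preferences: list[str]) -> float:
    """Fraction of comparisons in which the judge preferred the evaluated model.

    Each entry is a hypothetical label: "model" if the judge preferred the
    evaluated model's answer, "reference" if it preferred the reference answer.
    """
    if not preferences:
        raise ValueError("at least one comparison is required")
    return sum(p == "model" for p in preferences) / len(preferences)


if __name__ == "__main__":
    print(strip_nonverbal_cues("Emma: (nervously drinking coffee) I'm fine."))
    print(win_rate(["model", "reference", "model", "model"]))
```

Running the same questions once on the original context and once on the stripped context, then comparing the two win rates, gives a rough sense of how much a model relies on the nonverbal cues.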
The following are some interesting points:
(One possible explanation is that GPT-4 is a multi-modal model that can better understand additional non-verbal information.)
In the blog post, the researchers present tables that clearly show the performance of each model.
[Table: performance of each model]
The results, while exciting in many ways, also have certain limitations. First, Koko Mind is relatively small, which may limit how broadly the researchers' conclusions apply and how comprehensive they are.
Second, all interactions in Koko Mind are generated by GPT-4 and must be manually verified, which makes the dataset difficult to scale.
Also, although Koko Mind provides human-verified answers in the dataset, the researchers did not use these answers as a reference during evaluation. Because these answers were generated by GPT-4, they may be biased towards GPT-4.
Future research could focus on how to evaluate models using human-validated, machine-generated reference answers.
Despite these limitations, the researchers still regard Koko Mind as a springboard for future research on social intelligence, multi-modal language models, and related topics.