At least, that's what we're hoping. There is no specific launch date for GPT-5, and most of what we think we know comes from piecing together other information and attempting to connect the dots.
Still, no matter the due date, there are a few key features we want to see when GPT-5 launches.
GPT-5 is the highly anticipated successor to OpenAI's GPT-4 AI model, widely expected to be the most powerful generative model in the market. While there is currently no official release date for GPT-5, there are indications it could be released as early as the summer of 2024. Very little detail about the model is known at this time, but several things can be said with some amount of certainty:
OpenAI has filed a trademark for the name with the United States Patent and Trademark Office. Several OpenAI executives have discussed or hinted at the model's possible capabilities. OpenAI CEO Sam Altman repeatedly mentioned the model during a March 2024 YouTube interview with Lex Fridman.These all point to one exciting reality: GPT-5 is coming! That said, quite a lot of things are speculations at this point. But there are a few things we hope to see and are fairly confident of seeing in the model. Here are some of them:
One of the most exciting improvements to the GPT family of AI models has been multimodality. For clarity, multimodality is the ability of an AI model to process more than just text but also other types of inputs like images, audio, and video. Multimodality will be an important advancement benchmark for the GPT family of models going forward.
With GPT-4 already adept at handling image inputs and outputs, improvements covering audio and video processing are the next milestone for OpenAI, and GPT-5 is a good place to start. Google is already making serious headway with this sort of multimodality with its Gemini AI model. It would be uncharacteristic of OpenAI not to respond. But, of course, don't take our word for it. In his Unconfuse Me podcast [PDF transcript], Bill Gates asked OpenAI CEO Sam Altman what milestones he foresaw for the GPT series in the next two years. His first answer? Video Processing.
So, for GPT-5, we expect to be able to play around with videos—upload videos as prompts, create videos on the go, edit videos with text prompts, extract segments from videos, and find specific scenes from large video files. We expect to be able to do similar things to audio files. It's a big ask, yes. But given how fast AI development is, it's a very reasonable expectation.
Despite being one of the most sophisticated AI models in the market, the GPT family of AI models has one of the smallest context windows. For instance, Anthropic's Claude 3 boasts a context window of 200,000 tokens, while Google's Gemini can process a staggering 1 million tokens (128,000 for standard usage). In contrast, GPT-4 has a relatively smaller context window of 128,000 tokens, with approximately 32,000 tokens or fewer realistically available for use on interfaces like ChatGPT.
With advanced multimodality coming into the picture, an improved context window is almost inevitable. Maybe an increase by a factor of two or four would suffice, but we hope to see something like a factor of ten. This will allow GPT-5 to process much more information in a much more efficient manner. Now, a larger context window doesn't always mean better. So, rather than just increasing the context window, we'd like to see an increased efficiency of context processing.
You see, a model might have a one million token context window (around 700,000 words capacity) but fail to produce a comprehensive summary when asked to summarize a 500,000-word book because it can't adequately process the entirety of the context despite having the capacity to do so in theory. That you can read a 500k-word book does not mean you can recall everything in it or process it sensibly.
Perhaps one of the most exciting possibilities of a GPT-5 release is the debut of GPT Agents. While the term "game-changer" has probably been overused in AI, GPT agents would truly be game-changers in every practical sense. But just how game-changing would this be?
Currently, AI models like GPT-4 can help you complete a task. They can write an email, crack a joke, solve a math problem, or draft a blog post for you. However, they can only do that particular task and cannot complete a set of related tasks that would be necessary to complete your job.
Let's say you are a web developer. As part of your job, you are expected to do many things: design, write code, troubleshoot, and much more. Currently, you can only delegate a portion of these tasks to AI models at a time. Maybe you can ask the GPT-4 model to write a code for the home page, then ask it to do so for the contact page, and then for the About page, etc. You'll need to complete these tasks iteratively. And there are tasks the models simply can not complete.
This iterative process of prompting AI models for specific subtasks is time-consuming and inefficient. In this scenario, you—the web developer—are the human agent responsible for coordinating and prompting the AI models one task at a time until you complete an entire set of related tasks.
GPT Agents promises specialized expert bots coordinated by, hopefully, GPT-5 capable of self-prompting and tackling all subsets of a complex task autonomously. Emphasis on "self-prompting" and "autonomous."
So, if GPT-5 ships with GPT Agents, you could ask it to "build a portfolio website for Maxwell Timothy" rather than just "write me a code for the homepage." GPT-5 would then theoretically be able to self-prompt by invoking expert AI agents to handle the various subtasks needed to build a website. It might invoke one GPT to scrap the web for information on Maxwell Timothy, another agent to write the code for different pages, another agent to generate and optimize images, and even another AI agent to deploy the site, all without the need for repeated human prompting.
Although OpenAI has come a long way in dealing with hallucinations in its AI models, the true litmus test for GPT-5 will be its ability to address the persistent issue of hallucinations, which has held back the widespread adoption of AI in high-stakes, safety-critical domains like healthcare, aviation, and cybersecurity. These are all areas that would benefit heavily from heavy AI involvement but are currently avoiding any significant adoption.
For clarity, hallucination in this context refers to situations where the AI model generates and presents plausible-sounding but completely fabricated information with a high degree of confidence.
Imagine a scenario where GPT-4 is integrated into a diagnostic system for analyzing patient symptoms and medical reports. A hallucination could lead the AI to confidently provide an incorrect diagnosis or recommend a potentially dangerous course of treatment based on imagined facts and false logic. The consequences of such an error in the medical field could be catastrophic.
Similar reservations apply to other high-consequence fields, such as aviation, nuclear power, maritime operations, and cybersecurity. We don't expect GPT-5 to solve the hallucination problem completely, but we expect it to significantly reduce the possibility of such incidents.
As we eagerly await the official release of this highly anticipated AI model, one thing is certain: GPT-5 has the potential to redefine the boundaries of what is possible with artificial intelligence, ushering in a new era of human-machine collaboration and innovation.
The above is the detailed content of GPT-5: 4 New Features We Want to See. For more information, please follow other related articles on the PHP Chinese website!