"We are NOT currently training GPT-5. We're working on doing MORE things with GPT-4." - Sam Altman of OpenAI
TL;DR: GPT-5 is not happening any time soon; OpenAI’s focus is on improving GPT-4. The real advances in AI will come from the AI ecosystem.
This week, in a Lex Fridman interview at MIT, Sam Altman confirmed that there is no GPT-5 development right now and responded to the letter calling for a six-month halt in AI by saying:
An earlier version of the letter claimed OpenAI is training GPT-5 right now. We are not and won't for some time. ... We are doing other things on top of GPT-4 that I think have all sorts of safety issues that are important to address and were totally left out.
So not only is GPT-5 not in training right now; the way Sam Altman describes things, it is not even on their short-term radar.
Gary Marcus points out that “the only thing that pause letter called for pausing–training GPT5–isn’t actually happening!” So, yes, the Pause idea is itself on pause. As Marcus notes, the letter didn’t call for pausing all AI research, just the biggest model training runs, and nobody is pausing that work because everyone not named OpenAI is still behind GPT-4.
The Verge discusses the GPT-5 rumors and points out the difficulty of measuring progress:
Instead, they highlight a significant challenge in the debate about AI safety: the difficulty of measuring and tracking progress. Altman may say that OpenAI is not currently training GPT-5, but that’s not a particularly meaningful statement.
Why GPT-5 Is Years Away
Some people were surprised by Sam Altman’s comments about the GPT-5 effort, or non-effort, but they shouldn’t have been. There are several valid reasons why GPT-5 training is not happening now.
The first is that the compute required is an order of magnitude larger than what was used for GPT-4, and the Hopper-based server farms needed for a job that size aren’t ready. They need to be built and tested, and only then can training runs begin.
Second, the recent, highly successful releases of ChatGPT and GPT-4 are sending a torrent of usage data OpenAI’s way. Just as Google learned from users’ searches, OpenAI can learn from the users of its APIs. This is what Altman’s “doing things on top of GPT-4” could mean: tweaking and fine-tuning the current GPT-4 model to improve its behavior and performance.
We have been told indirectly that GPT-4 is a trillion-parameter model (OpenAI has stayed closed about the details). What is likely is that fine-tuning on the data they are collecting from users will make for a more user-friendly AI. They could also find ways to cut inference costs or otherwise optimize GPT-4. Whether the results are called GPT-4.1 or GPT-4.5 won’t matter, but there will be incremental updates of some kind.
Undertaking a massive multi-trillion-parameter model is like building a 747 or an aircraft carrier; GPT-5’s cost may total hundreds of millions of dollars. OpenAI needs to catch its breath and learn from what it just released, so it makes far more sense to focus on GPT-4 optimizations rather than a whole new model right now.
Once GPT-5 training does start, there will be a long lag until release. After pre-training the base LLM, there needs to be fine-tuning of the model using RLHF, safety measures, testing and Red Teaming, and more. OpenAI has been very clear that they will lean in on AI Safety and Alignment, so this may become ever more important and time-consuming.
Recall that GPT-4’s data cutoff was September 2021, training ran through the early part of 2022, and Microsoft engineers then had testing access to GPT-4 a full six months before public release. Similarly for GPT-5, the process from data collection through training, testing, and release could take as much as 18 months.
GPT-3 was trained in 2019 and released in May 2020; GPT-4 was trained in 2022 and released in March 2023. So the GPT-3 to GPT-4 cadence was almost three years. In the interim, OpenAI released InstructGPT, GPT-3.5, and ChatGPT.
Sam Altman’s comments suggest a similar process and cadence for GPT-5 as for GPT-4. If so, we can expect interim GPT-4-based releases over the next year or two, with OpenAI beginning GPT-5 development next year and a final GPT-5 release in two to three years, that is, sometime in 2025 or 2026.
My bottom line is I am sticking with this timeline:
GPT-5: 2025
GPT-6: 2027
GPT-7: 2029 (AGI)
Making GPT-4 Improve
Where does that leave us? GPT-5 is years away, but GPT-4 will be incrementally improved in the interim. The real innovation in AI will be in treating those GPT-4-level LLMs as components and subroutines in a larger ecosystem or solution.
Here are the ways we can get closer to AGI without improving LLMs beyond GPT-4:
Running multiple LLMs, each focused on a sub-task, all coordinated by a manager AI model: a tree of GPT models working together to answer a single complex prompt (see the sketch after this list).
Language models equipped with dynamic, long-term memory using embeddings, so they can remember prior requests, prompts, actions, and facts for later recall; this gives language models the power of semantic search.
The ability to reflect, review and revise prior LLM responses, such as in the Reflexion paper and other results we described in “Critique and Revise to Improve AI”.
Language models running locally on your machine that treat more capable language models in the cloud, and other external tools, as plug-ins.
Using a language model to pre-process prompts, transforming them into versions that elicit better responses.
Giving language models access to tools and APIs. ChatGPT has done this with plug-ins: overcoming the “bad at math” problem, providing access to current information, and extending its ability to run code or solve complex problems directly.
A prompt marketplace: Providing ways to improve and reuse the best prompts.
Embedding LLMs as the interface to complex software tools. “AI-enabled” everything. Voice input will be hooked up to the LLM interface, so that computers and software become voice-command-controlled.
AutoGPT-style agents that act iteratively and autonomously on complex tasks.
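To make a few of these concrete, here is a minimal sketch combining three items from the list: a manager model that decomposes a task for worker models, embeddings-based long-term memory with semantic search, and a Reflexion-style critique-and-revise pass. It assumes the openai Python library (v0.x) with an API key configured; the prompts, the MemoryStore class, and the manager function are illustrative inventions, not any standard framework.

```python
import numpy as np
import openai  # assumes openai v0.x and OPENAI_API_KEY set in the environment


def complete(prompt, system="You are a helpful assistant."):
    """One GPT-4 chat completion; returns the reply text."""
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]


def embed(text):
    """Embedding vector used for semantic-search memory."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])


class MemoryStore:
    """Long-term memory: store past results, recall by cosine similarity."""

    def __init__(self):
        self.items, self.vectors = [], []

    def remember(self, text):
        self.items.append(text)
        self.vectors.append(embed(text))

    def recall(self, query, k=3):
        if not self.items:
            return []
        q = embed(query)
        sims = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v))
                for v in self.vectors]
        return [self.items[i] for i in np.argsort(sims)[::-1][:k]]


def manager(task, memory):
    """Decompose -> solve sub-tasks -> critique and revise -> synthesize."""
    context = "\n".join(memory.recall(task))
    plan = complete(f"Split this task into three short sub-tasks, one per line:\n{task}")
    answers = []
    for sub in filter(str.strip, plan.splitlines()):
        draft = complete(f"Relevant context:\n{context}\n\nSolve this sub-task:\n{sub}")
        # Reflexion-style pass: the model critiques and revises its own draft.
        answers.append(complete(f"Critique this answer, then rewrite it improved:\n{draft}"))
        memory.remember(answers[-1])
    return complete("Combine these into one coherent answer:\n" + "\n---\n".join(answers))
```

An AutoGPT-style agent is essentially this loop made open-ended: instead of a fixed decompose-solve-synthesize pipeline, the model itself decides which sub-task or tool to invoke next, iterating until it judges the task done.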
There are likely other non-obvious ways language models can be improved, and there are many creative use-cases that innovators will conjure up to adapt and use LLMs. These AI ecosystems with GPT-4 level LLMs embedded in them will be powerful enough to conquer a large swath of human cognitive challenges, business problems, and more.
When Will LLM Scaling Stop?
AI keeps progressing at a fast clip, but is there a stopping point, and if so, what is it? For LLMs, scaling has been based on Data, Compute, and Parameters. So if there is a stopping point, it would be because we run out of data, compute scaling stalls, or more parameters stop delivering any benefit.
Scaling LLM compute improves capabilities at a logarithmic rate: you need roughly a ten-fold increase in compute to get a linear increase in performance. LLMs have scaled up because training parallelizes well; by running in parallel across GPUs, training at colossal scale becomes feasible.
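As a toy illustration of that log-rate claim, here is the kind of power-law loss curve the scaling-law literature reports. The constants below are made up for illustration; they are not measured values for any GPT model.

```python
def loss(compute, a=10.0, b=0.05):
    """Toy power law L(C) = a * C^(-b); a and b are hypothetical constants."""
    return a * compute ** (-b)


for c in [1e21, 1e22, 1e23, 1e24]:  # each step is 10x more training FLOPs
    print(f"compute {c:.0e} FLOPs -> loss {loss(c):.3f}")

# Each ten-fold increase in compute multiplies the loss by the same factor
# (10**-0.05, about 0.89): steady-looking gains only on a log-compute axis.
```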
Nvidia’s Hopper architecture, and their next-gen Blackwell architecture expected in 2024 on a TSMC 3nm-class node, will keep the scaling going for several generations. However, compute scaling begins to run out of steam as transistors approach atomic scale; this may slow things down in three or four generations.
Data scaling will eventually hit a wall, but not yet. Large models have already ingested several trillion words, and there are not many orders of magnitude of human-generated text left. At some point we run out; estimates of that limit range anywhere from 10 trillion words to several hundred trillion words.
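A quick back-of-envelope using this paragraph’s own numbers shows how much headroom that leaves; all figures below are rough estimates, not measurements.

```python
import math

ingested = 3e12                    # ~a few trillion words already used in training
for limit in (1e13, 1e14, 3e14):   # low / mid / high estimates of total usable text
    headroom = math.log10(limit / ingested)
    print(f"limit {limit:.0e} words -> ~{headroom:.1f} orders of magnitude left")

# Result: roughly half an order of magnitude to two orders of magnitude of
# data headroom; real, but far less runway than compute scaling still has.
```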
Ilya Sutskever stated in a recent interview that the data situation for GPT models was good, though he didn’t give specifics. It’s possible a more thorough scraping of everything ever written yields much more data than has been used so far: every research paper and news article, the entire content of social media, and every YouTube video and podcast converted into ingestible text.
If data is the new oil, then by analogy there are still recoverable reserves, but they will get harder and harder to find and extract. Eventually we will run out.
And a final question about parameters and LLM capabilities: What if LLMs reach a point where they simply cannot get better? This is perhaps another way of asking: At what point do more parameters stop improving the LLM? You add more parameters, but the model is already as good at the task as it can be, or needs to be.
A technical way to look at this: the core LLM is a next-token prediction engine, and pre-training works by reducing next-token prediction error. At some point, that error approaches zero. Will you notice the difference between a 99.9% solution and a 99.99% solution? In an imprecise, creative setting, is that difference even detectable?
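To make that concrete, pre-training loss is cross-entropy, -log p(correct next token). Here is the loss gap between the 99.9% and 99.99% “solutions” above (real per-token losses are far higher; the numbers simply mirror the framing in this paragraph).

```python
import math

for p in (0.999, 0.9999):
    print(f"p = {p}: per-token loss = {-math.log(p):.6f} nats")

# p = 0.999:  loss ~ 0.001001
# p = 0.9999: loss ~ 0.000100
# A ten-fold reduction in loss, yet both models pick the same token virtually
# every time; the improvement is nearly undetectable in the generated text.
```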
At some point the word-smithing capabilities of LLMs will get so good that there is no meaningful improvement left for us to detect. What does super-human intelligence mean with regard to outputting words? More precise? More creative?
Fact-grounding and having a more complete world model, for sure, but suppose that is addressed. What else constitutes being a better writer? GPT-4 has a context window so large that it can output an entire short book in a single prompt.
As with data and compute, we are still in a strong scaling trend where more is better. More parameters yield more capabilities, and we haven’t touched the ceiling yet. GPT-4 showed significant, detectable improvements over ChatGPT, and we can foresee the same for the next couple of generations.
When the LLM does get to the point where it’s so big and so capable with words that it can’t get any better, that is what we will recognize as AGI. It can do all the cognitive tasks we can imagine, and do them quickly, reliably, and cheaply. You can’t get better than perfect.