GPT-5 Arrives - A Model for Agentic AI

GPT-5 is the world's newest smartest model, offering more intelligence, incremental improvements across the board, a big jump in AI coding abilities, and a fundamental shift toward agentic AI.

Aug 08, 2025

Figure 1. AI art from GPT-5, which imagined, designed and then drew this image, based on this prompt: “Image, design and draw an image that invokes the release of GPT-5 the "smartest, fastest, most useful AI yet". Convey a feeling of newness, like the dawn of a new day, plus the possibility of expanding intelligence. Use your imagination.”

OpenAI Presents GPT-5

Our smartest, fastest, most useful model yet, with built-in thinking that puts expert-level intelligence in everyone’s hands. - OpenAI

On August 7, OpenAI released the long-awaited GPT-5, sharing the features and quality of GPT-5 in a lengthy announcement presentation. OpenAI presented GPT-5 as their most intuitive, intelligent, and capable AI ever, saying “We think you will love using GPT-5 much more than any previous AI.”

What is GPT-5? It’s not just a new AI model. GPT-5 in ChatGPT is a unified AI system that combines their GPT-5 thinking model with a real-time router that decides what model and tools to use based on each query. It will use GPT‑5 thinking for harder problems, search and other tools as needed, and give a “quick response” for easier problems.

Figure 2. OpenAI presented how GPT-5 in one prompt created a Bernoulli Effect explainer application.

OpenAI’s presentation was at times low-key, but it included many compelling demonstrations of GPT-5’s capabilities:

Coding for Learning: They used GPT-5 to ‘vibe-code’ an interactive SVG animation to explain the Bernoulli effect, a task that would have taken a human developer a week.
Writing: “GPT-5 is our most creative and best writing model to date.” They used GPT-5 to generate an emotive and nuanced eulogy for previous AI models, showing how GPT-5 interactions can feel “less like AI and more like you're chatting with your high IQ and EQ friend.”
Quick App Coding: In another example of vibe-coding, GPT-5 built, from a single prompt, a fully functional web application for learning French, complete with flashcards and quizzes.
Voice: Their voice model is now more adaptive, able to change its speaking style and speed on command.
Enhanced Memory and Integration: GPT-5 can connect to your Gmail and Google Calendar to help with scheduling and task management, acting as a true "thought partner."
Safety and Reliability: They covered how GPT-5 has greatly reduced hallucinations and is designed to be less deceptive. It will provide "safe completions" that offer helpful alternatives even when it can't directly fulfill a request, a better way to handle unsafe requests than blanket refusals.
Health: OpenAI mentioned health applications for GPT-5, such as helping patients understand complex medical information and make informed decisions.

They announced immediate availability of GPT-5 to all user tiers, including the free tier.

OpenAI Targets AI Coding

“We believe GPT-5 is the best coding model in the world.” – Greg Brockman

Figure 3. OpenAI showed that GPT-5 can generate code in several demonstrations.

The biggest ‘tell’ from OpenAI was how often they stressed that GPT-5 is good at coding. While only briefly mentioning many other business applications, they shared many vibe-coding demonstrations. As if to emphasize the point, they brought the Cursor CEO on to talk about GPT-5 as a coding model. He praised it highly:

“It’s a very smart model, but it doesn’t compromise on ease of use for pair programming. It’s fast and interactive. … Tells you the plan, giving you updates. Leaves a reasoning trace that you can follow. … Its codebase understanding and ability to be steered is outstanding.”

He then walked through using GPT-5 to quickly identify and fix a bug in their codebase.

Given how Cursor is quite possibly the largest downstream consumer of Anthropic API calls, feeding significant revenue to Anthropic, there are potentially huge consequences to the AI competitive landscape with this release.

OpenAI tipped their hand here. Their pitch: GPT-5 thinking is the best AI for coding and costs no more than Gemini 2.5 Pro. OpenAI is actively competing for the vibe-coding and AI coding assistant business, and GPT-5’s quality and pricing now beat Claude.

GPT-5 Features and Benchmarks

“GPT-3 was sort of like talking to a high school student... with GPT-4 maybe it was like talking to a college student... but with GPT-5 now it's like talking to an expert, a legitimate PhD level expert in anything, any area you need on demand.” - Sam Altman

GPT-5 demonstrates state-of-the-art performance on a range of academic and industry benchmarks, with particular strength in math and coding:

Coding: 74.9% on SWE-bench Verified, 88% on Aider Polyglot. These are SOTA and marginally better than Claude Opus 4.1. It also got a WebDev Arena score of 1480, showing its strength on web app coding. On SWE-Lancer, GPT-5 gets 55% vs 54% for o3.
Chat: LMArena Elo score of 1481, beating out Gemini 2.5 Pro to rank number 1 across the board on chat interactions.
Multimodal reasoning: 84.2% on MMMU, excelling in understanding and interpreting visual images and charts.
Academic: 85.7% on GPQA, and 24.8% on Humanity’s Last Exam (HLE) with no tools, while GPT-5 Pro with tools gets 42% on HLE. GPQA scores are now beyond human level.
Math: 94.6% on AIME 2025 without tools, demonstrating superior mathematical reasoning abilities and saturating the AIME benchmark.
Agentic AI and tool use: GPT‑5 shows strong instruction following and tool use, with 81% on Tau2-Bench retail and 54% on BrowseComp; both are improved over o3.
Intelligence: GPT‑5 hits 65.7% on ARC‑AGI‑1 and 9.9% on ARC‑AGI‑2; Grok‑4 leads ARC‑AGI‑2 at 15.9%.

Figure 5. Tau-bench benchmarks will become more important as AI agent use becomes more prevalent. GPT-5 improves over o3 on these agentic tasks.

Beyond the traditional benchmarks, OpenAI also touted improved performance on health-related benchmarks.

OpenAI put emphasis on improving factuality and reliability, especially on open-ended or complex questions, and they claimed that GPT-5 is "by far our most reliable, most factual model ever."

The Artificial Analysis intelligence ranking sums up the benchmarks overall, showing GPT-5 lives up to its billing as the world’s current smartest AI model, slightly ahead of Grok 4, o3, and Gemini 2.5 Pro.

GPT-5 Reviews and Vibes

In the day since the release announcement, there have been many reviews of GPT-5, some by influencers given early access to GPT-5. Vibes and external reviews overall confirm OpenAI’s claims and benchmarks.

OpenAI claimed they “trained GPT-5 to be much more agentic.” It seems to have paid off. Ben Hylak on the Latent Space blog says the key to GPT-5 is its powerful tool usage, marking the “Stone Age” for AI:

GPT-5 marks the beginning of the stone age for Agents and LLMs. GPT-5 doesn’t just use tools. It thinks with them. It builds with them.

He points out that you need to think of prompting GPT-5 like an agent, because it’s not just an LLM anymore. He also confirms that it is great at coding:

I think GPT-5 is unequivocally the best coding model in the world. We were probably around 65% of the way through automating software engineering, and now we might be around 72%. To me, it’s the biggest leap since 3.5 Sonnet.

Matt Berman found GPT-5 passed his coding tests, including a crazy 20x20 Rubik’s cube GUI application that GPT-5 was able to solve. I’m not sure how it does it. Influencer Wes Roth says GPT-5 just one-shot the world.

Ethan Mollick says “It just does stuff” in another take on GPT-5’s agentic features:

When you ask GPT-5 for something, the AI decides which model to use and how much effort to put into “thinking.” It just does it for you. For most people, this automation will be helpful, and the results might even be shocking, because, having only used default older models, they will get to see what a Reasoner can accomplish on hard problems.

Ethan Mollick mentions the router is taking away some user control over what AI model to use, which may annoy some power users of GPT-5:

But for people who use AI more seriously, there is an issue: GPT-5 is somewhat arbitrary about deciding what a hard problem is.

OpenAI went from one extreme, a model picker with almost a dozen models, to the other extreme, hiding the model selection behind an AI router. To force “hard thinking” you have to adjust your prompt or add a flag, which doesn’t seem user-friendly.

Tried it yet? What are your thoughts on GPT-5? Leave a comment.

Features and Pricing

For consumers and business end-users, OpenAI offers GPT-5 in ChatGPT. While GPT-5 is available on all tiers, there’s a GPT-5 Pro only available for Pro users.

For developers, GPT-5 comes via their API in three flavors: GPT-5, GPT-5 mini, and GPT-5 nano. The GPT-5 models raise the output limit to a surprisingly long 128k max output tokens for extended outputs. All three models have the same features: they are reasoning models with text and image input and text output, settings for verbosity and reasoning effort, and support for built-in tools.

The GPT-5 API models are priced very competitively; GPT-5 pricing is on par with Gemini 2.5 Pro and a fraction of Claude Opus pricing:

GPT-5: $1.25 / $10.00 per 1M tokens input/output.
GPT-5 mini: $0.25 / $2.00 per 1M tokens input/output.
GPT-5 nano: $0.05 / $0.40 per 1M tokens input/output.
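
To make the per-token prices concrete, here is a minimal sketch that estimates the cost of a single request at the list prices above. The dictionary keys, the function name, and the example token counts are illustrative assumptions, not anything from OpenAI, and the estimate ignores any discounts or extra charges that might apply.

```python
# USD per 1M tokens as (input, output), copied from the price list above.
# The dictionary keys are labels chosen for this sketch.
PRICES_PER_MILLION = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list prices."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 20k tokens in, 2k tokens out.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, input_tokens=20_000, output_tokens=2_000)
        print(f"{model}: ${cost:.4f}")
```

For that hypothetical 20k-in, 2k-out request, the estimate works out to about 4.5 cents on GPT-5, under a cent on GPT-5 mini, and a fraction of a cent on GPT-5 nano.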

Conclusion – The GPT-5 Evolution

GPT-5 is not the huge leap forward people long expected … GPT-5 is obviously not AGI. - Gary Marcus

Gary Marcus is right: It’s not an exponential leap, nor is it AGI. This GPT-5 release is an evolution, not a revolution.

While GPT-5 is a huge jump over 2023’s GPT-4, all the major advances in GPT-5 over GPT-4 - multimodality, reasoning, larger context windows, memory, and tool use – have been incrementally introduced since then in prior AI models. GPT-5 is marginally better than o3 in reasoning and tool-calling, but the direction of the evolution is telling.

GPT-5 in ChatGPT is an agent - with tool use, model routing, and customization. OpenAI is leaning in on AI for coding, the number one agentic AI use case, to gain market share.

GPT-5’s writing improvements over GPT-4.5 are hard to discern; taste is subjective and AI writing is already quite good. Simpler use-cases and benchmarks are already saturated. Far more noticeable is how GPT-5 shines as a far more powerful agentic AI model.

I agree with Ben Hylak. GPT-5 marks a new “Stone Age” tool-using era for AI models and Agents.

As we have argued with other AI model releases, GPT-5 is an evolution toward agentic AI, but it goes further than any previously released model. Its better reasoning, tool use, pricing, interface, and improved coding capabilities combine to fundamentally shift how far you can take AI, especially in agentic AI use cases.

If so, then the real revolution will come with those using GPT-5 to build more powerful AI agents that can do more for us than ever before.

Figure 7. A reminder of how far we have come with AI in the last year. GPT-4o was the best AI model in Aug 2024, a non-reasoning model that would rank 50th among AI models today.

Aug 9

Apparently, the annoyance and loss of trust from removing access to older AI models caused enough of a backlash that OpenAI is restoring access to GPT-4o. This was a user experience error on OpenAI’s part, avoidable by simply having a long deprecation period instead of rug-pulling like they did, and by having manual control overrides for the AI model selector. GPT-4o is back. https://x.com/Alex_ADEdge/status/1954160694716227720
