AI Week in Review 25.08.09

GPT-5, Claude Opus 4.1, OpenAI’s open AI models – gpt-oss-120b and gpt-oss-20b, Qwen-Image, Cursor CLI, ElevenMusic, Copilot 3D, Producer.AI, KittenTTS, Cohere Command A Vision.

Aug 09, 2025

Two people in space suits

AI-generated content may be incorrect. — Figure 1. Qwen-Image generation celebrating this weeks AI model releases in a poster. Qwen-Image can generate text faithfully and generate poster-like images of good quality. This took 5 tries for 100% correct text though; I’ll give it an 8 out of 10.

Top Tool: GPT-5

In the biggest AI release of the week and perhaps the year, OpenAI introduced GPT-5 as its new default model for ChatGPT, describing it as “our smartest, fastest, most useful model yet.” OpenAI’s announcement presentation highlighted state-of-the-art results across coding, math, multimodal understanding, and health, and they demonstrated its AI coding ability, showing how GPT-5 can vibe-code complex applications.

GPT-5 in ChatGPT is a unified system built on the state-of-the-art GPT-5 reasoning model, with an internal router that automatically decides when to think longer to balance speed and depth of thought.

For developers, OpenAI released three GPT-5 model variants to the API: GPT-5, GPT-5 mini, and GPT-5 nano. These models are all priced very competitively for their performance, on the frontier of best price versus performance.

OpenAI framed GPT-5’s enterprise push in a companion post, arguing it ushers in a “new era of work” by improving automation and workforce productivity. Our own take, as shared in GPT-5 Arrives - A Model for Agentic AI, is that GPT-5 embodies a fundamental shift toward agentic AI. GPT-5’s ability to combine tools and reasoning, as well as its superior AI coding ability, position it as a great AI model for agentic AI.

AI Tech and Product Releases

Anthropic released Claude Opus 4.1, an upgrade to Opus 4 focused on agentic tasks, real-world coding, and reasoning. Claude Opus 4.1 incrementally improves on Opus 4 reasoning benchmarks, scoring 74.5% on SWE-bench verified, up 2 points versus Opus 4 and on par with GPT-5. It is available to paid Claude users, in Claude Code and other AI coding assistants, and via an API, with pricing unchanged from Opus 4.

OpenAI released two new open-source AI models, gpt-oss-120b and gpt-oss-20b, their first open AI models since GPT-2. Open sourced under the Apache 2.0 license, these gpt-oss models are highly efficient mixture-of-experts (MoE) models, allow for private and local deployment. The benchmark evaluations indicate strong performance, especially in math and coding, but there are indications these models were trained to benchmarks and community feedback indicates limitations in common sense and creative reasoning. They are widely available on HuggingFace, Ollama, and many APIs, and are a significant contribution to open-source AI.

We have more on Claude Opus 4.1 and OpenAI's gpt-oss models here.

Alibaba's Qwen Team released Qwen-Image, a new open-source AI image generator boasting highly accurate text rendering and precise instruction-following for image generation and image editing. Despite some initial mixed test results, Qwen-Image ranks as a top open-source model, offering enterprises a flexible tool for content creation that supports various scripts and complex layouts. Licensed under Apache 2.0, it is free for commercial use and available via HuggingFace and for use on QwenChat.

A shelf with books on it

AI-generated content may be incorrect. — Figure 2. Qwen-image faithfully generates text titles for a number of books, while rendering a fairly realistic image overall, yet with a few AI artifacts in the fine print on books.

ElevenLabs released a Text-to-music model called ElevenMusic, joining the music generation bandwagon. ElevenLabs says it is cleared for nearly all commercial uses. It’s available on the ElevenLabs website, with API access for developers coming soon.

Recently announced Producer.ai from the formerly named Riffusion is the “latest dimension of AI song creation,” an AI music agent for interactive song creation. How it works:

The Producer helps realize your artistic vision, working with you to write, edit, and remix original, studio-quality songs using our frontier models, and learning alongside you all the way.

Cursor dropped their own CLI version of Cursor, bringing a competitor to Claude Code and Gemini CLI to the AI coding assistant arena.

Cohere released Command A Vision, a new 112B parameter visual model for enterprise use cases. This model reportedly outperforms competitors and excels at interpreting charts, diagrams, scanned documents, and images through OCR and analysis.

Cohere also announced general availability of North, an “agentic AI platform that securely accesses all of the data you use in your work” by enterprise-hosting AI agent applications.

Microsoft rolled out Copilot 3D, a free image-to-3D tool that converts pictures to 3D renders in GLB format for viewers, design tools, and game engines. Early testing shows limitations on humans and animals. The feature is accessible via Microsoft’s Copilot Labs.

Google announced an AI update for Google Finance, its financial information and business news tool. Users can now research financial questions with AI, access advanced charting, and get real-time data and news, including commodities and cryptocurrencies.

For developers:

Anthropic added “Search result content blocks” as a generally available feature in its API and on Vertex AI, enabling native citations for RAG with source attribution.
AWS made its Automated Reasoning Checks on Bedrock generally available to boost enterprise confidence in deploying AI, especially in regulated industries. This feature uses math-based validation to detect and prevent model hallucinations.
A tiny 15million parameter TTS model called KittenTTS released, and it’s a decent voice generation AI model that can run on a CPU. This makes it possible to have voice interfaces run on minimal edge devices.

Anthropic launched automated security review capabilities for its Claude Code platform, introducing tools that scan code for vulnerabilities and suggest fixes. The AI-powered solution integrates directly into developer workflows via terminal commands and GitHub reviews, addressing concerns about AI-accelerated software development is outpacing code security.

AI Research News

Google Research achieved a 10,000x reduction in training data requirements for fine-tuning with an active-learning curation method, while maintaining or improving quality. The team reports up to 65% higher model–human alignment on 500 examples of curated data versus 100k-example crowdsourced baselines. The research shows the power of carefully curated data for fine-tuning AI models.

A new Anthropic study introduces "persona vectors" to identify, monitor, and control character traits in LLMs. These vectors, directions in a model's internal space, correspond to personality traits, helping manage undesirable behaviors. Published in the paper “Persona Vectors: Monitoring and Controlling Character Traits in Language Models,” the technique enables proactive training data screening and direct "steering" for more stable, predictable AI.

AI Business and Policy

Tesla is disbanding its in-house Dojo supercomputer team, ending its proprietary AI chip and supercomputer development for driverless technology. This shift follows key departures to a AI data center startup DestinyAI. Dojo's lead is leaving, and remaining members are reassigned, signaling increased reliance on Nvidia, as well as leveraging Samsung-produced AI6 chips.

AI coding startups like Windsurf face "very negative" gross margins due to the high cost of utilizing AI coding models. Popular services like Cursor are adjusting pricing to cope with these significant industry challenges, while others are This economic pressure and fierce competition often compel companies to sell or consider building their own expensive AI models.

Meta has acquired AI voice startup WaveForms for an undisclosed sum. This is Meta's latest acquisition to strengthen its Superintelligence Labs, following their recent purchase of PlayAI.

Duolingo announced it beat quarterly revenue estimates, with stock rising almost 30%, despite backlash for embracing generative AI and phasing out human contractors. The company is now "AI-first," doubling offerings with 148 new language courses. Duolingo anticipates over $1 billion in revenue this year, with active users up 40%.

Elon Musk announced that X will introduce ads within Grok AI responses. Advertisers can pay to have their solutions appear as suggestions when users query Grok. This strategy aims to improve ad targeting and boost the X platform's ad revenue to fund operations.

Perplexity is powering Truth Social's new "Truth Search AI" search engine. This AI search offers cited answers, but Truth Social controls the sources, frequently citing outlets like FoxNews.com.

Google denies AI search features are killing publisher traffic, asserting organic click volume is "relatively stable" year-over-year with improved quality. It attributes traffic shifts to users seeking authentic content on forums and social media, rather than AI.

Anthropic said U.S. federal agencies can now purchase Claude via the GSA schedule, streamlining access with pre-negotiated terms that meet federal procurement rules. The company frames this as an expansion of public-sector availability following recent national lab and defense-workflow partnerships.

AI Opinions and Articles

"It [GPT-5] feels like I'm working with a coworker, a very hardworking coworker." – Theo

There have been a range of reactions to GPT-5’s release. In a most glowing review, Theo says, “I didn’t know it could get this good. This is a really good model.”

The most helpful have been eyes-wide-open assessments ("a bit underwhelming") that put GPT-5 to the test and acknowledge both GPT-5’s strengths and issues. Some reviewers are finding Opus 4.1 is slightly better but much pricier than GPT-5 on coding, while GPT-5 is better than Sonnet 4.

The Economist on GPT-5 says:

GPT-5 is not the mind-blowing leap that some were hoping for. But a few more years of steady progress like this could yield AI systems of transformative power.

The Economist is making the same point I made in my GPT-5 article. Yes, GPT-5 itself may be just incrementally better than predecessors, but it’s still state-of-the-art, and the rapid clip of AI model release iterations adds up to continued rapid AI improvement. The AI revolution continues.

GPT-5 is an evolution toward agentic AI but goes further than any previous released model. Its better reasoning, tool use, pricing, interface, and improved coding capabilities combine to fundamentally shift how far you can take AI, especially in agentic AI use cases. – AI Changes Everything

Operactive Arts

Aug 9

Hi Patrick, I was wondering if you would be interested in participating in our research about the future of AI in Creative Industries? Would be really keen to hear your perspectives. It only takes 10mins and I am sure you will find it interesting.

https://form.typeform.com/to/EZlPfCGm

Expand full comment

2 replies by Patrick McGuinness and others

2 more comments...

AI Changes Everything