AI Week in Review 25.05.24

GitHub Copilot Coding Agent, Claude Opus 4 & Sonnet 4, Gemini 2.5 Deep Think, Gemini 2.5 Flash audio, Gemini Diffusion, Veo 3, Imagen 4, Flow, Project Jules, Mistral Devstral, Aurora predicts weather.

May 24, 2025

A person and person dancing

AI-generated content may be incorrect. — Figure 1. Veo 3 AI generated video can talk. Veo 3 generates high-quality video with sound effects and dialog.

AI Tech and Product Releases

This was an action-packed week for AI news, with major releases announced at Microsoft Build, Google I/O, and Anthropic’s Code with Claude events.

A full review of Google’s tsunami of releases from Google I/O is in our “Google I/O 25 - Google Delivers!” article, but here’s a summary:

Gemini 2.5 Pro was enhanced with better coding performance and Deep Think mode for extended reasoning and thinking budgets.
Google previewed a new Gemini 2.5 Flash that has improved reasoning and multimodal performance and adds native real-time multi-speaker audio output that can converse in 24 languages. The new Gemini 2.5 Flash achieves #2 slot on the LMArena leaderboard, behind only Gemini 2.5 Pro.
Google introduced Gemini Diffusion, a novel diffusion-based LLM that can output 5 times faster than Gemini 2.0 flash-lite at comparable performance to it, achieving 1500 tokens per second output. It’s useful for tasks like real-time editing in AI coding.
Google announced the Veo3 AI video generation model, which can generate audio, both dialog and sound effects, along with video. Veo 3 also has better physics-aware video generation and coherent character consistency across scenes.
They released Imagen 4, their latest and best image generation model, with better text rendering.
Google introduced the Flow video editing platform for filmmaking with Veo3 and Imagen 4.
Google’s Project Mariner, it’s web-browsing AI agents, has been updated and released; it is also used in AI Mode in Search and Agent Mode in Gemini.
Google launched Jules, a free asynchronous coding assistant that connects to GitHub, opens pull requests, and autonomously fixes bugs.
Google updated AI Mode in Search with features like Deep Search in AI Mode to aid research, and they rolled AI Mode out to all U.S. users.
Google launched the AI Ultra subscription that offers access to premium tools and features for $249 a month.
Google introduced Gemma 3n, a 4B open weight parameter model designed for running on mobile devices and is optimized for fast multi-modal AI, supporting interleaved text, images, audio, and video.
There was more from Google I/O beyond these highlights.

Microsoft made AI agents a big part of their Microsoft Build conference, with some key announcements to help developers build and use AI agents:

GitHub Copilot front-end code is now fully open sourced, allowing anyone to clone its VS Code integration.
GitHub launched GitHub Copilot Coding Agent, an asynchronous AI agent embedded in GitHub, enabling background pull requests and bug fixes.
Microsoft added Model Context Protocol (MCP) support directly into Windows, paving the way for native agentic AI workflows.
Microsoft introduced Copilot Tuning: “Copilot can now learn your company’s unique tone and language.”
Microsoft introduced a toolkit for building AI agents, multi-agent orchestration, and Foundry, an app platform for building apps and agents.
Microsoft announced NLWeb for easier AI integration into websites.

Anthropic introduced Claude 4 at its first “Code with Claude” developer event, releasing Claude Opus 4, its most capable AI model (72.5 % on SWE-bench verified), and Claude 4 Sonnet. Both offer hybrid thinking, with thinking budgets and extended-thinking, 200k token context window, and AI performance gains, especially in coding benchmarks. Anthropic introduced features to their API to make Claude models even better for AI agent applications, and they made Claude Code generally available.

Early reports on Claude Opus 4, including a leaked article, have highlighted both its advanced capabilities and potential safety concerns regarding misuse, such as bioweapon creation.

Mistral and AllHands jointly released Devstral 24B, a 24B parameter open-source model with a 128K context window that is optimized for coding. Devstral 24B achieves 46.8% on SWE-bench verified coding benchmark, exceeding metrics for AI models like GPT-4.1, DeepSeek V3 and Qwen 3, making it a new best-in-class open-source AI coding model for local use. It’s available via HuggingFace and can be used with coding agents like OpenHands.

Vercel released v0-1.0-md, the AI model behind their V0 web creation platform, as an API. Developers can run v0-1.0-md in other AI coding platforms to fix bugs and create tests.

Microsoft is testing a new AI-powered "Write" feature in Notepad for Windows Insiders, generating text from prompts or existing content. This Windows 11 update also enhances Paint with AI-powered sticker generation and object selection tools.

OpenAI is upgrading its AI agent, Operator, to use a new o3-based model, replacing the custom GPT-4o version. The o3 model offers significant advancements in math and reasoning tasks. This updated o3 Operator is also fine-tuned for improved safety against illicit activities and prompt injections.

OpenAI is adding new tools and features to Responses API. Taking a cue from Anthropic’s native MCP support in their API, OpenAI is also upgrading their MCP support to connect to MCP servers more easily. Code interpreter and file search can be invoked from Responses API.

AI Research News

One of Microsoft’s latest AI models can more accurately predict air quality, hurricanes, typhoons, and other weather-related phenomena. Microsoft’s Aurora AI model, trained on over one million hours of geophysical data, forecasts events with greater precision and speed than traditional systems. Aurora was presented in the paper “A foundation model for the Earth System” in Nature. This model is now integrated into the MSN Weather app.

AI Business and Policy

OpenAI announced the acquisition of LoveFrom for $6.5 billion, bringing Apple design legend Jony Ive on board to form a new division called “IO.” OpenAI aims to develop a new generation of AI-powered consumer hardware devices, moving “beyond the screen” with a mysterious, pocketable gadget expected around by 2027. It’s unclear what that device could be, but OpenAI is betting big on it.

LMArena, the leading LLM evaluation platform, secured a $100 million seed round led by Andreessen Horowitz, marking one of the largest seed financings in the AI benchmarking space. The funding aims to scale its benchmarking infrastructure and raises questions about maintaining impartiality under VC ownership.

CEOs from both Zoom and Klarna recently presented quarterly earnings using AI avatars - Klarna's Sebastian Siemiatkowski for a video, and Zoom’s Eric Yuan for initial call comments. Yuan, who also appeared live for Q&A, highlighted his company’s focus on trust and security regarding its new AI-generated avatar technology and plans for continued use. Zoom is now offering custom avatars to all users.

Enchant is launching a new zero-equity accelerator program for early-stage gaming and AI startups.

Politico union workers are filing for arbitration after alleging Politico breached its contract by publishing AI-generated content with factual errors and inappropriate language. Meanwhile, CNET released a new policy outlining limited AI use in journalism - focusing on assistance with data analysis and outlines, but requiring human-led reviews and prohibiting fully AI-written articles – following criticism of past inaccurate AI content.

Surge AI is now facing a class-action lawsuit in California alleging worker misclassification and unpaid mandatory training, mirroring similar claims against Scale AI. These suits highlight issues within the AI annotation industry regarding fair labor practices.

Khosla Ventures among VCs experimenting with AI-infused roll-ups of mature companies. Rather than solely funding startups, some VCs are acquiring mature businesses like call centers and optimizing them with AI for automation and expansion. This "private equity roll-up" strategy, employed by firms including General Catalyst, could also provide instant access to large clients for AI startups.

AI Opinions and Articles

First lady Melania Trump announced her memoir's audiobook was entirely narrated using an ElevenLabs AI clone of her own voice. With over 50,000 AI-narrated books already on Audible, the era of AI narration has arrived in publishing.

Marjorie Taylor Greene had an online argument with Grok AI on X, after Grok was prompted by others on X to critique her own comments. Greene criticized Grok as "left-leaning" and spreading "fake news" after Grok’s responses highlighting her controversial views.

Anthropic CEO Dario Amodei stated that AI models likely hallucinate less than humans, though in more surprising ways. He believes this is not a limitation on AI's path to Artificial General Intelligence (AGI), contradicting other leaders who view hallucinations as a major obstacle. Amodei maintained that steady progress is being made towards human-level AI.

“Everyone’s always looking for these hard blocks on what [AI] can do. They’re nowhere to be seen. There’s no such thing.” – Dario Amodei

AI Changes Everything

Discussion about this post