AI Week in Review 25.05.31

DeepSeek R1-0528 and R1-0528-Qwen3-8B, Perplexity Labs, Mistral Agent API, Codestral Embed, Flux.1 Kontext, Google Stitch, ElevenLabs Conversational AI 2.0, Chatterbox TTS, Unmute.sh, QwenLong-L1.

Jun 01, 2025

Figure 1. Flux.1 Kontext can change an image's characters and context via prompt commands.

Top Tools

We were expecting R2, but instead DeepSeek released R1-0528, an updated version of its R1 reasoning model whose gains in math, coding, and reasoning could have justified the R2 name. R1-0528 challenges o3 and Gemini 2.5 Pro, with near state-of-the-art benchmark results on AIME 2024 (91.4), LiveCodeBench (73.3), GPQA Diamond (81.0), and SWE-bench Verified (57.6). This easily makes it the most powerful open source AI reasoning model.

Figure 2. DeepSeek R1-0528 improved upon the original R1 and competes with OpenAI o3 and Gemini 2.5 Pro on math, coding, and reasoning.

However, testing also indicates that R1-0528 is significantly more censored than earlier DeepSeek models, particularly on topics critical of the Chinese government, in line with strict Chinese regulations.

Alongside the full R1-0528 model, DeepSeek introduced a smaller, distilled version named DeepSeek-R1-0528-Qwen3-8B, built on the Qwen3-8B model. It scores 76.3% on AIME 2025, beating Google’s Gemini 2.5 Flash, and nearly matches both Gemini 2.5 Flash and Microsoft's Phi-4-reasoning-plus on another math test, HMMT.

AI Tech and Product Releases

Perplexity released Perplexity Labs, an agentic AI tool that crafts reports, spreadsheets, and dashboards for customized project tasks. It conducts research and data analysis, generates files, and builds interactive apps. Labs broadens Perplexity's offerings beyond search and “Deep Research” report generation and is available to Pro subscribers. You can start from customized Perplexity Labs tasks in its Projects Gallery.

Mistral launched the Agents API, its framework for building AI agents and applications with Mistral AI models. The API includes built-in tool support, agentic orchestration, and memory support for maintaining context. It also has native MCP (Model Context Protocol) support for custom tool integration. As we shared in “AI Agent Ecosystem Expands - Mistral's Agents API,” these features are needed for building agentic AI applications on Mistral's models, and the launch follows similar API updates from Anthropic and OpenAI.

Mistral launched Codestral Embed, its first embedding model specializing in code for RAG use cases. Mistral claims Codestral Embed outperforms existing models on code search and retrieval benchmarks. The model is available to developers for $0.15 per million tokens.
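Code embedding models like Codestral Embed are typically used for code search by embedding a query and a corpus of code snippets, then ranking snippets by vector similarity. The minimal sketch below shows only that ranking step, using cosine similarity, plus the per-token cost arithmetic at the quoted $0.15 per million tokens. It assumes the embeddings already exist: the tiny three-dimensional vectors are made up for illustration, and no Mistral API call is shown or implied.

```python
import math

# Hypothetical toy embeddings. Real Codestral Embed vectors would come from
# Mistral's API and have many more dimensions; these values are invented.
CORPUS = {
    "def read_json(path): ...": [0.9, 0.1, 0.0],
    "def write_csv(rows, path): ...": [0.1, 0.9, 0.1],
    "def http_get(url): ...": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=2):
    """Return the k snippets whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), snippet)
              for snippet, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [snippet for _, snippet in scored[:k]]


def embedding_cost_usd(num_tokens, price_per_million=0.15):
    """Cost of embedding num_tokens at a flat per-million-token price."""
    return num_tokens / 1_000_000 * price_per_million


# A made-up query embedding that "points toward" JSON parsing.
query = [0.8, 0.2, 0.1]
print(top_k(query, CORPUS, k=1))
print(round(embedding_cost_usd(20_000_000), 2))  # 20M tokens at $0.15/M
```

In a real RAG pipeline the top-ranked snippets would be passed to an LLM as context; a vector database would replace the linear scan for large codebases.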

Black Forest Labs launched FLUX.1 Kontext, a suite of AI models for in-context image generation and editing that take both text and images as input for iterative, in-context creation. FLUX.1 Kontext comes in three versions: [max] for maximum performance, [pro] for fast, iterative multi-turn image refinement, and [dev], a lightweight 12B open-weights diffusion transformer suited to customization. The models outperform OpenAI’s gpt-image-1 on benchmarks and run up to 8x faster. Kontext is available on image generation platforms and through BFL's own Playground.

Figure 3. Demonstration of FLUX.1 Kontext updating images based on interactive prompts.

ElevenLabs launched Conversational AI 2.0, a significant upgrade for their voice agent platform. It offers key improvements: natural turn-taking, multilingual support, multimodality (text and audio interleaved), and a built-in RAG system. This enables intelligent, secure interactions for customer support, sales, and marketing, supporting enterprise use cases for their conversational AI.

Resemble AI has released Chatterbox, a “production-grade” open source TTS model that includes high-quality voice cloning with emotion control. Resemble claims it outperforms ElevenLabs on benchmarks. You can get it on GitHub and try it on Hugging Face Spaces.

Anthropic is rolling out Voice Mode for Claude, supporting conversational AI on the Claude mobile app.

Kyutai has introduced Unmute.sh, a modular AI wrapper that adds voice to any text LLM. It includes semantic voice activity detection (VAD) and has low latency (under 300ms). You can try it on the unmute.sh website, and Kyutai, the team behind the Moshi audio model, plans to open source it in a few weeks.

Google Labs has released Stitch, a new AI-powered UI design tool, now in beta, that can instantly create web designs and user interfaces from prompts. By automating web and app design tasks with AI, Google Stitch competes in the "vibe coding" space with tools like Lovable. Some users are praising it: Brendan Jowett calls Stitch “the most powerful UI designer in the world.” Alexandre Perracho notes you can bring Stitch designs into Figma, a great workflow, but that Stitch “tends to ignore some instructions,” so it is best for initial drafts rather than polished interfaces.

Figure 4. Google Stitch generates website designs from a single prompt. The design shown took 20 seconds to generate.

Google is expanding Gemini’s AI capabilities within Workspace by automatically generating summaries of longer Gmail threads, which appear at the top of the email in the Gmail mobile app. These AI-powered summaries update in real time as new replies arrive.

Web browser maker Opera announced Neon, an agentic browser that can perform autonomous web tasks. Opera Neon is an alpha release available via a waitlist.

Odyssey has offered the latest demo of its world model, which attempts real-world simulation with video output:

We call this interactive video—video you can both watch and interact with, imagined entirely by AI in real-time. It's something that looks like a video you watch every day, but which you can interact and engage with in compelling ways (with your keyboard, phone, controller, and eventually audio). Consider it as an early version of the Holodeck.

You can try it here.

Razer has launched Wyvrn, a new AI developer platform for gaming on AWS Marketplace. Wyvrn’s AI-powered Razer Game Assistant and Razer QA Companion are currently in beta with studios and aim to help developers scale smarter and faster by automating tasks and improving player experience.

Token Monster, a new AI chatbot platform, has launched its alpha preview. It automatically routes user prompts to the best combination of multiple LLMs and tools, orchestrating multi-model workflows for enhanced outputs.

Snowflake has launched two new open-source AI initiatives: Arctic-Text2SQL-R1 and Arctic Inference. Arctic-Text2SQL-R1 improves text-to-SQL query accuracy by training models on execution correctness for real enterprise databases. Arctic Inference optimizes AI inference performance, offering better responsiveness and cost efficiency through dynamic parallelization for easier deployment.

AI Research News

Alibaba’s Qwen team has introduced QwenLong-L1, a new framework that enables AI reasoning models to reason over extremely long inputs (over 100K tokens), addressing an unsolved challenge in AI reasoning.

As described in the paper “QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning,” the QwenLong-L1 framework adapts short-context AI reasoning models to long-context scenarios via progressive context scaling, using a supervised fine-tuning (SFT) warm-up followed by phased reinforcement learning (RL) in post-training. Their QwenLong-L1-32B outperforms OpenAI-o3-mini on long-context reasoning tasks.

A new research paper "Learning to Reason without External Rewards" presents a reasoning training framework called "Intuitor" with a counter-intuitive result. It shows that AI models can improve through RL without external ground-truth rewards, using only the model’s own confidence in its answer (which the paper calls self-certainty). This approach, applied to the Qwen2.5-3B model, achieved results comparable to those obtained with GRPO (Group Relative Policy Optimization), the RL algorithm DeepSeek used. It suggests we can train AI models to reason via self-learning.

Figure 5. Intuitor does as well as GRPO, the approach used to train DeepSeek R1.
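To make the idea concrete, here is a minimal sketch of the two pieces involved, under simplifying assumptions: a self-certainty score computed as the average KL divergence between a uniform distribution over the vocabulary and the model's next-token distribution (our reading of the paper's definition), and GRPO-style group-relative advantages that normalize each sampled answer's reward against its group. In Intuitor's setup the self-certainty score stands in for the external reward. The three-token vocabulary and all probabilities are invented; a real implementation works on full-vocabulary logits inside an RL training loop with clipping and a KL penalty, none of which is shown here.

```python
import math
import statistics


def self_certainty(token_dists):
    """Average KL(U || p) over the generated tokens.

    token_dists: one next-token probability distribution per generated token,
    each a list over the same (toy) vocabulary with strictly positive entries
    (softmax outputs always are). Higher means more confident; a uniform
    distribution scores zero.
    """
    total = 0.0
    for probs in token_dists:
        v = len(probs)
        # KL(U || p) = sum_j (1/V) * log((1/V) / p_j) = -(1/V) * sum_j log(V * p_j)
        total += -sum(math.log(v * p) for p in probs) / v
    return total / len(token_dists)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of three sampled answers to one prompt, each two tokens
# long over a 3-token toy vocabulary. All probabilities are made up.
group = [
    [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],        # confident answer
    [[0.40, 0.30, 0.30], [0.50, 0.25, 0.25]],        # hesitant answer
    [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]],  # uniform: no certainty
]

# Intrinsic rewards: no ground-truth answer is consulted.
rewards = [self_certainty(answer) for answer in group]
advantages = group_relative_advantages(rewards)
print([round(r, 3) for r in rewards])
print([round(a, 3) for a in advantages])
```

The confident answer receives a positive advantage and the uniform one a negative advantage, so a policy-gradient update would push the model toward answers it is more certain about. That is the core of the self-learning result, and also why the paper has to guard against the model simply becoming overconfident.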

AI Business and Policy

Meta is reportedly implementing an AI system to evaluate potential harm and privacy risks for updates to apps like Instagram and WhatsApp. This automates a human-led review process required by a 2012 FTC agreement, allowing faster updates. While Meta claims AI will handle low-risk decisions, a former executive warns of risks from this approach.

Meta AI now has one billion monthly active users across its apps, doubling its September 2024 count. CEO Mark Zuckerberg announced this at Meta’s annual shareholder meeting, also saying:

The [Meta AI] focus for this year is deepening the experience and making Meta AI the leading personal AI with an emphasis on personalization, voice conversations and entertainment.

An internal strategy document lays out OpenAI’s ambition to build “your interface to the internet.” The document, revealed through legal discovery in the Google antitrust case, details OpenAI’s plans to begin evolving ChatGPT into an AI super assistant in H1 2025, one that performs diverse tasks for users across all contexts of their lives.

Nvidia had another blockbuster quarterly earnings report, with $44.1 billion in revenue this past quarter, including $39.1 billion in AI chip and system sales for data centers. However, Nvidia CEO Jensen Huang criticized U.S. chip export bans to China, which have cut into Nvidia's sales and resulted in a $4.5 billion charge. He argued the bans strengthen China's AI competitiveness and weaken America's global AI platform leadership. He’s not wrong; Chinese AI labs continue to compete, even with AI chip bans.

Spott raised $3.2 million in seed funding for its AI-native platform to revolutionize recruiting. The funding will accelerate development of their agentic system that automates recruiting workflows.

Delaware’s attorney general is hiring an investment bank to advise on OpenAI’s for-profit conversion. This independent evaluation could delay OpenAI's plans to attract investment and go public.

AI Opinions and Articles

AI Fails highlights: Google’s AI Overviews recently misreported the current year as 2024, despite it being 2025; Google says it is now fixed.

Investigations found Robert F. Kennedy Jr.'s “Make America Healthy Again” report full of AI-generated errors, including dozens of erroneous citations, fictitious sources, and "oaicite" markers from ChatGPT. RFK Jr.'s team blamed “formatting issues” for their AI slop, updating the report to remove markers and correct citations.

It’s not your imagination: AI's adoption pace truly is unprecedented compared with past tech revolutions, according to VC Mary Meeker's new “Trends - Artificial Intelligence” report. Her 340-page slide deck on AI highlights record-breaking user adoption and rapidly falling usage costs. Yet AI-based profitability remains unproven for current companies despite massive investment, with the notable exception of Nvidia.

AI Changes Everything
