AI Week in Review 25.06.07

Gemini 2.5 Pro 06-05, Mistral Code, Cursor 1.0, Claude Gov, Common Pile v0.1, ElevenLabs V3, Mirage Studio. ChatGPT connectors & Record Mode. Gemini Scheduled Actions, visual AI Mode & Search Live.

Jun 08, 2025

Figure 1. “World building” AI art from Chris First.

Top Tools

Google has released an upgraded Gemini 2.5 Pro with improvements across the board. This latest boost makes the already-leading Gemini 2.5 Pro the best publicly available AI model (at least for now): 21.6% on Humanity’s Last Exam, 86.4% on GPQA Diamond, 88.0% on AIME 2025, and 82.2% on Aider Polyglot. The update supports thinking budgets and fixes areas where Google's code-focused May update regressed from the March release:

We also addressed feedback from our previous 2.5 Pro release, improving its style and structure — it can be more creative with better-formatted responses.

This is likely the final tweak to Gemini 2.5 Pro, which has incrementally evolved since December: “this model will be the generally available, stable version starting in a couple of weeks, ready for enterprise-scale applications.”

Figure 2. Gemini 2.5 Pro benchmark results.

AI Tech and Product Releases

Mistral AI has announced Mistral Code, joining the AI coding assistant competition with an enterprise coding assistant that offers extensive customization: local deployment, fine-tuning on private codebases, and specialized AI models. Mistral Code is built on the open-source coding assistant Continue and works in JetBrains and VS Code IDEs. Its emphasis on the enterprise concerns of security, compliance, and data privacy positions Mistral Code as an interesting alternative to rivals like GitHub Copilot, Cursor, and OpenAI Codex.

Cursor has released Cursor 1.0, adding several features to its leading AI coding assistant: Bug Bot for automatic PR review, Background Agent, support for Jupyter (IPython) notebooks, memories extracted from conversations, and one-click MCP install for easier MCP setup.

Anthropic’s Claude Code is now included in the Claude Pro plan. Pro plan subscribers can use their rate limits across both the Claude apps and Claude Code. On a related note, OpenAI’s Codex coding assistant is now available to Plus members and has internet access.

Anthropic launched Claude Gov, a specialized suite of Claude models tailored for U.S. defense and intelligence agencies, featuring relaxed guardrails for classified data handling, enhanced document understanding, and contextual analysis optimized for national security workloads.

EleutherAI released Common Pile v0.1, an 8 TB collection of openly licensed and public domain text for AI model training, developed over two years with a broad group of partner organizations. The dataset was used to train the new Comma models, showing that models trained on this open dataset perform competitively with those trained on copyrighted material. EleutherAI aims to demonstrate the viability of licensed data and promote transparency amidst ongoing AI copyright lawsuits.

ElevenLabs revealed their V3 alpha model, ‘the most expressive TTS model’ yet. It can speak in 70+ languages, supports multi-speaker dialogue and emotional, expressive audio tags such as [excited], [sighs], [laughing], and [whispers]. The ElevenLabs V3 demo is very impressive; these voices pass the audio Turing test.
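To make the audio-tag idea concrete, here is a minimal Python sketch of what a tagged script looks like and how the bracketed tags can be separated from the spoken text. It does not call any ElevenLabs API; the `split_tags` helper, the regular expression, and the sample script are all illustrative assumptions, and only the tag names come from the list above.

```python
import re

# Assumption: tags are lowercase words in square brackets, e.g. [excited].
# This pattern is illustrative, not an official ElevenLabs grammar.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def split_tags(script: str) -> tuple[list[str], str]:
    """Return (tags in order of appearance, spoken text with tags removed)."""
    tags = TAG_PATTERN.findall(script)
    spoken = TAG_PATTERN.sub("", script)
    # Collapse the extra whitespace left behind where tags were removed.
    spoken = re.sub(r"\s+", " ", spoken).strip()
    return tags, spoken


# Hypothetical sample script using the tags mentioned in the announcement.
sample = "[excited] We shipped it! [sighs] Finally. [whispers] Don't tell anyone."
tags, spoken = split_tags(sample)
print(tags)
print(spoken)
```

The tags act as inline stage directions: they steer delivery but are not themselves read aloud, which is why the sketch treats them separately from the spoken text.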

Captions introduced Mirage Studio, which helps users “craft talking videos with AI,” providing talking-head avatar videos similar to HeyGen:

Generate expressive videos at scale, with actors that actually look and feel alive. Our actors laugh, flinch, sing, rap — all of course, per your direction. Just upload an audio, describe the scene, or drop in a reference image, and create energetic content in minutes.

You can use it for explainer videos or marketing, or even make your own music video from a Suno AI music track.

Figure 3. With Mirage Studio, AI can go from script to explainer video. Since other AI tools can generate scripts, the whole process can be automated.

OpenAI made several incremental feature updates to ChatGPT:

OpenAI released connectors that let Deep Research draw on a user's own data in GitHub, Google Docs, Gmail, Google Calendar, SharePoint, Outlook, HubSpot, Dropbox, Box, and more. Users can also connect from chat to Google Docs, SharePoint, Dropbox, and Box.
MCP support will be available to Pro users.
ChatGPT now has a Record Mode to capture, transcribe, and summarize meetings straight into ChatGPT. It produces structured output and a full transcript with timestamps.
OpenAI is upgrading ChatGPT Advanced Voice, making interactions feel more fluid and human-like with enhancements in AI voice output intonation and naturalness.
ChatGPT now has expanded memory capabilities for free users. The model can reference recent conversation history (beyond saved memories) to produce more contextually aware and personalized responses.

OpenAI’s strategy is to build out ChatGPT product features to make it the “core AI subscription.” A sign the strategy may be working: OpenAI's business user base has surged 50% since February, reaching 3 million paying business users.

Google is releasing new Gemini features announced at the recent Google I/O:

Google is rolling out “scheduled actions” in the Gemini app for paid subscribers, letting users schedule recurring or one-off tasks. For example, users can ask Gemini to provide a daily calendar summary or to generate an event summary after the event takes place. The feature is akin to ChatGPT's recurring tasks.
Google is rolling out interactive chart visualizations in AI Mode in Labs to help bring financial data to life for questions on stocks and mutual funds. Powered by Gemini, this feature allows users to compare and analyze real-time and historical financial information via AI-generated interactive graphs and explanations.
Google began testing “Search Live” in AI Mode on select Android and iOS devices, introducing real-time voice and video conversational search capabilities that allow the AI to ask clarifying questions and process camera inputs for contextual responses.

Microsoft Copilot Actions, an experimental feature, is now in Copilot Labs for Copilot Pro users in the U.S., allowing the AI to perform web-based tasks—such as booking hotels, ordering flowers, and finding flights—directly from chat prompts.

Windows Insiders on Copilot+ PCs gained access to “Relight”, a new AI-powered feature in Microsoft Photos that offers dynamic lighting controls to adjust the illumination of pictures post-capture.

AI Research News

Anthropic open-sourced its circuit tracing tools to demystify LLMs' “black box” nature for developers and enterprises. Anthropic’s circuit tracing tool uses mechanistic interpretability to understand internal model workings, investigate errors, and enable granular fine-tuning. It aims to improve transparency, debug models, combat hallucinations, and enhance the reliability of enterprise AI systems.

A new study, “How much do language models memorize?”, finds that GPT-style LLMs have a fixed memorization capacity of approximately 3.6 bits per parameter. The research indicates that models don't memorize more as training data increases; instead, they spread this fixed capacity over more data, so less is memorized per sample. As a result, training on more data leads to better generalization.
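As a back-of-the-envelope illustration of that claim, the Python sketch below turns the 3.6 bits-per-parameter figure into a total capacity and shows how the per-sample share shrinks as the dataset grows. The even split across samples is a simplifying assumption made here for illustration, not the study's method, and the function names and the 1-billion-parameter example are hypothetical.

```python
BITS_PER_PARAM = 3.6  # approximate capacity reported in the study


def capacity_bits(num_params: int) -> float:
    """Total memorization capacity in bits for a model of the given size."""
    return BITS_PER_PARAM * num_params


def bits_per_sample(num_params: int, num_samples: int) -> float:
    """Capacity available per training sample.

    Assumption: capacity is spread evenly across samples, which is a
    simplification used only to show the trend.
    """
    return capacity_bits(num_params) / num_samples


params = 1_000_000_000  # hypothetical 1B-parameter model
print(f"total capacity: {capacity_bits(params) / 8 / 1e6:.0f} MB")
for samples in (10**6, 10**8, 10**10):
    share = bits_per_sample(params, samples)
    print(f"{samples:>14,} samples: {share:.2f} bits per sample")
```

Under these assumptions a 1-billion-parameter model has roughly 450 MB of memorization capacity in total, and every hundredfold increase in dataset size cuts the per-sample share by a factor of one hundred.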

Our AI Research Review for this week covered AI research on self-improving AI and entropy management in RL training for reasoning:

Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

The Entropy Mechanism of RL for Reasoning Language Models
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning
Skywork Open Reasoner 1 Technical Report

AI Business and Policy

Elon Musk’s X / xAI is seeking $5 billion in debt funding to expand its AI infrastructure, but Musk's public feud with Donald Trump is hindering the effort. Morgan Stanley is shopping the debt to investors, but it is trading below its target price.

Anthropic has appointed national security expert Richard Fontaine to its Long-Term Benefit Trust. The appointment aims to strengthen the trust's ability to guide complex decisions where AI intersects with national security, an area of growing importance as AI becomes central to defense and intelligence work.

OpenAI is appealing a court order, issued in the New York Times’ copyright lawsuit, to preserve all ChatGPT chat data, including deleted conversations. The company argues that the order overreaches and weakens user privacy, and it is advocating for an “AI privilege” to protect user information. While OpenAI maintains that access to the retained data is limited to an audited legal and security team, the retention raises questions about user privacy and data ownership.

Anysphere, maker of the AI coding assistant Cursor, has raised $900 million at a $9.9 billion valuation. This marks the 3-year-old startup's third fundraise in less than a year, with its annualized revenue exceeding $500 million.

Dublin-based AI startup Solidroad raised $6.5 million in seed funding to expand its platform that automatically trains customer service representatives and improves AI agents. Their solution analyzes 100% of customer interactions to provide AI-powered quality assurance and personalized training.

Voice AI startup Rime recently announced a $5.5 million seed round. Rime’s Arcana text-to-speech (TTS) model generates diverse, human-like voices from simple text descriptions, addressing a key struggle in conversational AI. Trained on natural conversations, it boosts customer sales by 15% and increases bot interaction fourfold for clients like Domino's and Wingstop.

Meta’s Oversight Board overturned the company’s decision to allow a deepfake video of Ronaldo Nazário promoting a gambling app on Facebook, finding it violated Meta’s fraud policies.

A proposed Senate budget package would tie broadband infrastructure funding to a 10-year moratorium on state AI laws, creating a less regulated environment.

Anthropic CEO Dario Amodei has argued against this approach, warning that a moratorium without federal policy could stifle beneficial state action and leave no national oversight. He advocates instead for a federal transparency standard to avoid halting progress and ensure a consistent regulatory framework.

AI Opinions and Articles

In recent weeks, Anthropic cut off Windsurf from the official Claude 3 and Claude 4 APIs without warning. Windsurf isn’t completely without access to Claude, but it is forced to go through third-party compute providers with API access. Given that Claude 4 is the favored AI coding model for many, a user backlash ensued.

Anthropic defended their decision to cut off Windsurf in the wake of OpenAI’s purchase of Windsurf. It’s all about not giving room to competitors:

“We really are just trying to enable our customers who are going to sustainably be working with us in the future. I think it would be odd for us to be selling Claude to OpenAI,” - Anthropic Chief Science Officer Jared Kaplan at TC Sessions: AI 2025.

Anthropic has another reason: it is compute-constrained today. Anthropic would like to reserve its compute for what Kaplan characterized as “lasting partnerships.”

