AI Week in Review 2025.03.01

Claude 3.7 Sonnet, GPT-4.5, Qwen QwQ-Max, Wan 2.1 video gen open sourced, free Gemini Code Assist, Phi-4-multimodal & Phi-4-mini, ElevenLabs Scribe, Alexa+, Grok-3 voice mode, Meta Aria Gen 2.

Mar 01, 2025

A person in a red cloak with her hands up

AI-generated content may be incorrect. — Figure 1. Still from Wan2.1 AI video generation demo. Wan2.1 is a suite of AI video generation foundation models that pushes the boundaries of video generation.

Top Tools

Anthropic launched Claude 3.7 Sonnet and the Claude Code coding agent, challenging OpenAI’s latest models, DeepSeek R1 and Grok-3. Claude 3.7 Sonnet is a next-generation hybrid reasoning AI model featuring a toggleable “extended thinking” mode that lets users trade off speed for deeper reasoning. As noted in our Claude 3.7 Sonnet Unleashed breakdown of this release, it is particularly well-suited to coding (70% on SWE-Bench) and high-level reasoning (78.2% on GPQA in ‘extended thinking’ mode).

Anthropic’s Claude Code is a command-line coding assistant that helps users extend the utility of Claude code generation. It’s available in limited preview for now.

Figure 2. Claude 3.7 Sonnet’s hybrid AI model approach allows users to switch between fast answers and deep reasoning in their queries. This is the way.

OpenAI released GPT-4.5, calling it “our largest and best model for chat yet.” Despite improvements over previous versions, the performance of GPT-4.5 on various benchmarks reveals both strengths and limitations of GPT-4.5. It’s a better AI model but not a huge leap in performance over GPT-4o.

We and others have called GPT-4.5 the “Vibe” release, with OpenAI touting its strengths in knowledge, writing and conversational style. According to internal tests, GPT-4.5 excels in persuasion, particularly adept at convincing another AI model, GPT-4o, to donate virtual money. Yet GPT-4.5 as a non-reasoning AI model also falls behind competitor’s AI models on coding, math, and reasoning benchmarks.

GPT-4.5 will first be available to ChatGPT Pro subscribers starting Thursday, followed by Plus customers next week.

AI Tech and Product Releases

The Alibaba Qwen team announced the QwQ-Max AI reasoning model Preview release, in a self-referential blog article written by QwQ-Max-Preview itself:

We’re happy to unveil QwQ-Max-Preview, the latest advancement in the Qwen series, designed to push the boundaries of deep reasoning and versatile problem-solving. Built on the robust foundation of Qwen2.5-Max, this preview model excels in mathematics, coding, and general-domain tasks, while delivering outstanding performance in Agent-related workflows.

While the Qwen team promises an open-source release for QwQ-Max and smaller variants such as QwQ-32B, for now it is available only via Qwen chatbot interface. Benchmark results weren’t mentioned, but this is intended to compete against DeepSeek R1, o3-mini, and other AI reasoning models.

Alibaba released its Wan 2.1 generative AI image/video model to open source, allowing developers free access via platforms like ModelScope and Hugging Face. Alibaba released four different versions of the model, in 1.3B and 14B sizes with T2V-14B model establishing a new SOTA performance standard.

Google CEO Sundar Pichai announced free Gemini Code Assist:

Starting today, you can use Gemini Code Assist for free in your favorite IDE. It’s Gemini 2.0 fine-tuned for coding, including 180K code completions/ month - *90x* what others currently offer. Plus, Gemini 2.0 Flash-Lite is now GA and cost-effective for projects that use long context windows. Find it in the Gemini API in Google AI Studio.

Microsoft has officially released the Phi-4 series models with several new Phi-4 family model releases. Expanding the family of launched Phi-4 models, Microsoft has now introduced Phi-4-mini-instruct (3.8B) and Phi-4-multimodal (5.6B), to join previously released Phi-4-14B, and extending capabilities in function-calling and reasoning:

Phi-4-mini brings significant enhancements in multilingual support, reasoning, and mathematics, and now, the long-awaited function calling feature is finally supported. As for Phi-4-multimodal, it is a fully multimodal model capable of vision, audio, text, multilingual understanding, strong reasoning, encoding, and more.

The Phi-4 models are all open source, available on HuggingFace and for download to local devices, e.g., with Ollama.

A screenshot of a computer

AI-generated content may be incorrect. — Figure 3. The Phi-4 model family has expanded, with Phi-4-mini-instruct (3.8B) and Phi-4-multimodal (5.6B), joining previously released Phi-4-14B.

AI voice startup ElevenLabs launched its first speech-to-text model, Scribe, which supports transcription in over 99 languages. Claiming to be “the world’s most accurate speech to text model,” Scribe offers features like audio-event tagging and word-level timestamps, with a real-time transcription version in the works.

Amazon announced generative AI-powered Alexa+ as it seeks to breathe life into its aging assistant. The upgraded Alexa+ virtual assistant can engage in natural conversations and perform complex tasks (like booking reservations or controlling smart-home devices) via voice, using Amazon’s Nova and Claude AI models depending on the given task. Alexa+ will be free for Prime members and $19.99 for others.

OpenAI expanded Deep Research access to Plus and other paid users. Deep Research answers complex topic questions autonomously by scouring web sources then building reports.

Figure 4. OpenAI’s Deep Research scores 25% on Perplexity Labs’ extremely hard “Humanity’s Last Exam” benchmark, far ahead of other AI models on that test.

OpenAI launched its Operator AI agent internationally, expanding its reach. However, it remains for now only available to ChatGPT Pro ($200/mo) users. Also, OpenAI made Sora video generation model Sora available to users in Europe, including EU and UK.

Google is giving Sheets a Gemini-powered upgrade, using Gemini to help users generate insights from data and create advanced visualizations. The feature is now available to all Workspace business users, accessible via a Gemini icon in the spreadsheet.

Meta has unveiled the next generation of its Project Aria augmented reality glasses for research: Aria Gen 2. The new device features an upgraded sensor suite, including a PPG sensor for heart rate monitoring and a contact microphone to isolate the wearer’s voice, along with capabilities like eye tracking and speech recognition.

Microsoft finally released a macOS app for Copilot, which assists users with online tasks like drafting emails and summarizing documents. Copilot also includes an image generator.

Grok-3 has released a voice mode, available via the Grok app. It comes with a variety of voices and modes, and it’s been reported that Grok’s new “unhinged” voice mode can curse and scream, simulate phone sex.

Coming soon:

OpenAI plans to integrate Sora AI video generation into ChatGPT. OpenAI's product lead for Sora, Rohan Sahai, indicated the company intends to integrate its video generation tool, Sora, directly into ChatGPT, though no timeline was provided.
Chinese AI firm DeepSeek is rushing to accelerate its new R2 model release, capitalizing on its success with DeepSeek R1.
Perplexity teased a new AI-powered web browser called Comet, stirring excitement among tech enthusiasts.
Meta reportedly plans to release a stand-alone app for its AI assistant, Meta AI, to better compete with AI-powered chatbots like OpenAI’s ChatGPT and Google’s Gemini. A release is planned for the second quarter.
Figure is planning to start “alpha testing” its Figure 02 robot in home settings later in 2025, accelerated by the Helix Vision-Language-Action (VLA) model.

AI Research

Our AI research highlights article for this week these recent research results:

Deep Research System Card
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference

AI Business and Policy

Apple committed $500 billion to investing in the U.S., including investing in multiple AI initiatives, such as a new AI server factory in Texas.

Despite Big Tech’s AI push, a new Pew survey suggests 81% of U.S. workers aren’t using AI on the job. Many workers express skepticism or concern about AI’s impact, indicating a potential gap between corporate investments in AI and actual workplace adoption.

Meta is looking for 5-7 GW for AI data centers in a $200 billion AI data center project, a sign that its capacity plans are even more ambitious than what Zuckerberg has stated publicly.

The Big Tech AI investment boom is going global. Alibaba has pledged to spend over $53 billion for AI infrastructure, including data centers, over the next three years.

Thinking Machines Lab, the new AI startup founded by former OpenAI CTO Mira Murati, is reportedly raising $1 billion at a $9 billion valuation.

Education company Chegg filed a lawsuit against Google for using generative AI “Overviews” to scrape Chegg’s textbook solutions and siphon student traffic. Chegg claims Google’s AI summaries hurt its homework-help business, while Google counters that overviews improve user engagement and clicks to publishers’ sites.

Qatar signed a 5-year deal with Scale AI to develop over 50 government AI applications.

Swiss startup Unique raised $30 million for agentic AI in financial services, supporting the development of customizable AI agents for tasks like investment research and due diligence. Unique aims to expand internationally, particularly in the U.S.

Stripe's annual letter noted an "AI boom," showing that the top 100 AI companies reached $5 million in annualized revenue in 24 months in 2024. The company emphasized the rapid growth and vertical-specific impact of AI startups, noting examples like Cursor, Lovable, and Bolt, while cautioning against dismissing these innovations as merely "LLM wrappers."

AI Opinions and Articles

In 2023, OpenAI dominated with GPT-4, and they stayed ahead in 2024 with GPT-4o. That’s no longer the case. The releases of Gemini 2.0 and DeepSeek R1 and now the major frontier AI model releases this month – Grok-3, Claude 3.7 Sonnet, and GPT-4.5 – have changed competitive landscape for AI models.

I assessed the three newly released top-tier AI models in GPT-4.5 - The “Vibe” Release:

Each top-tier model has its strengths: Grok 3 with thinking stands out in advanced reasoning; Claude 3.7 Sonnet excels at coding; and GPT-4.5 delivers superior conversational skills and broad knowledge.

With several SOTA frontier AI models, each with strengths and weaknesses, this is the best of all worlds for AI consumers: Great choices! The best approach is to try multiple AI models and learn which ones serve you best.

AI Changes Everything

Discussion about this post