AI Week in Review 25.07.05

Baidu ERNIE 4.5, Hunyuan-A13B-Instruct, Pangu Pro MoE, Agentica DeepSWE, rLLM, DynamicsLab Mirage, Cypher Alpha, Qwen-TTS, Kyutai TTS, Daytona Sandbox, MAI-DxO, Meta launches Superintelligence Labs.

Jul 05, 2025

Figure 1. Still from an AI-generated UGC game engine demo from DynamicsLab Mirage. AI video games have arrived.

Top Tools

The top releases for this week are three open-source fine-grained MoE models with hybrid thinking from Chinese AI labs.

Baidu has open-sourced its ERNIE 4.5 family of 10 models, ranging from 0.3B to 424B parameters and including fine-grained Mixture-of-Experts (MoE) models in thinking and non-thinking variants. The flagship 424B parameter MoE model has 47B active parameters and supports strong multimodal understanding; it scores 93% on DocVQA and 78.9% on MathVista, comparable to OpenAI’s o1. On these results, it is arguably the most advanced open-source multimodal AI reasoning model released so far.

Baidu also published an ERNIE 4.5 technical report detailing the innovations in architecture, training, and inference behind the models, along with their performance. The ERNIE 4.5 models and associated toolkits are publicly available under the Apache 2.0 license for research and deployment via Baidu’s AI Studio and Hugging Face.

Figure 2. Overview of the ERNIE 4.5 model family, which includes a small (0.3B) dense model, midsized (21B and 28B) MoE models, and larger (300B and 424B) MoE models in text-only and multimodal variants, with post-training for hybrid thinking done on the multimodal MoE models.

Tencent released Hunyuan-A13B-Instruct, an 80B parameter MoE model with 13B active parameters, featuring a 256K token context window and hybrid fast-and-slow reasoning modes. It matches or exceeds DeepSeek R1 and OpenAI o1 on reasoning and coding benchmarks, scoring 87% on AIME 2024, 64% on LiveCodeBench, and 71% on GPQA-Diamond. With only 13B active parameters, the model is cost-effective to run and state of the art for its size.

Tencent’s AI team shared details on Hunyuan-A13B training and inference in the Hunyuan-A13B Technical Report. The model can be accessed on Hugging Face under an open license that restricts commercial use in certain regions and above certain user counts.

Finally, Huawei has open-sourced Pangu Pro MoE, a 72B parameter MoE model with 16B active parameters per token, trained entirely on Huawei’s Ascend NPUs to circumvent US hardware sanctions. Pangu Pro MoE delivers performance on par with dense models such as Qwen 3 32B and outperforms the larger Llama 4 Scout model. The model is available on Hugging Face.
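To put the three releases side by side, here is a small sketch that computes the share of parameters each model activates per token, using only the total and active parameter counts quoted in this post. The dictionary and function names are made up for illustration.

```python
# Total and active parameter counts (in billions) as quoted in this post.
MODELS = {
    "ERNIE 4.5 (flagship)": (424, 47),
    "Hunyuan-A13B-Instruct": (80, 13),
    "Pangu Pro MoE": (72, 16),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} of parameters active per token")
```

All three activate well under a quarter of their weights per token, which is where the cost advantage of MoE models over dense models of the same total size comes from.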

Technical details and benchmarks are documented in the paper “Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity.” One architectural innovation is Mixture of Grouped Experts (MoGE), which groups the experts during selection so that the expert workload stays balanced during inference. The model is optimized to run on Huawei Ascend NPUs and processes up to 1,528 tokens per second per card using speculative decoding.
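The idea of grouping experts during selection can be sketched in a few lines. This is a minimal illustration, not Huawei's implementation: it assumes experts are split into contiguous, equal-sized groups and that each token picks the same number of experts from every group. The function name, group layout, and router scores are all hypothetical.

```python
from typing import List


def grouped_top_k(scores: List[float], num_groups: int, k_per_group: int) -> List[int]:
    """Pick the k_per_group highest-scoring experts inside each group.

    Experts are split into contiguous, equal-sized groups (an assumption),
    so every group contributes the same number of active experts per token.
    """
    if len(scores) % num_groups != 0:
        raise ValueError("number of experts must divide evenly into groups")
    group_size = len(scores) // num_groups
    if not 1 <= k_per_group <= group_size:
        raise ValueError("k_per_group must be between 1 and the group size")

    chosen: List[int] = []
    for g in range(num_groups):
        start = g * group_size
        members = range(start, start + group_size)
        # Rank only the experts that belong to this group.
        best = sorted(members, key=lambda i: scores[i], reverse=True)[:k_per_group]
        chosen.extend(sorted(best))
    return chosen


# Hypothetical router scores for 8 experts arranged as 4 groups of 2.
scores = [0.9, 0.8, 0.1, 0.2, 0.05, 0.3, 0.7, 0.6]
print(grouped_top_k(scores, num_groups=4, k_per_group=1))
```

With these scores, a plain global top-4 would pick both experts from the first and last groups and none from the middle two. The grouped version takes one expert from each group, so no group sits idle or overloaded.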

AI Tech and Product Releases

Agentica introduced DeepSWE-Preview, an open-source coding agent trained via reinforcement learning on Qwen3-32B. It achieves a 59% score on the SWE-Bench-Verified benchmark with test-time scaling (42.2% Pass@1, 71% Pass@16). The full code, data, and training logs are available on Agentica’s Notion blog.

Figure 3. DeepSWE-Preview performance on the SWE-Bench benchmark compared with comparably sized AI models and AI agents.
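For readers unfamiliar with the Pass@1 and Pass@16 figures quoted for DeepSWE-Preview, here is a sketch of the widely used unbiased pass@k estimator. It is an assumption that Agentica computes its numbers this way; the post does not say. Given n sampled attempts on a task, of which c succeed, it estimates the chance that at least one of k randomly chosen attempts is correct.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every subset of size k must contain at least one correct sample.
        return 1.0
    # 1 minus the probability that all k chosen samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 16 attempts, 4 of them pass the tests.
print(pass_at_k(n=16, c=4, k=1))
print(pass_at_k(n=16, c=4, k=16))
```

The benchmark score is the average of this value over all tasks, which is why Pass@16 can sit far above Pass@1: a task counts toward Pass@16 if any of the 16 attempts solves it.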

The Agentica team also released rLLM, an open-source framework for RL post-training for AI agents. rLLM is designed to support the creation of custom AI agents similar to their DeepSWE AI agent. As Agentica puts it, “our mission is to democratize RL post-training for general-purpose language agents.”

Dynamics Lab introduced Mirage, the world’s first real-time generative UGC (user-generated content) game engine that creates photorealistic, open-world experiences on the fly via natural language, keyboard, or controller inputs. Running at 16 FPS, it supports extended gameplay across genres without pre-built assets, as shown in the Urban Chaos and Coastal Drift demos. Playable demos are available on the Dynamics Lab blog.

OpenRouter has released Cypher Alpha, a free model offering a one-million token context window and 70 tokens/sec throughput for long-context tasks like code generation. Developers can access and test Cypher Alpha via the OpenRouter platform.

Kyutai, the lab behind Moshi, has open-sourced Kyutai TTS and is releasing the code for unmute.sh, its modular voice AI system. Kyutai TTS delivers low-latency streaming speech synthesis, with speaker similarity of 77.1% (English) and 78.7% (French) and a low 2.8% word error rate in English. The Kyutai TTS model card is on Hugging Face.

Alibaba Research announced Qwen-TTS, a multilingual text-to-speech model supporting Chinese dialects like Pekingese and Shanghainese alongside English, delivering human-level naturalness. Accessible via API, it targets applications requiring nuanced multilingual voice synthesis.

Daytona has launched “stateful serverless” sandboxes, which are secure isolated runtimes for AI agents to execute code and workflows. Daytona claims to be the ‘fastest growing infra company in history’ with $1m in ARR in just 60 days. The open-source Daytona platform and SDK are available on GitHub.

Google has reinstated free-tier API access to Gemini 2.5 Pro via AI Studio, offering 100 requests per day and 5 requests per minute at no cost.
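Developers who want to stay inside those free-tier limits can throttle on the client side. The sketch below is one way to do that, assuming sliding one-minute and 24-hour windows; Google's actual quota accounting may reset on a fixed schedule instead. The class name is hypothetical and no Gemini API call is made here.

```python
import time
from collections import deque
from typing import Callable, Deque


class FreeTierLimiter:
    """Sliding-window limiter enforcing a per-minute and a per-day request cap."""

    def __init__(self, per_minute: int = 5, per_day: int = 100,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sent: Deque[float] = deque()  # timestamps of accepted requests

    def try_acquire(self) -> bool:
        """Record a request and return True if both caps allow it right now."""
        now = self.clock()
        # Forget requests that are more than 24 hours old.
        while self.sent and now - self.sent[0] >= 86_400:
            self.sent.popleft()
        in_last_minute = sum(1 for t in self.sent if now - t < 60)
        if in_last_minute >= self.per_minute or len(self.sent) >= self.per_day:
            return False
        self.sent.append(now)
        return True


limiter = FreeTierLimiter()
if limiter.try_acquire():
    pass  # safe to send one API request here
```

Calling try_acquire() before each request, and backing off when it returns False, keeps a script from burning through the daily allowance in a tight loop.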

Replicate shared a Flux Kontext and Luma Modify workflow for AI-driven video restyling, enabling users to apply advanced style transformations and edits to existing video content.

Figure 4. Still from a video modified with Flux Kontext and Luma Labs AI, available on Replicate. Take your input video and change the style of the first or last frame with the new Flux Kontext app, then use Luma Labs Modify Video.

Google has launched its Veo 3 video generation model to Gemini Advanced subscribers in over 159 countries, allowing users to create three 8-second videos daily from text prompts. DeepMind CEO Demis Hassabis indicated Veo 3 could be leveraged for video game development, as Google continues building towards fully-fledged “world models” with technologies like Genie 2 and advancements to Gemini 2.5 Pro.

Perplexity launched “Perplexity Max,” a $200/month subscription for power users offering unlimited access to Labs and priority access to cutting-edge AI models. The plan is available on the web and iOS and includes early access to new features like the Comet browser.

Cursor has expanded its platform for AI coding agents to web and mobile interfaces, including Slack integration for launching, monitoring, and collaborating on code tasks. More information is available on Cursor’s Agents page.

Apple's iOS 26 introduces a new AI feature allowing users to create calendar events directly from screenshots. This capability, already present on Android via Gemini Assistant, helps users quickly add events to their calendars.

AI Research News

Microsoft’s AI Diagnostic Orchestrator (MAI-DxO) achieves 85.5% diagnostic accuracy on 304 case records from the New England Journal of Medicine, outperforming experienced physicians who averaged 20% accuracy. MAI-DxO orchestrates multiple AI models to simulate a clinician panel, asking follow-up questions, ordering tests, and controlling diagnostic costs. The methodology and related Sequential Diagnosis Benchmark are published in the paper “Sequential Diagnosis with Language Models.”

Using a similar approach of sampling results from multiple LLMs, Sakana AI’s Multi-LLM technique outperforms single LLMs by scaling inference-time compute across several models, guided by an adaptive tree search over their results. The research was shared in “Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search.”
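To give a feel for how inference-time compute can be spread adaptively across several models, here is a deliberately simplified sketch. It is not Sakana AI's tree search: it swaps in a flat Beta-Bernoulli Thompson sampling bandit that decides which model gets the next call, with no branching or refinement of earlier answers. The function name, the stub models, and their scores are all hypothetical.

```python
import random
from typing import Callable, Dict, Tuple


def allocate_calls(models: Dict[str, Callable[[], float]], budget: int,
                   rng: random.Random) -> Tuple[str, Dict[str, int]]:
    """Spend `budget` calls across models using Thompson sampling.

    Each model callable returns a score in [0, 1] for one sampled answer.
    Returns the model with the best posterior mean and the call counts.
    """
    wins = {name: 1.0 for name in models}    # Beta prior: alpha
    losses = {name: 1.0 for name in models}  # Beta prior: beta
    calls = {name: 0 for name in models}
    for _ in range(budget):
        # Draw a plausible success rate per model and call the highest draw.
        pick = max(models, key=lambda m: rng.betavariate(wins[m], losses[m]))
        score = models[pick]()
        wins[pick] += score
        losses[pick] += 1.0 - score
        calls[pick] += 1
    best = max(models, key=lambda m: wins[m] / (wins[m] + losses[m]))
    return best, calls


# Stub "models" standing in for real LLM calls plus an answer scorer.
stub_models = {"model_a": lambda: 1.0, "model_b": lambda: 0.0}
best, calls = allocate_calls(stub_models, budget=20, rng=random.Random(0))
print(best, calls)
```

The point of the sketch is the allocation behavior: models whose answers keep scoring well receive most of the remaining budget, while weaker ones are still tried occasionally.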

We reported advances in AI reasoning, including studies of causal reasoning and strategic reasoning, in our AI Research Review 25.07.03. It included the release of GLM-4.1V-9B-Thinking, another high-performance multimodal AI reasoning model from a Chinese AI lab.

AI Business and Policy

After a widely reported hiring spree, Meta has launched Meta Superintelligence Labs (MSL), led by Alexandr Wang and Nat Friedman, recruiting over ten key researchers from OpenAI, Google DeepMind, and other AI organizations with compensation packages rumored at up to $300 million over four years.

OpenAI co-founder Ilya Sutskever has taken over as CEO of Safe Superintelligence (SSI) after co-founder Daniel Gross's departure to Meta. Daniel Levy is now SSI's president, and Sutskever affirmed the company's commitment to its singular mission of developing safe superintelligence.

Amazon has reached 1 million robots in its warehouses, with 75% of global deliveries now robot-assisted. The company also announced DeepFleet, a new generative AI model to increase robotic fleet speed by 10%.

News publishers are seeing increased referrals from ChatGPT but are experiencing a significant drop in organic traffic, with nearly 69% of searches now ending without a click, a shift attributed to AI Overviews. Independent publishers have filed a complaint with the EU Commission alleging that Google is abusing its market power by requiring access to their content for AI Overviews, with potential impact on their Search visibility and monetization.

Grammarly announced the acquisition of email client Superhuman to expand the AI capabilities of its productivity suite. The deal aims to build on Superhuman's AI-powered email features to create AI agents for email, a top use case.

Capital One developed an agentic platform for its auto business, designed to problem-solve like human agents and inspired by its own internal risk management. They created an "evaluator agent" to monitor others, leading to a 55% improvement in dealership sales leads.

The U.S. Senate voted overwhelmingly on Tuesday to remove a controversial 10-year ban on states’ ability to regulate AI. The “AI moratorium” was intended to keep a patchwork of state regulations from stifling innovation, but bipartisan concerns that it would block state-level consumer protections led the Senate to strip the provision.

The European Union has confirmed it will stick to its AI Act timeline despite over 100 tech companies, including Alphabet and Meta, urging a delay. The companies argued that the Act hurts Europe’s AI competitiveness. The landmark Act, fully in force by mid-2026, bans "unacceptable risk" AI and regulates "high-risk" applications like biometrics.

AI Opinions and Articles

Sam Altman envisions AI leading to a “gentle singularity” of abundance, but this article from Gary Grossman suggests a “murkier middle ground” where AI brings both gains and dislocation. This future may fragment society's “cognitive commons,” with AI-generated content and personalized information spaces making shared reality and democratic discourse difficult.

His scenario of an AI future that strains community seems more realistic than either stark dystopia or vibrant utopia, and his prescription, to “live wisely in this new terrain,” sounds like the kind of good advice our society will attempt but fail to take.

As AI reconfigures the terrain of cognition, the fabric of our social world is quietly being tugged loose and rewoven, for better or worse. The question is not just how fast we move as societies, but how thoughtfully we migrate. – Gary Grossman

AI Changes Everything
