AI Week in Review 25.07.12

Grok 4 & Grok 4 Heavy, Devstral Small 1.1 & Medium 2507, Moonshot Kimi K2, Perplexity Comet browser, LFM2, Reka Flash 3.1, Reka Vision, SmolLM3, Reachy Mini, LTX-Video LoRAs, Moonvalley Marey.

Jul 12, 2025

A robot on a desk

AI-generated content may be incorrect. — Figure 1. The HuggingFace Reachy mini is an open source hobbyist home robot. Photo credit: HuggingFace.

Top Tools: Grok 4

The xAI team and Elon Musk unveiled Grok 4 as “the most intelligent model in the world” in a livestream announcement, where they shared Grok 4’s state-of-the-art (SOTA) results on benchmarks. They also announced the multi-agent variant Grok 4 Heavy, which combines multiple Grok 4 runs by agents who collaborate and select the optimal solution from multiple independent runs. Overall, Grok 4 is a blockbuster AI model release, resetting expectations for AI model performance and scaling:

Humanity's Last Exam: Grok 4 scored 26.9% without tools, 41% with tool usage (web browsing, memory, code execution), and 50.7% with Grok 4 Heavy. This far outperforms Gemini 2.5 Pro and Claude 3.
Grok 4 Heavy achieved 88.9% on GPQA, 100% on AIME 2025, 79.4% on Live CodeBench, and 96.7% on Math Arena, effectively saturating difficult benchmarks. As Elon Musk puts it, “with respect to academic questions Grok 4 is better than PhD level in every subject.”
Grok 4 shows “strong fluid intelligence” on ARC AGI: It scored 66.6% on ARC AGI V1 and 15.9% on ARC AGI V2, nearly doubling the previous high score.

Grok 4 achieved this SOTA tier of AI reasoning through scaling RL training. Grok 4 received 100 times more training than Grok 2 and 10 times more RL training for reasoning than Grok 3 on AI reasoning, a stunning 10^28 FLOPs.

A screenshot of a graph

Description automatically generated — Figure 2. Grok 4 scaled up RL training for reasoning massively from Grok 3. **Grok** 4 spent the same amount of compute on reinforcement learning as pre-training.

Grok 4 is available to SuperGrok and Premium+ subscribers, and xAI launched a $300/month subscription for early access to Grok 4 Heavy. Elon Musk is also bringing Grok 4 to Tesla vehicles.

“I think it [AI models] may discover new technologies as soon as later this year and I would be shocked if it has not done so next year... and it might discover new physics next year and within two years I'd say almost certainly.” – Elon Musk

AI Tech and Product Releases

Mistral AI released new Devstral coding models, Devstral Small 1.1 and Devstral Medium 2507. The 24B parameter open-source Devstral Small 1.1 coding model now achieves 53.6% on SWE-Bench Verified, state-of-the-art for its model size. Devstral Medium 2507, an API-only model scores 61.6% on the same benchmark, performing on par with Gemini 2.5 Pro and Claude 4 Sonnet but costing much less. These updates support Mistral function-calling and can be deployed locally or via Mistral’s API for enterprise use.

A graph with numbers and symbols

AI-generated content may be incorrect. — Figure 3. Devstral Medium 2507 is highly competitive as an AI coding model but is a much smaller and cheaper AI model than SOTA AI coding models.

Moonshot AI has released Kimi K2, an open-source MoE model with 1T total parameters and 32B active parameters, making it both the largest and highest-performing open-source AI models yet, particularly strong in coding and autonomous agent tasks. Moonshot AI described Kimi K2 as “a reflex-grade model without long thinking.” They innovated with a MuonClip optimizer to achieve stable training for this model. Moonshot AI is aiming to accelerate adoption and disrupt the market with competitive API pricing alongside open-source access.

Perplexity launched Comet, an AI-powered browser. The Comet browser features an embedded AI assistant that automates tasks via natural-language commands; for example, it can triage LinkedIn invites, extract and summarize content and documents, do online shopping tasks, and manage your calendar. Unlike other web-browsing AI agents, Comet emphasizes user privacy by operating and storing data locally. It’s available for Perplexity Max subscribers at $200/month now, with access via invite waitlist rolling out over the summer.

Liquid AI released LFM2, a family of edge-optimized AI models, from 350M to 1.2B parameters. As Liquid AI puts it “LFM2 is specifically designed to provide the fastest on-device gen-AI experience across the industry.” Built on a hybrid convolution-and-attention architecture for ultra-efficient on-device inference, LFM2 delivers up to 2 times faster decode and prefill performance than Qwen3 on CPU, unlocking generative AI on phones, laptops, and other edge devices. LFM2 weights and code are open-sourced on Hugging Face, and the models are integrated into Liquid AI platform and iOS-native app.

Reka released Reka Flash 3.1, a 21B parameter open-source multimodal model with enhanced reasoning, achieving 65% on AIME24 math benchmarks. Reka Flash 3.1 is accessible via Reka’s API and GitHub and supports fine-tuning for domain-specific applications, providing a transparent foundation for multimodal AI development. Reka Flash 3.1 also powers the Reka Research AI agent for web and document querying.

Reka AI announced Reka Vision, an agentic multimodal platform that lets users search, analyze, and edit video and image libraries using natural-language queries. The system can automatically generate social-media reels from long-form videos, monitor incidents in real time, and summarize visual content at scale.

HuggingFace released SmolLM3, a 3B parameter fully open-source model offering dual-mode reasoning, enabling both step-by-step reasoning and direct answers on demand (with think/no_think modes). Hugging Face has open-sourced all model weights, dataset recipes, and training graphs, allowing fully reproducible multilingual assistants that fit on a single GPU. In benchmark tests, SmolLM3 has a 128K token context window and outperforms comparable 3B models like Llama-3.2-3B. This is a useful AI model for local and edge device use.

A diagram of a model

AI-generated content may be incorrect. — Figure 4. HuggingFace has shared all the details on the open-source SMoL 3B model architecture and training, including details on pre-training and post-training with RL for reasoning.

Hugging Face launched Reachy Mini, a $299 desktop robot that aims to be the standard open-source desktop robot for AI builders. Reachy Mini is an 11-inch humanoid companion that integrates with the Hugging Face Hub, providing an open-source, accessible platform. It ships as a DIY Python-programmable kit.

LTX Studio launched three open-source LoRA adapters for LTX-Video 13B—Pose, Depth, and Canny—enabling precise control over human motion, scene structure, and edge details in AI-generated videos. The release also adds In-Context LoRA training support in LTX-Video-Trainer, allowing developers to create custom video control modalities. These control modules integrate with existing style and camera motion LoRAs via ComfyUI workflows and are hosted on Hugging Face and GitHub.

A collage of a person

AI-generated content may be incorrect. — Figure 5. The LTX Studio Pose LoRA captures body skeletons from reference videos then re-skin to target characters, letting you guide character motion and positioning with precise control.

Moonvalley introduced Marey, the first commercially safe AI video model trained exclusively on licensed, high-resolution footage, targeting professional filmmakers and studios. Marey produces high-quality 1080p output and has granular directorial controls such as camera motion, character movement, and scene editing. The Marey tool is available for commercial use via a credits-based subscription.

Google is adding an image-to-video generation feature to its Veo 3 AI video generator through the Gemini app. Accessible via gemini.google.com, users can upload a photo and generate 8-second videos with Veo 3 based on a prompt, with synchronized audio and dialogue. Google AI Ultra and Pro plan users can generate three videos daily, and over 40 million videos have been created in seven weeks.

Amazon Web Services (AWS) is launching an AI agent marketplace on July 15, with Anthropic as a key partner. This marketplace will enable startups to offer their AI agents directly to AWS customers, providing a centralized hub for enterprises. The initiative aims to boost distribution for partners like Anthropic and follows similar offerings from Google and Microsoft.

Non-release news: OpenAI CEO Sam Altman announced an indefinite delay for the company's highly anticipated open model release, citing the need for additional safety testing.

AI Research News

A new METR study, titled “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity” found a surprising result that questions the productivity benefits of AI coding tools for experienced developers. They conducted a randomized trial around using AI tools like Cursor Pro, and it surprisingly increased task completion time by 19% for experienced open-source developers, rather than speeding them up, even though developers assessed that the AI tools saved them time.

A graph with green and black dots

AI-generated content may be incorrect. — Figure 6. Users over-estimate the actual benefits of AI automation in software development, and in this study, AI actually reduced productivity. This challenges assumptions about workflow improvements with AI.

AI Business and Policy

The AI talent wars continue! OpenAI’s planned $3 billion acquisition of AI coding startup Windsurf has collapsed. Instead, Google DeepMind has hired Windsurf’s CEO Varun Mohan, cofounder Douglas Chen, and key R&D staff, securing a non-exclusive license to their technology. The team will focus on developing agentic coding capabilities for Google's Gemini model, bolstering Google’s AI coding efforts.

Nvidia plans to launch a new AI chip for China by September, based on a modified Blackwell RTX Pro 6000. This chip will lack high-bandwidth memory or NVLink to comply with current restrictions. Nvidia seems determined to find a way to sell AI chips in China despite U.S. export restrictions.

AI disrupts venture capital. Sarah Smith, solo GP of the Sarah Smith Fund, announced the $16 million final closing of her Fund I. She leverages AI to operate her venture capital firm efficiently, making fast decisions and scaling her portfolio.

AI startup funding news:

Helios, co-founded by former White House and State Department officials, launched from stealth with $4 million in seed funding. Its flagship product, Proxi, is an AI-based operating system designed for public policy, regulatory affairs, legal, and government teams.
LGND is an AI startup that transforms raw geospatial data into "geographic embeddings" for efficient analysis. LGND recently secured $9 million in seed funding to expand its enterprise app and API offerings for spatial data queries.

AI is impacting hiring in multiple ways. Recruit Holdings, the parent company of Indeed and Glassdoor, is laying off approximately 1,300 employees – roughly 6% of its workforce – as it integrates AI to streamline hiring processes and reduce manual work. The CEO attributes the cuts to the transformative impact of AI on the job market, with a focus on simplifying hiring.

Google announced the second cohort for its AI Academy: American Infrastructure, a four-month program supporting seed to Series A startups using AI for cybersecurity, education, and transportation. This equity-free initiative offers coaching and training, building on Google's successful track record of backing promising AI companies.

Former Intel CEO Pat Gelsinger launched Flourishing AI (FAI), a new benchmark with "faith tech" company Gloo. FAI tests AI models' alignment with human values, evaluating LLMs across seven categories based on The Global Flourishing Study, to support human flourishing.

AI Opinions and Articles

xAI and Grok apologized for Grok’s ‘horrific behavior’ last week, when Grok 3 went off the rails expressing extremist views and hate speech, even referring to itself as “MechaHitler.” Lawmakers are demanding answers from xAI regarding the origins of these “absurd awful” responses and potential development failures that led to it. In addition, xAI’s Grok 4 chatbot is facing criticism for generating biased outputs that align with Elon Musk’s personal views, even citing his social media posts as sources.

These AI alignment failures combined with ever-more-powerful AI amplify concerns about AI safety. Elon Musk is not making things easier pursuing AI superintelligence while also expressing that AI is only likely to be good for humanity.

"We're in the intelligence big bang right now and we're at the most interesting time to be alive of any time in history. … Will this be bad or good for humanity? … most likely it will be good, but I’ve somewhat reconciled myself to the fact that even if it wasn't going to be good, I'd at least like to be alive to see it happen." - Elon Musk

AI Changes Everything

Discussion about this post