AI Week in Review 25.07.19

ChatGPT Agent, Voxtral Small, EXAONE 4.0, Runway Act-Two, Kiro AI coding IDE, Decart AI's MirageLSD, Reflection AI's Asimov, Copilot Vision Desktop, Liquid AI LEAP & Apollo, AWS Bedrock AgentCore.

Jul 19, 2025

Figure 1. Still from a demo reel of Runway’s Act-Two, which lets you animate characters using driving performance videos.

Top Tools - OpenAI ChatGPT Agent

OpenAI launched ChatGPT Agent (codenamed “Odyssey”) in a livestream announcement, presenting a general-purpose AI agent that combines features of OpenAI’s Operator and Deep Research to access data, conduct research, and complete tasks autonomously. ChatGPT Agent has its own virtual computer, on which it runs a fast text-based browser, a visual browser, a terminal, and tool and API integrations; it was trained with reinforcement learning (RL) to decide which tools to use.

Figure 2. ChatGPT Agent is accessed through the ChatGPT interface and can then autonomously take on a variety of tasks.

ChatGPT Agent achieves strong results across key benchmarks: 42% on Humanity’s Last Exam, 27% on FrontierMath, 45% on SpreadsheetBench, 65% on WebArena, and 69% on BrowseComp. Real-world use cases include making slides, managing spreadsheets, generating research reports, and shopping online, and it can run tasks on a schedule. It is available to Pro, Plus, and Team subscribers.

ChatGPT Agent is OpenAI’s first model classified as “High risk” for biological misuse, but OpenAI states that strong safeguards have been activated to mitigate the risks. Users will determine how good ChatGPT Agent is in real-world use cases, but OpenAI CPO Kevin Weil puts it in the “kinda works” category:

“First it's impossible, then it just kinda works, and then very quickly it's great and we never look back.”

AI Tech and Product Releases

Voice was humanity’s first interface—long before writing or typing, it let us share ideas, coordinate work, and build relationships. As digital systems become more capable, voice is returning as our most natural form of human-computer interaction. – Mistral AI

Mistral launched Voxtral, an open speech recognition model available in two variants: Voxtral Small (24B parameters) for production and Voxtral Mini (3B parameters) for edge deployment. Voxtral achieves lower word error rates than OpenAI’s Whisper large-v3 on English and multilingual tasks, supports a 32K-token context for long-form audio, and includes built-in summarization. Voxtral is released under an Apache 2.0 license and is available for download on Hugging Face and via API.
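Word error rate (WER), the metric behind the Voxtral vs. Whisper comparison, is the word-level edit distance between a reference transcript and a model’s output, divided by the number of reference words. A minimal Python sketch of that standard definition follows; it is not Mistral’s evaluation code, and real benchmarks typically normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j]: minimum edits turning the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower WER means fewer word-level mistakes per reference word, so a model with lower WER on the same test set transcribes more accurately.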

Mistral also significantly upgraded its Le Chat chatbot with a “deep research” mode that produces structured, reference-backed reports, native multilingual reasoning, voice (via Voxtral), and a “Projects” feature. With this release, Mistral is targeting enterprise productivity use cases, competing with the likes of OpenAI.

LG released EXAONE 4.0, a hybrid-attention model in 32B and 1.2B parameter variants. EXAONE 4.0 32B achieves strong performance for its size on general language understanding, coding, and reasoning benchmarks, outperforming the comparably sized Qwen 3 32B: 81.8% on MMLU-Pro, 66.7% on LiveCodeBench v6, 75.4% on GPQA-Diamond (science), and 85% on AIME 2025 (math). Pre-trained on 14 trillion tokens, EXAONE 4.0 supports MCP, tool use, and a 128K context length, while the 1.2B variant targets edge devices. The open-weights EXAONE 4.0 models are available on Hugging Face.
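“Hybrid attention” in recent LLMs generally means interleaving local sliding-window attention layers with global full-attention layers, which reduces memory and compute at long context lengths such as 128K. The Python sketch below builds the boolean attention masks for both layer types; the window size, the 3:1 local-to-global ratio, and the function names are illustrative assumptions, not details taken from LG’s release.

```python
def global_causal_mask(seq_len: int) -> list[list[bool]]:
    # Full causal attention: query token i may attend to every key token j <= i.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def local_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    # Sliding-window causal attention: token i sees only the `window` most recent tokens, itself included.
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def hybrid_masks(num_layers: int, seq_len: int, window: int, local_per_global: int = 3):
    # Hypothetical layout: every block of (local_per_global + 1) layers ends with one global layer.
    masks = []
    for layer in range(num_layers):
        is_global = (layer + 1) % (local_per_global + 1) == 0
        masks.append(global_causal_mask(seq_len) if is_global else local_window_mask(seq_len, window))
    return masks


masks = hybrid_masks(num_layers=8, seq_len=6, window=2)
# Layers 3 and 7 are global: the last token attends to all 6 positions there, but only 2 in local layers.
print([sum(m[-1]) for m in masks])
```

Local layers keep per-token cost bounded by the window, while the occasional global layer still lets information flow across the whole context.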

Runway released Act-Two, a next-generation motion capture model that accurately tracks head, face, body, and hand movements from a single performance video. It delivers significantly higher fidelity than Act-One and is targeted at film, VFX, and gaming studios. Early video reviews are positive and suggest Act-Two could substantially change how character animation is done. Act-Two is available now to enterprise and creative customers, with a broader rollout scheduled soon.

Figure 3. Bob Doyle Media reviews Runway’s Act-Two, showing how users can drive animations with their own voice and their face, hand, and body movements.

Amazon introduced Kiro, an AI coding IDE for spec-driven development that lets users define project requirements in plain language or diagrams. Kiro automates design, code generation, documentation, and testing, effectively acting as a technical product manager throughout the development process. Built as a VS Code fork, Kiro is free during its preview, though there is a waitlist.

Kiro is getting praise from early users for its spec-driven workflow, a workflow pattern similar to planning and orchestration features in other AI coding assistants. Plan-then-act is an effective way (perhaps the only way) to develop robust software with AI.

Decart AI introduced real-time video diffusion with MirageLSD:

Input any video stream, from a camera or video chat to a computer screen or game, and transform it into any world you desire, in real-time (<40ms latency).

As shared by Andrej Karpathy, MirageLSD’s real-time model enables many use cases, such as creating alternate realities in video feeds, directing movies in real time, and restyling game environments from text prompts.

Figure 4. Decart AI’s MirageLSD restyles video in real time, opening up interesting possibilities for video games and video production.

DuckDuckGo now lets you hide AI-generated images in search results. The feature uses open-source blocklists and aims to significantly reduce, though not entirely eliminate, AI images seen, in order to filter out "AI slop" that clutters search results.
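The matching logic behind DuckDuckGo’s filter is not described here, so the following is only a minimal Python sketch of how blocklist-based filtering generally works, assuming each list is a set of domain names matched against an image result’s source URL. The blocklist entries and result format are hypothetical. It also illustrates why such a filter reduces but cannot eliminate AI images: anything hosted on a domain the list does not cover passes through.

```python
from urllib.parse import urlparse


def is_blocked(url: str, blocklist: set[str]) -> bool:
    """True if the URL's host is a blocklisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocklist)


def filter_image_results(results: list[dict], blocklist: set[str]) -> list[dict]:
    # Keep only results whose source URL is not on the blocklist.
    return [r for r in results if not is_blocked(r["url"], blocklist)]


# Hypothetical blocklist and results, for illustration only.
blocklist = {"ai-art.example"}
results = [
    {"url": "https://cdn.ai-art.example/castle.png"},
    {"url": "https://photos.example.org/castle.jpg"},
]
print(filter_image_results(results, blocklist))
```

Matching on the domain suffix with a leading dot catches subdomains such as `cdn.ai-art.example` without accidentally blocking a lookalike such as `notai-art.example`.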

Slack launched a broad set of AI features to challenge Microsoft’s dominance in the workplace, including AI writing assistance, contextual explanations of messages, automated action items, and enterprise search across multiple connected business applications. Salesforce is positioning Slack as a central productivity hub: it integrates AI into existing workflows while restricting external AI tools’ access to Slack data, which limits competitors.

Reflection AI launched Asimov, a code research agent designed to help engineers understand code in its full context. Asimov combines reasoners and retrievers in a multi-agent system to build a holistic view of the codebase, supporting engineering teams that build complex systems.

Microsoft updated and expanded Copilot Vision to allow it to visually scan a user's entire Windows desktop to understand on-screen context and automate workflows across applications. The Copilot Vision Desktop Share feature is strictly opt-in and rolling out to Windows Insiders.

OpenAI updated its Image Service API with a high-quality mode that enhances resolution and visual detail in generated images. The new mode integrates into existing API endpoints, enabling professional-grade output without extra complexity.

Liquid AI introduced LEAP and Apollo for on-device AI, a platform that “seeks to make deploying AI to the edge as easy as calling a cloud-based model API.” LEAP is a developer platform for building on-device models, and Apollo is a lightweight iOS app for running compact LLMs locally. These tools support models up to 300MB optimized for low power on-device inference, removing the need for cloud connectivity.
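To put the 300MB cap in perspective, a back-of-the-envelope estimate: weight storage is roughly parameter count × bits per weight ÷ 8. The Python sketch below assumes decimal megabytes and counts weights only (ignoring tokenizer files, metadata, and runtime activation memory); the quantization levels shown are illustrative, not LEAP’s documented formats.

```python
def max_params(budget_bytes: float, bits_per_weight: float) -> float:
    # Weights only: each parameter occupies bits_per_weight / 8 bytes.
    return budget_bytes * 8 / bits_per_weight


BUDGET_BYTES = 300 * 10**6  # 300MB, using decimal megabytes

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: about {max_params(BUDGET_BYTES, bits) / 1e6:.0f}M parameters")
```

Under these assumptions the cap allows roughly 150M parameters at 16-bit precision and about 600M at 4-bit, which is why aggressive quantization matters for on-device models.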

At the AWS Summit on July 16, Amazon introduced Bedrock AgentCore, a set of services for enterprise AI agent development that aims to help developers deploy and operate AI agents securely at scale. Bedrock AgentCore works with any agent framework (such as CrewAI, LangGraph, or LlamaIndex) and any model.

Amazon Web Services also launched a new AWS Marketplace category for AI agents and tools, establishing a centralized hub for businesses to find, procure, and deploy third-party agent solutions for functions ranging from procurement to financial services.

Teknium released the Hermes 3 dataset, comprising nearly one million high-quality entries to support the training of agentic AI models. The freely available Hermes 3 dataset provides diverse, clean examples aimed at improving model performance on complex decision-making and tool-use tasks.

AI Research News

Our latest AI research review covered these recently published papers:

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Energy-Based Transformers are Scalable Learners and Thinkers
MemOS: A Memory OS for AI System
A Survey on Latent Reasoning

The Chain of Thought monitorability paper drew press attention because researchers from OpenAI, Google DeepMind, and Anthropic issued a joint warning: the ability to monitor AI “chains of thought” (CoT) is fragile, and this form of AI transparency could vanish as AI advances.

AI Business and Policy

Former OpenAI CTO Mira Murati announced $2 billion in funding for her Thinking Machines Lab to advance open AI science research. The lab’s first multimodal AI product, launching soon, will include a significant open-source component useful to researchers. The lab aims to support foundational AI exploration and foster collaboration between academia and industry.

AI coding startup Cognition is acquiring AI developer tools startup Windsurf, integrating Cognition's Devin AI engineer into Windsurf's IDE to create a unified platform for AI-driven code generation. The acquisition follows Windsurf's co-founders joining Google in a separate $2.4 billion talent deal and prior failed OpenAI acquisition talks.

Netflix has begun using generative AI in its film and TV productions. Co-CEO Ted Sarandos confirmed that the first GenAI final footage appeared in "El Eternauta," where it allowed a scene to be completed 10 times faster and more cheaply. Netflix also uses AI for personalization, search, and ads.

Voice AI specialist SoundHound is making significant inroads into the healthcare sector with its AI-powered voice assistants for clinics and hospitals. These agents are being deployed to streamline critical workflows like patient intake, appointment scheduling, and provider queries.

Lovable became a unicorn just 8 months after launch, raising $200 million at a $1.8 billion valuation. The Lovable vibe-coding app helps users create websites and apps with natural language and already has 2.3 million users.

Meta refused to sign the EU's voluntary AI Code of Practice, citing legal uncertainties and measures that go beyond the scope of the AI Act. Meta, which made the decision weeks before new EU rules for general-purpose AI models take effect, argues that the code will hinder AI development in Europe.

AI safety researchers from OpenAI and Anthropic are criticizing Elon Musk's xAI for its “reckless” safety culture. They highlight xAI's failure to publish industry-standard safety reports on Grok and incidents like Grok's antisemitic comments.

AI Opinions and Articles

The Hitchhiker's Guide to Vector Search from Clelia Astra Bertelli, a detailed guide for building production-ready RAG systems based on textual vector search, has been touted by Jerry Liu of LlamaIndex as a ‘must-read starter.’ This guide covers many aspects of RAG and knowledge management in AI systems: text extraction, chunking, embeddings, search boosting with semantic caching, and query rewriting.
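One technique the guide covers, semantic caching, reuses a stored answer when a new query is semantically close to one already answered, skipping a full retrieval-and-generation pass. Below is a minimal Python sketch using cosine similarity over precomputed embedding vectors; the `SemanticCache` class, the 0.9 threshold, the toy 3-dimensional vectors, and the linear scan are illustrative assumptions, not the guide’s implementation (a production system would use a real embedding model and a vector index).

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Reuse a stored answer when a new query embedding is close to a previous one."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []  # (query embedding, answer)

    def get(self, query_embedding: list[float]) -> Optional[str]:
        # Linear scan for the most similar cached query; a real system would use a vector index.
        best_score, best_answer = float("-inf"), None
        for embedding, answer in self.entries:
            score = cosine_similarity(embedding, query_embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query_embedding: list[float], answer: str) -> None:
        self.entries.append((query_embedding, answer))


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], "cached answer about chunking")
print(cache.get([0.98, 0.1, 0.0]))  # near-duplicate query: cache hit
print(cache.get([0.0, 1.0, 0.0]))   # unrelated query: None, so fall through to the full RAG pipeline
```

The threshold trades off savings against risk: set it too low and the cache returns answers to questions that were never asked.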

A critical essay on AI for science from the AI Snake Oil duo, Arvind Narayanan and Sayash Kapoor, argues that AI might be worsening the production-progress paradox, in which scientific paper output grows exponentially while actual scientific progress stagnates. They contend that AI companies’ incentives are misaligned and that current AI-for-science tools are misdirected, chasing flashy “AI discovers X!” headlines rather than addressing real bottlenecks.

The tragedy is that there are many AI-for-science tools that would make a real difference, such as AI for flagging potential errors in scientific code. But the labs are fixated on other things like literature review / "deep research". This is not an actual bottleneck, so it doesn't matter how much faster you make it. Meanwhile the risks of short-circuiting human understanding are enormous. – Arvind Narayanan

AI Changes Everything
