AI Week in Review 25.09.20

Grok 4 Fast, GPT-5-Codex, Meta Ray-Ban Display, Tongyi DeepResearch-30B-A3B, Magistral Medium and Small 1.2, Perceptron's Isaac 0.1, The New Reve, Gemini in Chrome, Notion 3.0, AI wins math gold.

Sep 21, 2025

Figure 1. Mark Zuckerberg’s demo of the Meta Ray-Ban Display, which failed onstage. Meta now says “We DDoS’d Ourselves.”

Top Tools

xAI has released Grok 4 Fast - breaking through our intelligence vs cost frontier by achieving Gemini 2.5 Pro level intelligence at a ~25X cheaper cost. - Artificial Analysis

xAI released Grok 4 Fast, a fast, efficient, and low-cost yet high-performing AI model, “pushing the frontier of cost-efficient intelligence.” Grok 4 Fast scores near SOTA on various benchmarks, overall scoring 60 on Artificial Analysis Intelligence Index, on par with Gemini 2.5 Pro but behind GPT-5. Additionally, it is trained end-to-end with tool-use in mind, performing well on agentic browsing benchmarks and suitable for agentic use cases.

Figure 2. Grok 4 Fast performs comparably to Gemini 2.5 Pro and Claude Opus 4.1 and behind GPT-5, yet it has a starkly lower price point than all other leading AI models.

Grok 4 Fast combines near-SOTA reasoning with high speed, efficiency, and low cost. Its API price is only $0.20 per 1M input tokens and $0.50 per 1M output tokens. Grok 4 Fast is also quite token efficient, using only 61M tokens to complete the Artificial Analysis Intelligence Index, significantly fewer than Gemini 2.5 Pro’s 93M and Grok 4’s 120M.
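To make the pricing and token-efficiency figures concrete, here is a minimal Python sketch of the arithmetic. The prices and benchmark token counts are the ones quoted above; the example workload (2M input tokens, 500K output tokens) and the function names are hypothetical, chosen only for illustration.

```python
# Prices quoted in the text (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 0.50


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost of a workload at the quoted per-token prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


def percent_fewer(tokens: float, baseline: float) -> float:
    """Percentage by which `tokens` falls below `baseline`."""
    return 100.0 * (1.0 - tokens / baseline)


# Hypothetical workload, not a figure from the article.
cost = api_cost_usd(input_tokens=2_000_000, output_tokens=500_000)
print(f"Estimated cost: ${cost:.2f}")

# Token counts (in millions) reported for the benchmark run.
print(f"vs Gemini 2.5 Pro: {percent_fewer(61, 93):.0f}% fewer tokens")
print(f"vs Grok 4: {percent_fewer(61, 120):.0f}% fewer tokens")
```

By this arithmetic, Grok 4 Fast used roughly a third fewer tokens than Gemini 2.5 Pro and about half as many as Grok 4 on the same benchmark, which compounds the savings from the lower per-token price.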

Getting a Gemini 2.5 Pro level AI model at an extremely low cost is a game-changer. This makes Grok 4 Fast an important, interesting and appealing AI model, especially for agentic AI use-cases.

Figure 3. Grok 4 Fast pushes the price-performance curve up and to the left, scoring comparably to Claude Opus 4.1 at a price point 50 times lower.

AI Tech and Product Releases

OpenAI released GPT-5-Codex, a fine-tune of GPT-5 optimized for agentic coding that is integrated in OpenAI’s Codex coding agent. GPT-5-Codex steps up performance over the original Codex on long-horizon software tasks with integrated planning and tool use. The launch post includes examples of multi-hour problem solving and details on how the system handles failures and retries.

User reviews, for example from Bijan Bowen, indicate that GPT-5-Codex has substantially improved in agentic coding, running faster than before and producing more intelligent coding output, though it remains slow and has quirks. With these updates, Codex is becoming a worthy competitor to Claude Code.

This year’s Meta Connect was all about Meta’s new smart glasses. Meta announced Meta Ray-Ban Display, smart glasses with an in-lens screen that shows messages, translations, and Meta AI outputs in a heads-up display; a companion Meta Neural Band helps navigate the display. Meta also debuted Gen-2 Ray-Ban Meta glasses and Oakley Meta Vanguard, performance AI glasses for athletic use. Mark Zuckerberg’s failed live demos drew the most news attention.

Alibaba’s Tongyi Lab open-sourced Tongyi-DeepResearch-30B-A3B, an agentic web-research AI model that achieves parity with OpenAI’s Deep Research on several information-seeking benchmarks (such as 70% on GAIA and 32.9% on HLE). Tongyi Lab’s announcement post details the model’s technical development, including the synthetic data and training pipelines with on-policy agent RL training, and the use of ReAct and a “Heavy Mode” research process. The open-weights model is available on Hugging Face.

Mistral released Magistral Medium 1.2 (magistral-medium-2509) and Magistral Small 1.2 (magistral-small-2509). Magistral-Small-1.2, their open (Apache-2.0) 24B reasoning model, has expanded vision support, long context, and reasoning-trace handling via special tokens. Positioned for math, science, and coding tasks, the model shows improved scores on AIME, GPQA, and LiveCodeBench benchmarks. Magistral Small is available on Hugging Face.

Perceptron AI launched Isaac 0.1, an open-weights 2B-parameter “perceptive-language” model that understands and interacts with the physical world using spatial intelligence. The announcement claims Isaac 0.1 excels in visual QA tasks and is competitive with much larger models on visual perception. Isaac 0.1 is available on Hugging Face.

Figure 4. Example of Isaac 0.1 solving a Visual QA problem.

Reve introduced ‘The New Reve,’ a unified image-creation and editing experience that combines four products in one: a text-to-image model and remixer; a drag-and-drop object-level editor; a creative assistant; and an API for third-party plug-in features. The New Reve includes editable uploads, natural-language edits, and direct manipulation.

Reve’s blog outlines capabilities and roadmap. With Reve as media generator, iterative editor, and agent cockpit, this release breaks new ground in AI-native media creation interfaces.

Google announced a major AI revamp for Chrome, detailing new AI capabilities that include Gemini in Chrome. Gemini in Chrome is an in-browser assistant activated by clicking the Gemini icon; it can help users summarize and digest complex information across tabs, search references, draft responses, and integrate with web services.

We’re building an AI-centric browser that uses context — like the page you’re reading or the tabs you have open — to help you get things done faster, easier, and more safely than ever. - Google

Gemini in Chrome is rolling out to US users in English for now, with availability for more people and languages coming soon. Google has joined the AI browser wars.

Zoom launched cross-app AI updates including a unified notetaker and AI avatars. In its latest release, Zoom’s upgraded AI companion added cross-meeting-app note capture, AI-assisted scheduling, user-likeness avatars, and other workflow automations. These features consolidate meeting tooling, competing with other vertical AI productivity apps.

Notion has introduced Notion 3.0 with Agents, their “biggest evolution” of Notion, which puts AI Agents at the forefront of the Notion toolset. The Notion 3.0 document and productivity application now gives AI Agents the ability to process any Notion information and workflows a human could:

Think of your Agent as a Notion power user that can handle entire workflows and access all the information you can.

Notion 3.0 will evolve to support Notion agent customization.

OpenAI added a “thinking time” control to ChatGPT, letting users trade off speed versus reasoning depth for GPT-5. These controls are easy to use and address user feedback that default thinking could be excessive for simple tasks. The options are rolling out to Plus and Pro users on web now, with modes for mobile coming soon.

Amazon introduced an always-on AI agent for Marketplace sellers. The update to Seller Assistant adds agentic execution, so the system can act on a seller’s behalf to manage listings, support, and other back-office tasks instead of just answering questions.

AI Research News

Independent researchers Jeremy Berman and Eric Pang obtained new state-of-the-art results on the ARC-AGI benchmarks. Berman’s write-up describes swapping code generation for structured natural-language reasoning, using Grok-4 and multi-agent orchestration to achieve 79.6% on ARC v1 and a new SOTA of 29.4% on ARC v2 (previous best: 25%).

OpenAI and Google DeepMind AI models achieved historic winning performances at the 2025 ICPC World Finals coding competition. OpenAI’s AI system got a perfect score, solving all 12 problems and beating all competing human teams, while DeepMind’s Gemini 2.5 Deep Think solved 10, including one no human team solved, earning a gold medal. We discuss the milestone further in “AI Wins Global Coding Contest,” but the most important takeaway is that general AI reasoning models accomplished this.

The latest example of AI accelerating science is DeepMind’s discovery of new solutions to century-old equations in fluid dynamics. The researchers used a family of AI-enabled approaches (PINNs, or Physics-Informed Neural Networks) that are trained to match the laws of physics; these can solve challenging PDEs governing fluid motion, find novel solution families, and be benchmarked against traditional solvers.

This breakthrough represents a new way of doing mathematical research, combining deep mathematical insights with cutting-edge AI. - DeepMind

AI Business and Policy

Waymo signed a deal with Toyota to integrate its next-generation AI driver system into Toyota's electric vehicles. The partnership, valued at an estimated $4 billion, will see the Waymo Driver AI platform deployed in Toyota vehicles starting in late 2028.

Nvidia took a $5 billion stake in Intel and has offered a partnership with Intel on AI and computing chip technology; this sent Intel stock soaring. Nvidia also plans to invest $2 billion in a new AI research and development center in Singapore, focused on developing specialized hardware for large language models and generative AI.

At Huawei Connect, Huawei described their “SuperPoD Interconnect” network fabric linking up to 15,000 accelerators, including Ascend AI chips, to build large-scale AI training and inference clusters. With Nvidia AI chip supply to China constrained by US-China geopolitical tensions, China is pushing for homegrown AI infrastructure.

OpenAI announced new restrictions and product changes in ChatGPT for users under 18, in a post by CEO Sam Altman titled “Teen safety, freedom, and privacy.” OpenAI will predict users’ ages and apply different rules to minors, including restrictions on AI interactions and more parental controls. Although public concern pushed the company to act, it is presenting the changes as proactive support for teen safety:

We prioritize safety ahead of privacy and freedom for teens; this is a new and powerful technology, and we believe minors need significant protection. – OpenAI CEO Sam Altman

Gartner projects a shift to “preemptive” AI cybersecurity by 2030, forecasting that preemptive capabilities, often AI-driven, will make up half of IT security spending, displacing legacy detect-and-respond tooling. The note signals budget realignment toward proactive models and agentic defenses over the next five years.

The Verge reports the latest OpenAI hardware rumors: OpenAI might be developing a smart speaker, glasses, voice recorder, and a pin. Based on supply-chain feelers and potential collaborations, the company is exploring various purpose-built ChatGPT devices; a product launch is targeted for late 2026.

The National Bureau of Economic Research and OpenAI released a joint report on how people use ChatGPT, based on surveys analyzing ChatGPT usage at scale. The NBER paper and OpenAI’s summary describe consumer-heavy use patterns and task categories. For example:

We find steady growth in work-related messages but even faster growth in non-work-related messages, which have grown from 53% to more than 70% of all usage. … “Practical Guidance,” “Seeking Information,” and “Writing” are the three most common topics and collectively account for nearly 80% of all conversations.

This AI usage study sheds new light on AI work adoption and consumer behavior; smart AI influencers are mining the data to find AI startup ideas.

AI Opinions and Articles

The New York Times recently profiled Eliezer Yudkowsky as “AI’s Prophet of Doom.”

Yudkowsky has a new book out, “If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All.” In my view, his AI fearmongering is far beyond reasonable reality, but it’s easy to sell fear in a time of massive change and uncertainty.

Cloudflare’s Matthew Prince argues AI firms should face accountability for scraping and downstream harms in a Wired interview. The Cloudflare interview covers internet content provenance, infrastructure responsibilities, and policy levers he believes are necessary as AI models scale and change how the internet works.

He suggests three paths for the internet:

1. Dead internet: AI drowns out human content and AI content floods the internet.

2. Black Mirror internet: Journalists, researchers, and creators exist but work for AI giants.

3. Licensing model: AI companies pay creators for their content.

This contributes to an ongoing debate about data rights and platform obligations in the AI era, but there is a self-interested motivation behind Prince’s position: Cloudflare wants to license internet usage by AI agents, with licensing protocols mediated by Cloudflare.

