AI Week in Review 26.02.14
GLM-5, MiniMax M2.5, Gemini 3 Deep Think, GPT-5.3-Codex-Spark, Claude Opus 4.6 fast, Seedance 2.0, Qwen-Image 2, Cowork for Windows, WarpGrep, DeepSeek 1M context, Vercel AEO, DreamDojo, GABRIEL.

Top Tools
Z.ai officially launched GLM-5, an open-weight 744B parameter model (with 40B active parameters) designed for advanced reasoning, coding, and long-horizon agentic tasks. GLM-5 claims state-of-the-art performance among open models on benchmarks like BrowseComp (75.9%) and SWE-Bench Verified (77.8%), comparable to Gemini 3 Pro. To build GLM-5, Z.ai increased pre-training data to 28.5T tokens, used its asynchronous RL infrastructure, slime, to scale RL post-training, and integrated DeepSeek Sparse Attention (DSA) to reduce inference costs in long contexts.
Overall, GLM-5 is a high-performing open-weights competitor to commercial frontier models, built for agentic AI tasks such as AI coding. It is accessible via Z.ai's chat interface, through APIs such as OpenRouter, and as open weights on Hugging Face.
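For anyone who wants to try GLM-5 programmatically, here is a minimal sketch of a request to OpenRouter's OpenAI-compatible chat-completions endpoint. The "z-ai/glm-5" model slug and the placeholder API key are assumptions; check OpenRouter's model catalog for the exact identifier.

```python
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(api_key: str, prompt: str, model: str = "z-ai/glm-5") -> dict:
    """Assemble the URL, headers, and JSON body for a single-turn chat completion."""
    return {
        "url": OPENROUTER_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_chat_request("sk-or-...", "Summarize the GLM-5 release in one line.")
# To actually send it: requests.post(req["url"], headers=req["headers"], data=req["body"])
print(json.loads(req["body"])["model"])
```

Because the endpoint is OpenAI-compatible, the same request shape works with the official OpenAI Python client pointed at OpenRouter's base URL.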
MiniMax launched MiniMax M2.5, an AI model trained via extensive reinforcement learning for agentic tool use, coding, search, and productivity tasks. MiniMax M2.5 claims SOTA results on agentic AI benchmarks, including 76.3% on BrowseComp and 80.2% on SWE-Bench Verified, matching Opus 4.5. They tout major strides for the MiniMax family in agentic AI coding, crediting extensive RL training on 200,000 real-world environments across 10 programming languages.
The model offers strong task decomposition and efficient planning for agentic tasks, as well as two speed/cost configurations for cost-effective deployment. MiniMax makes M2.5’s low cost a big selling point: at only $0.30 input / $1.20 output per million tokens, a fraction of the cost of Opus 4.5, it is a competitive AI model for AI coding or agentic AI.
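Those per-token rates translate directly into job-level savings. A quick sketch of the arithmetic (the Opus 4.5 figures of $5 input / $25 output per million tokens are assumed list prices, and the token counts are invented for illustration):

```python
def job_cost(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    """Cost in dollars, with prices quoted per million tokens."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# A hypothetical agentic coding session: 2M input tokens, 200k output tokens.
m25 = job_cost(2_000_000, 200_000, 0.30, 1.20)    # MiniMax M2.5
opus = job_cost(2_000_000, 200_000, 5.00, 25.00)  # Opus 4.5 (assumed list price)
print(f"M2.5: ${m25:.2f}  Opus 4.5: ${opus:.2f}  ratio: {opus / m25:.0f}x")
# -> M2.5: $0.84  Opus 4.5: $15.00  ratio: 18x
```

Because agentic workflows are dominated by large input contexts fed back on every tool call, the input price gap drives most of the difference.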

AI Tech and Product Releases
Google announced a significant update to Gemini 3 Deep Think, a specialized extended reasoning mode in Gemini 3 designed to solve complex science and engineering challenges. The upgraded Deep Think expands the model’s ability to interpret messy scientific data and handle real-world research problems beyond general chat tasks, extending its capabilities into science domains such as chemistry and physics. Google’s latest Deep Think update achieves eye-popping advances on intelligence benchmarks: 84.6% on ARC-AGI-2 and a 3455 Codeforces rating, both SOTA.
Gemini 3 Deep Think is available to select Google AI Ultra subscribers and enterprise users via API access.
OpenAI introduced GPT-5.3-Codex-Spark, an optimized version of its Codex family designed for ultra-fast real-time coding, delivering more than 1,000 tokens per second on Cerebras wafer-scale hardware. It features a 128k token context window and lightweight execution tuned for rapid iterative work. The Spark model targets responsive interactive coding workflows and is available in research preview for select users.
Anthropic introduced a fast mode for Claude Opus 4.6 that delivers significantly quicker token generation (roughly 2.5x speed) at a hefty premium: fast-mode pricing starts at $30 input / $150 output per million tokens. The fast mode is in public preview for GitHub Copilot Pro+. This mode maintains the same Opus 4.6 model intelligence while prioritizing speed for coding tasks and AI agent workflows.
ByteDance released Seedance 2.0, a unified audio-video generation model that has generated significant buzz for highly realistic audio-visual output that reportedly outperforms rivals like Sora and Veo. Seedance 2.0 accepts mixed text, image, video, and audio inputs and gives users director-level control over lighting, shadow, and motion, enabling generation of detailed cinematic video clips with sound and high visual fidelity. Seedance 2.0 is being rolled out in beta on ByteDance’s Seed platform.

Alibaba released Qwen-Image 2, a 7B parameter unified model capable of both image generation and editing. The model supports native 2K resolution and demonstrates advanced text rendering and “professional typography” capabilities, positioning it as a significant upgrade in open-weight image generation.
Anthropic expanded its Cowork AI agent tool to Windows, bringing full feature parity with the macOS version for enterprise desktop automation.
DeepSeek released an AI app update supporting a 1M token context window and extending its knowledge cutoff to May 2025. The update leverages Multi-head Latent Attention (MLA) to compress key-value caches, enabling efficient processing of extensive documents and codebases.
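The core MLA trick is caching one small latent vector per token and re-expanding keys and values from it at attention time, instead of storing full per-head keys and values. A toy numpy sketch of the idea (all dimensions are invented for illustration; DeepSeek's actual architecture also handles rotary embeddings and decoupled key components differently):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_heads, d_head, d_latent = 512, 8, 64, 64
seq_len = 1024

W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)  # compress to latent
W_up_k = rng.normal(size=(d_latent, n_heads * d_head))            # expand latent -> keys
W_up_v = rng.normal(size=(d_latent, n_heads * d_head))            # expand latent -> values

x = rng.normal(size=(seq_len, d_model))  # token hidden states

# Standard attention caches full K and V: 2 * n_heads * d_head floats per token.
# MLA caches only the latent c; K and V are re-expanded when attention is computed.
c = x @ W_down                                      # (seq_len, d_latent) -- the cache
k = (c @ W_up_k).reshape(seq_len, n_heads, d_head)  # reconstructed keys
v = (c @ W_up_v).reshape(seq_len, n_heads, d_head)  # reconstructed values

full_cache = 2 * n_heads * d_head  # floats per token, standard KV cache
mla_cache = d_latent               # floats per token, latent cache
print(f"cache compression: {full_cache / mla_cache:.0f}x")
# -> cache compression: 16x
```

The saving scales with sequence length, which is exactly what makes a 1M token context window affordable to serve.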
Meta added new AI features to Facebook that allow users to animate their profile pictures and customize feed posts with AI-generated backdrops.
WarpGrep, a new tool for faster context retrieval in code search, has been released; it uses a lightweight reinforcement learning (RL) sub-agent to find specific code files and line ranges up to 5x faster than standard models.
WordPress has released an official connector for Anthropic’s Claude, allowing site owners to link Claude to their websites. The integration enables users to analyze traffic, comments, and content performance via prompts in the WordPress interface.
Vercel introduced a system for tracking “AI Engine Optimization” (AEO), measuring how well Large Language Models (LLMs) and coding agents discover and reference content. This development suggests a future where optimizing content for AI agents becomes as critical as traditional SEO is for search engines.
The viral open-source OpenClaw Agent continues to gain significant traction, despite serious security flaws that expose systems to cyberattacks and data breaches. OpenClaw has patched several of these vulnerabilities after the flaws were flagged as risks, including in warnings from the Chinese government.
Leading inference providers report that running open-source AI models on Nvidia’s Blackwell generation cuts AI operational costs by up to 10x compared with the older Hopper generation. This cost reduction is driving high-density AI workloads in healthcare, gaming, and autonomous customer service.
OpenAI has retired GPT-4o as of February 13, moving its user base over to GPT-5 series models that prioritize agentic reasoning and deep-thinking capabilities.
AI Research News
Nvidia researchers released DreamDojo, a robot world model trained on 44,000 hours of human video to help robots learn complex tasks, publishing their results in an accompanying paper.
The Economic Research Team at OpenAI launched the GABRIEL Open-Source Toolkit for Social Science, a specialized toolkit designed to transform unstructured text and images into quantitative data. As they put it: “GABRIEL is built to make qualitative data much more accessible.” Their paper “GPT as a measurement tool” explains how the tool takes measurements over unstructured data, allowing researchers to pose complex qualitative questions and analyze large-scale human data.
New research reveals that Google DeepMind’s Perch 2.0, an AI model originally trained on terrestrial bird vocalizations, can effectively interpret underwater acoustics with “killer” performance, unlocking underwater bioacoustics mysteries. The model is being used to track elusive marine species, such as Bryde’s whales, demonstrating the power of transfer learning across wildly different biological environments.
A Harvard study finds AI increases workload instead of reducing it. The new field study, published in the Harvard Business Review, tracked a 200-person tech company over eight months and found that AI tools consistently intensified workloads rather than reducing them. AI lowers the barrier to entry for tasks like coding, but it also leads to employees taking on more work, multitasking more frequently, and skipping breaks, resulting in “ambient work” that never ceases.
Google Research has open-sourced DialogLab, a framework designed to model and simulate structured, multi-party conversations between humans and AI agents. The tool allows researchers to define social structures and roles, addressing the complexity of group dynamics like interruptions and role shifts that standard one-on-one chatbots cannot handle.
AI Business and Policy
Anthropic raised $30 billion in a funding round led by GIC and Coatue, propelling its post-money valuation to $380 billion. The company reported a staggering $14 billion revenue run rate, driven largely by surging enterprise adoption of its Claude models and Claude Code.
Apple has delayed its major Gemini-powered overhaul of Siri until later in 2026 due to testing issues and bugs. The new AI-enabled Siri has been promised since 2024, leading us to ask: Is the new Apple Siri becoming the GTA 6 of AI?
OpenAI has confirmed that its Jony Ive-designed hardware device will not ship until at least February 2027. OpenAI’s device is rumored to be “Dime”, an earbud for audio-driven hands-free AI interaction. In a trademark lawsuit court filing, they stated they have yet to finalize packaging or marketing materials and will not use the name “io” for the product.
OpenAI has started testing advertisements within ChatGPT for free and “Go” tier users in the US. The company states that ads will not influence the answers provided, but the move has prompted criticism and blowback: OpenAI researcher Zoë Hitzig resigned over it, and Anthropic mocked OpenAI for it in Super Bowl ads.
Two co-founders of xAI, Tony Wu and Jimmy Ba, have resigned from the company shortly after its merger with SpaceX. Their departures follow the exit of other key founding members. Reports suggest frustration with delays in the Grok 4.20 model release and Elon Musk’s increasing dominance over the company’s direction as it merges with SpaceX.
AI video generation company Runway has raised $315 million at a $5.3 billion valuation, with backing from prominent investors like Nvidia and Adobe. Funds will fuel Runway’s continued development of AI video generation and expansion into world model development.
French AI leader Mistral AI has pledged €1.2 billion to develop AI data centers and compute capacity in Sweden, partnering with EcoDataCenter. This commitment is one of Europe’s largest infrastructure investments to date, leveraging the region’s clean energy and cool climate to bolster sovereign AI capabilities.
Ricursive Intelligence has raised $300 million at a $4 billion valuation to automate semiconductor design with AI. The company claims its AI technology can compress chip design timelines from years into weeks, accelerating hardware development for AI infrastructure.
Data security firm Cyberhaven announced a $100 million funding round at a $1 billion valuation, thanks to a surge in enterprise AI security demand that is driving triple-digit company growth. The company’s platform is used by leading banks and AI companies to secure sensitive data against AI leaks.
Autodesk is suing Google over its use of the name “Flow” for its AI video-generation tool, alleging that it infringes on Autodesk’s trademark.
Chinese Premier Li Qiang urged increased development of power and computing resources to support national AI development. China faces AI chip and energy constraints similar to those in the US as both nations pursue global AI competitiveness.
Anthropic announced it will cover electricity price increases resulting from its AI data centers to protect American ratepayers. It is a smart move to address the impacts of AI data centers on communities, allowing Anthropic to build out more AI infrastructure without local backlash.
AI Opinions and Articles
Hyperwrite CEO Matt Shumer published a widely read essay titled “Something Big Is Happening,” which argues that the rapid succession of AI model capabilities from models like Opus 4.6 signals a fundamental shift in the technological landscape.
He suggests AI’s disruptive potential eclipses that of the Covid-19 pandemic. AI is disruptive but more creative than destructive, and AI is on a trajectory playing out over years, not weeks. We have been preaching that AI Changes Everything since 2023: we are in the early innings of the AI revolution.
He argues that right now, in 2026, is a tipping point:
The models available today are unrecognizable from what existed even six months ago. The debate about whether AI is “really getting better” or “hitting a wall” - which has been going on for over a year - is over.
He is right. The AI models and AI tools of the last 3 to 6 months are much more capable than ever, which has unlocked far more workflow automation than was possible before. It’s changing how everyone who uses them does their work. The disruption from AI is like an avalanche: it started small, but the force of AI change will cascade through much of the work we do over the next 6 to 24 months. Be ready.
“If your job happens on a screen (if the core of what you do is reading, writing, analyzing, deciding, communicating through a keyboard) then AI is coming for significant parts of it. The timeline isn’t ‘someday.’ It’s already started.” – Matt Shumer


