AI Week in Review 26.02.07
Claude Opus 4.6, GPT-5.3-Codex, Kling 3.0, Grok Imagine 1.0, Qwen3-Coder-Next, Voxtral Transcribe 2, Roblox’s Cube Foundation Model, MiniCPM-o 4.5, OpenAI Frontier, Codex for macOS, ACE-Step 1.5.

Top Tools: Opus 4.6 and GPT-5.3-Codex
On the same day this week, Anthropic and OpenAI launched competing frontier AI models – Opus 4.6 and GPT-5.3-Codex – that are both the new state-of-the-art in agentic AI.
OpenAI released GPT-5.3-Codex, an agentic AI model designed to generate, debug, and reason over complex codebases across long-horizon tasks. The model shows SOTA performance on real-world software benchmarks (56.8% on SWE-Bench Pro), improves speed (25% speedup) and token efficiency over prior versions, and introduces mid-task steerability for real-time control.
Targeted at developers, GPT-5.3-Codex is available in OpenAI’s Codex AI coding tool and the new Codex macOS app. However, OpenAI also highlights GPT-5.3-Codex use cases beyond coding, such as financial advice and generating fashion presentations, touting a big jump in its OSWorld score (64.7%) for agentic computer use and 70.9% on GDPval for overall knowledge work.
Anthropic launched Claude Opus 4.6, a major update to its flagship model with improved reasoning, better coding support, and a one-million-token context window for long-context work. Claude Opus 4.6 has improved context compaction and adaptive thinking, making it more effective for AI coding. The release also adds support for enterprise workflows such as document generation, spreadsheets, presentations, and financial analysis:
We’ve made substantial upgrades to Claude in Excel, and we’re releasing Claude in PowerPoint in a research preview. This makes Claude much more capable for everyday work.
Claude Opus 4.6 is now available via Claude, Claude Code, APIs, and other AI coding tools. Pro tip: Google’s Antigravity AI coding tool has free-tier access to Claude Opus.

In conjunction with Opus 4.6, Anthropic added an Agent Teams feature to Claude Code, where you can ask Claude Code to split a task among multiple agents that communicate peer-to-peer and work in parallel. AI Jason explains details on how this differs from sub-agents; you can interact with individual agent teammates directly without going through the lead.
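To make the peer-to-peer idea concrete, here is a minimal Python sketch of the pattern. It is not Claude Code's implementation or API: the names `teammate`, `run_team`, and the `agent-N` labels are hypothetical, and the "work" is a stub. It only illustrates teammates running in parallel and messaging each other directly instead of routing everything through a lead.

```python
import asyncio
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (sender, text)


async def teammate(name: str, subtask: str,
                   inboxes: Dict[str, asyncio.Queue],
                   results: Dict[str, str]) -> None:
    """Hypothetical teammate: does stubbed work, then notifies peers directly."""
    await asyncio.sleep(0)  # stand-in for real work; yields to other teammates
    results[name] = f"{name} finished: {subtask}"
    # Peer-to-peer step: write straight into every other teammate's inbox.
    for peer, inbox in inboxes.items():
        if peer != name:
            await inbox.put((name, f"done with '{subtask}'"))


async def run_team(subtasks: List[str]):
    """Hypothetical lead: splits work, starts teammates in parallel, collects output."""
    names = [f"agent-{i}" for i in range(len(subtasks))]
    inboxes = {n: asyncio.Queue() for n in names}
    results: Dict[str, str] = {}
    # All teammates run concurrently.
    await asyncio.gather(
        *(teammate(n, s, inboxes, results) for n, s in zip(names, subtasks))
    )
    # Drain each inbox so we can see who heard from whom.
    messages: Dict[str, List[Message]] = {}
    for n, q in inboxes.items():
        received = []
        while not q.empty():
            received.append(q.get_nowait())
        messages[n] = received
    return results, messages


if __name__ == "__main__":
    results, messages = asyncio.run(run_team(["write tests", "fix bug", "update docs"]))
    for name, inbox in messages.items():
        print(name, "heard from", sorted(sender for sender, _ in inbox))
```

In a sub-agent design, by contrast, each worker would report only to the lead, so the `inboxes` dictionary would collapse into a single queue owned by the lead.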
Head-to-head, which new model is best? OpenAI and Anthropic avoided head-to-head benchmarks, but vibe tests tell more anyway. AICodeKing says Claude Opus 4.6 is better; Greg Isenberg agrees that Claude Opus 4.6 is a more powerful autonomous agent for complex projects, while noting strengths for GPT-5.3-Codex in speed and as an interactive collaborator.
Recommendation: Try both. One approach is to “generate your plan using Codex and implement it through Opus,” or you can implement code in Opus 4.6 and have GPT-5.3-Codex review it.
AI Tech and Product Releases
Kling AI released Kling 3.0, an all-in-one multimodal video generation model supporting text, image, audio, and multi-shot video creation. The update improves visual fidelity, temporal consistency, and contextual understanding, and supports longer generations (15 seconds at 1080p) for realistic video production. It’s the latest ‘best ever’ video generation model; Curious Refuge rated Kling 3.0 as state-of-the-art for image-to-video, beating prior leader Luma Labs’ Ray 3.14.
xAI announced Grok Imagine 1.0, a video generation and editing model capable of producing 10-second clips with native audio and lip-sync. The model targets rapid multimodal content creation, with multiple editing features, including restyle, scene control, swapping objects, object control, and add performance. It performs strongly on third-party benchmarks, at a lower price and shorter generation time than Veo 3 and Sora 2.

Alibaba announced Qwen3-Coder-Next, an 80B parameter mixture-of-experts model optimized for agentic coding workflows with only 3B active parameters. The open-weights Qwen3-Coder-Next model supports up to 256K context and excels at long-horizon coding tasks, getting 44.3% on SWE-Bench Pro and over 70% on SWE-Bench Verified, numbers comparable to Claude Sonnet 4.5. This model offers fast, efficient coding automation that competes with models that have much larger footprints.
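As a quick back-of-the-envelope check on what "80B total, 3B active" means, the short Python snippet below works out the share of parameters used per token. The 16-bit weight precision (2 bytes per parameter) is an assumption for illustration only, not a published spec for this model.

```python
# Figures from the announcement: 80B total parameters, 3B active per token.
total_params = 80e9
active_params = 3e9

# Share of the model that participates in each forward pass.
active_fraction = active_params / total_params

# ASSUMPTION: weights stored at 16-bit precision (2 bytes per parameter).
# All experts must still be held in memory, even though few are active.
bytes_per_param = 2
weights_gb = total_params * bytes_per_param / 1e9

print(f"Active share per token: {active_fraction:.2%}")
print(f"Approx. weight memory at 16-bit: {weights_gb:.0f} GB")
```

The takeaway is that per-token compute scales with the small active share, while memory still scales with the full parameter count.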

Mistral AI released Voxtral Transcribe 2, an open-source (Apache 2.0 license) speech-to-text model offering sub-200 millisecond latency, improved multilingual accuracy, and speaker diarization. It supports real-time and offline transcription for privacy-sensitive and low-latency applications. Designed for integration into agent workflows and productivity tools, Voxtral is available via Mistral’s API, and the mini 4B model can be downloaded from HuggingFace for local use.
Roblox introduced the Cube Foundation Model, a generative AI system for accelerating creation of 3D assets and environments. The model powers “4D generation,” which is described as 3D generation with embedded scripts defining its real-time behavior. This allows creators to generate complex 3D content with pre-defined behaviors from simple inputs, lowering barriers to entry for generative world-creation.
OpenBMB announced open-source MiniCPM-o 4.5, an omni-modal model capable of seeing, listening, and speaking simultaneously. MiniCPM-o 4.5 is based on SigLip2, Whisper-medium, CosyVoice2, and Qwen3-8B with a total of 9B parameters. It supports full-duplex multimodal interaction across text, vision, and audio, providing natural AI interactions at a claimed performance level comparable to Gemini 2.5 Flash.
OpenAI introduced Frontier, a platform for building, deploying, and managing enterprise-grade autonomous AI agents for business workflows. The Frontier platform enables secure multi-step task execution within corporate environments and supports scalable agentic workflows. Frontier is positioned as a competitor to Claude’s recently released Cowork and a foundation for production-ready AI coworkers.
Expanding its terminal-based Codex AI coding tool into an app, OpenAI released a native Codex application for macOS. The app provides a dedicated desktop GUI for managing AI coding workflows, with voice dictation, slash commands, Git integration, file previews, and support for GPT-5.3-Codex. This could become a trend, where terminal-based and IDE-based AI coding tools converge to become AI-first coding assistance apps.
Perplexity introduced Model Council, a system that runs multiple frontier AI models in parallel to generate unified, cross-validated answers. The Model Council approach can take answers from Claude, GPT-5.2, and Gemini models and present their results, noting where they agree and where they differ. The aim is to improve reasoning quality and reduce the hallucinations and errors that come from relying on a single model.
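The general pattern can be sketched in a few lines of Python. This is not Perplexity's implementation: the three `model_*` functions are hypothetical stand-ins that return canned claims rather than calling real APIs, and `run_council` is an illustrative name. The sketch only shows the shape of the idea: query several models in parallel, then separate the claims they share from the claims unique to each.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Set, Tuple


# Hypothetical stand-ins for real model calls; each returns a set of claims.
def model_a(question: str) -> Set[str]:
    return {"kling 3.0 supports 1080p", "clips run up to 15 seconds"}


def model_b(question: str) -> Set[str]:
    return {"kling 3.0 supports 1080p", "clips run up to 10 seconds"}


def model_c(question: str) -> Set[str]:
    return {"kling 3.0 supports 1080p", "clips run up to 15 seconds"}


def run_council(
    question: str, models: Dict[str, Callable[[str], Set[str]]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Ask every model in parallel; return shared claims and per-model extras."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in models.items()}
        answers = {name: fut.result() for name, fut in futures.items()}
    # Claims every model agrees on.
    common = set.intersection(*answers.values())
    # Claims not unanimous, grouped by the model that made them.
    differences = {name: claims - common for name, claims in answers.items()}
    return common, differences


if __name__ == "__main__":
    council = {"a": model_a, "b": model_b, "c": model_c}
    common, differences = run_council("What can Kling 3.0 do?", council)
    print("Agreed:", sorted(common))
    for name, extra in differences.items():
        print(f"Only partly supported, from {name}:", sorted(extra))
```

A real system would need a fuzzier notion of agreement than exact string matching, since different models rarely phrase the same claim identically.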
Vercel released a revamped version of v0, their generative UI tool for frontend development, to better align AI-generated code with production infrastructure. The update aims to close the gap between prototyping and production app deployment, and it emphasizes best practices, deployment readiness, and seamless integration with existing workflows. Vercel also upgraded v0 to run on Claude Opus 4.6.
Humanoid.ai introduced KinetIQ, an AI orchestration framework that coordinates fleets of humanoid robots across industrial and home environments. KinetIQ is a single AI model that can control multiple robots with different morphologies and designs; the system supports cross-timescale planning and multi-robot collaboration. It represents progress toward scalable AI-driven robotics ecosystems.
AI Research News
The ACE Studio team announced ACE-Step 1.5, a 4B parameter open-source AI music generator model capable of producing full songs with lyrics and style control in under ten seconds on consumer GPUs. The technical report “ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation” details the model architecture and training. ACE-Step 1.5 combines a language-model (LM) planner and diffusion transformer (DiT) acoustic synthesizer, which enables fast local music generation while scoring above commercial music generation models on SongEval benchmarks.
Researchers from Hong Kong’s HKUST introduced a framework for synthesizing robot training data from human video in the paper “HumanX: Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos.” The HumanX system converts videos of human motion into models of robotic motion, covering tasks such as sports and object manipulation. These models are then used to train humanoid robots on interaction skills that transfer across tasks.
Perplexity released the DRACO (Deep Research Accuracy, Completeness, and Objectivity) benchmark to evaluate the performance of Deep Research AI tools on real-world complex research tasks. Perplexity AI showed their own Deep Research tool performs at the state-of-the-art on external benchmarks as well as across multiple domains on the DRACO benchmark.

AI Business and Policy
Microsoft launched the Publisher Content Marketplace to enable transparent licensing of content for AI training and AI content serving:
Publisher Content Marketplace (PCM) [is] a solution that gives publishers a new revenue stream, provides AI systems with scaled access to premium content, and delivers better responses for consumers.
The platform allows publishers to monetize datasets while providing AI developers with legally cleared training material. The initiative addresses copyright and sustainability concerns.
Google reported that its Gemini mobile app surpassed 750 million monthly active users. Growth is driven by deep Android integration and frequent feature updates. The milestone highlights intensifying competition among AI assistants.
https://techcrunch.com/2026/02/05/google-gemini-750-million-users/
Anthropic’s new Super Bowl ads are sparking an online spat between Anthropic and OpenAI. Anthropic launched a Super Bowl ad that took a shot at OpenAI’s plan to put ads into ChatGPT, stating that Claude will remain ad-free to preserve user trust and product quality. OpenAI’s CMO and CEO Sam Altman shot back that Anthropic’s ad-free messaging was “dishonest” and that Anthropic was about control.
Meanwhile, Google announced a Gemini AI commercial scheduled to air during the Super Bowl, aimed at increasing consumer awareness of Gemini’s capabilities. Safe and boring.
ElevenLabs secured $500M in funding to expand AI voice generation research and infrastructure, at an $11 billion valuation, triple the prior company valuation in January 2025. The investment supports scaling operations and new synthetic media applications.
Intel announced they will be producing dedicated GPUs targeting AI training and inference. The move aims to challenge Nvidia with competitive pricing and customer-focused solutions. It could increase hardware competition.
SpaceX is acquiring xAI, merging two elements of Elon Musk’s technology empire. Musk is pitching this as a “vertically-integrated innovation engine” for AI and space.
Anthropic is nearing a funding round exceeding $20 billion, valuing the company around $350 billion, second only to OpenAI in total valuation for a pure AI firm. The round reflects strong investor demand for Anthropic as it makes gains in the enterprise AI market with their latest AI models.
Markets reacted sharply to reports of major hyperscalers spending $600 billion on AI infrastructure projects in the coming year, including a $200 billion plan by Amazon. There was a melt-down in software stocks due to concerns AI would disrupt their business, but on Friday there was a sharp turnaround. AI may continue to induce market volatility.
At the Responsible AI in the Military Domain (REAIM) summit, the US and China declined to sign a voluntary declaration governing military AI use. Of the 85 countries that attended the summit, only 35 signed the pledge; this raises concerns that international governance of defense AI will be ineffective.


The Opus 4.6 versus GPT-5.3-Codex framing misses what's actually happening beneath the model wars. I've been deploying both in production environments, and the interesting shift isn't in benchmark wins - it's in how enterprises are making buying decisions. We're seeing 40% of enterprise apps integrating task-specific agents this year, up from basically zero meaningful deployment in 2024. That's not a model quality story, that's an infrastructure maturity story. The real competition isn't Anthropic vs OpenAI - it's Microsoft Copilot at $21-30/seat trying to justify that pricing against Salesforce Agentforce at $0.10/action.
I watched Klarna replace 700 agents with a multi-agent system handling 2.3M conversations, cutting resolution time from 11 minutes to 2. That's the metric that matters. Not which frontier model scores higher on coding benchmarks, but which stack lets you ship automation that actually works without a team of ML engineers babysitting it.