AI Week in Review 25.04.12
Google: Gemini 2.5 Flash, Veo2 editing, Lyria, Chirp3, Firebase Studio, A2A Protocol. Llama 4, DeepCoder-14B, Nemotron Ultra, SeedThinking-v1.5, DeepCogito, Kimi-VL-Thinking, HiDream-I1, Nova Sonic.

Top Tools – Google’s AI and Agents
Google announced many major AI releases at their Cloud Next 25 event, including support for the AI agent ecosystem, their seventh-generation Tensor Processing Unit (TPU), Ironwood, and many AI model updates:
Google announced Gemini 2.5 Flash as an API, offering fast, cost-effective access to Gemini 2.5 through Vertex AI.
Google introduced Veo 2 Editing, incorporating advanced video editing into its top-tier Veo 2 video model. The upgrade will streamline video processing tasks and improve output quality for applications in video AI.
Google released improvements to the Imagen 3 image model, including inpainting, image generation quality enhancements, and more refined control over image synthesis.
Google moved the Lyria text-to-music model into preview. The Lyria AI model is designed to translate textual descriptions into coherent musical pieces.
Google announced the upcoming TPU v7 (Ironwood), which delivers significant improvements in computational performance over prior TPU generations, keeping Google’s AI infrastructure competitive.
Google unveiled Chirp 3 HD Voices, a new voice model that incorporates high-definition voice reproduction along with voice cloning capabilities.
Google updated Deep Research to use the more powerful Gemini 2.5 Pro.
Google launched Firebase Studio, an AI-powered app development platform that allows developers to build, test, launch, and monitor web and mobile apps using Gemini AI. Firebase Studio is a full-stack, cloud-based generative AI environment rebranded from Project IDX that integrates with Firebase. The platform is in preview and can be accessed via the Firebase Studio portal.
Regarding the AI agent ecosystem, Google announced official support for MCP, making it clear that Model Context Protocol (MCP) is the protocol for model-tool interoperability.
They also announced their own open standard, Agent2Agent (A2A) Protocol, for secure, interoperable communication between different AI agents. Built on web standards (HTTP, SSE, JSON-RPC), the A2A Protocol supports key functionalities such as task management and modality negotiation. It is complementary to MCP and may become part of the “protocol stack” to make AI agent systems work.
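To make the A2A mechanics concrete, here is a minimal sketch of building an A2A-style JSON-RPC 2.0 task request of the kind that would be sent over HTTP to another agent. The method name (`tasks/send`) and payload fields are illustrative assumptions based on early A2A documentation; consult the A2A specification for the exact schema.

```python
import json

def make_a2a_task_request(task_id: str, text: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request resembling an A2A task submission.

    The method name and params schema below are assumptions for
    illustration, not the authoritative A2A wire format.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tasks/send",  # assumed A2A method name
        "params": {
            "id": task_id,
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": text}],
            },
        },
    }
    return json.dumps(request)

payload = make_a2a_task_request("task-123", "Summarize this quarter's sales")
print(payload)
```

The key design point is that A2A reuses plain web infrastructure: a receiving agent is just an HTTP endpoint that accepts JSON-RPC calls and can stream progress back via SSE.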
Google expanded its agentic AI offerings with Agent Development Kit and Agent Engine. ADK simplifies multi-agent system creation on Gemini models, while Agent Engine offers enterprise-grade controls for managing agent deployment. Google's new Agent Garden also provides pre-built agents and tools for users.
Google plans to eventually combine Gemini AI models with its Veo video-generating models, improving Gemini’s understanding of the physical world and creating an "omni" model capable of understanding and synthesizing various forms of media.
Google announced Gemini in Android Studio for businesses, a new subscription-based offering to boost AI integration in large organizations without compromising data governance.
Google Cloud introduced its Unified Security platform, integrating security operations, threat intelligence, and Mandiant expertise, powered by Gemini AI. The platform offers features like preemptive security, AI-driven threat analysis, and automated response capabilities to improve enterprise cyber-security.
AI Tech and Product Releases
Meta released the initial two Llama 4 models: the Scout Mixture-of-Experts (MoE) model with 109B total parameters (17B active parameters and 16 experts) and the Maverick MoE model with 400B total parameters (17B active parameters and 128 experts). These are multimodal, multilingual MoE models, trained on over 30 trillion tokens.
However, real-world results are underwhelming and don’t confirm Llama 4’s claimed benchmark results (see “Llama 4 Released – Meta Goes MoE”), and disappointment stacked up as it was revealed some high benchmark scores were a mirage. Meta's experimental Llama 4 Maverick model initially scored high on LM Arena, but it was uncovered that this was a Llama 4 version tuned for conversation; on LM Arena, the actual unmodified Llama 4 Maverick now ranks lower than older models like GPT-4o.
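The reason a 109B-parameter MoE model uses only 17B active parameters is that a router selects a small subset of experts per token; only the chosen experts' weights participate in that token's forward pass. Below is a generic top-k routing sketch in numpy; it is not Meta's exact Llama 4 architecture, just an illustration of the active-versus-total parameter distinction.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, k=1):
    """Route a token through the top-k experts of a Mixture-of-Experts layer.

    Only the selected experts' weight matrices are multiplied per token,
    which is why a model with many experts has far fewer *active*
    parameters than total parameters.
    """
    logits = x @ gate_w                      # router score for each expert
    top = np.argsort(logits)[-k:]            # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

d, n_experts = 8, 16
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = rng.normal(size=(n_experts, d, d))  # 16 experts' worth of parameters

y = moe_forward(x, gate_w, experts, k=1)      # one expert's worth of compute
print(y.shape)  # (8,)
```

With 16 experts and top-1 routing, each token touches roughly 1/16 of the expert parameters, mirroring how Scout holds 109B parameters but activates only 17B per token.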
Together AI, in collaboration with Agentica at UC Berkeley, has released DeepCoder-14B-Preview, an open fine-tuned code reasoning model. DeepCoder-14B-Preview is derived from DeepSeek-R1-Distilled-Qwen-14B via distributed reinforcement learning and designed for coding tasks. It has an impressive Codeforces rating of 1936 and scores 60% on LiveCodeBench, exceeding o3-mini (low) on coding. The model, dataset, and evaluation logs are openly available.
Nvidia announced Nemotron Ultra, a pruned and distilled version of the Llama 3-405B model that cuts the parameter count nearly in half, to 253B parameters, and incorporates reasoning. It scores 76% on GPQA Diamond and 68% on LiveCodeBench.
We started with llama-405B, changed it via NAS pruning then followed by reasoning-focused post-training: SFT + RL in FP8.
TikTok parent ByteDance has a new AI model. ByteDance introduced Seed-Thinking-v1.5, a Mixture-of-Experts (MoE) reasoning model with only 20B active parameters (and 200B total parameters) that achieves superb performance on reasoning tasks – 86% on AIME 2024, close to Google's Gemini 2.5 Pro and OpenAI's o3-mini-high. ByteDance also introduced BeyondAIME, a new, harder math benchmark to evaluate model performance.

Deep Cogito has launched a suite of Llama fine-tuned models in sizes of 3B, 8B, 14B, 32B, and 70B parameters. These models are fine-tuned to be hybrid reasoning models using a technique called iterated distillation and amplification (IDA) and achieve SOTA performance on reasoning benchmarks for their size. They are openly licensed and available on HuggingFace. The AI startup Deep Cogito just emerged from stealth and plans to release larger models in coming months.
Moonshot AI has released the Kimi-VL and Kimi-VL-Thinking models, open-source vision-language models (VLMs) based on a MoE architecture that use only 3B active parameters. These models can process high-resolution images, support a 128K context length, and include a reasoning variant that competes with much larger models (like GPT-4o) on benchmarks like MathVision and ScreenSpot. More details are in the Kimi-VL Technical Report; the models are on HuggingFace.
HiDream open-sourced its HiDream-I1 family of 17B-parameter image generation models under an MIT license, offering Dev, Full, and Fast variants. HiDream-I1-Dev performs strongly in benchmarks against competing models, including the leading Flux 1.1 Pro, delivering superior prompt following and text rendering capabilities. Models are on HuggingFace and can also be evaluated online at Vivago.ai.

Amazon has introduced Nova Sonic, a new speech-to-speech foundation model. With it, they have a “speech system that gets tone, style, and pace,” going beyond traditional speech understanding and speech synthesis:
Nova Sonic even understands the nuances of human conversation, including the speaker’s natural pauses and hesitations, waiting to speak until the appropriate time, and gracefully handling barge-ins.
Jina has introduced Reranker m0, a state-of-the-art multimodal reranking model aimed at improving retrieval quality in retrieval-augmented generation (RAG) tasks that combine images and text.
OpenAI has enhanced ChatGPT’s memory functionality, enabling it to reference all past chats to personalize responses. This upgrade incorporates users’ historical interactions, preferences, and project details into text, voice, and image generation. The feature, available in ChatGPT's settings, is rolling out to paid subscribers, excluding users in the EU, UK, and other regions due to AI regulations.
Canva added an AI assistant and app creation with prompts. Canva AI can create images, generate design ideas, write copy, and build mini-apps through Canva Code.
Anthropic has launched a new "Max" subscription tier for its Claude chatbot, with two tiers: $100/month for five times the usage of the "Pro" plan, and $200/month for twenty times the usage.
YouTube is launching an AI-powered "Music assistant" feature in Creator Music for creators to generate custom instrumental backing music. Creators can describe the type of music they want using prompts, specifying details like instruments and mood, and then download the free-to-use tracks.
Writer unveiled AI HQ, an AI agent workflow platform that allows businesses to build, activate, and supervise AI "agents" for complex automated workflows. Writer's AI HQ has a focus on enterprise-specific solutions with secure data handling.
OpenAI may launch several new AI models, including GPT-4.1, as early as next week. GPT-4.1 would be an update of OpenAI’s GPT-4o, which was released last year. Also, OpenAI will soon phase out GPT-4 from ChatGPT, fully replacing it with GPT-4o.
AI Research News
A great new AI research result for long-form AI video, presented in “One-Minute Video Generation with Test-Time Training,” brings AI video generation to one-minute Tom and Jerry cartoon creation:
We experiment with Test-Time Training (TTT) layers, whose hidden states themselves can be neural networks, therefore more expressive. Adding TTT layers into a pre-trained Transformer enables it to generate one-minute videos from text storyboards.
The results are still research-prototype quality, given the model is only a 5B-parameter pretrained model, but it’s a step forward.
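The core idea of TTT layers – a hidden state that is itself a small model, updated by gradient descent as the sequence is processed – can be shown in a toy numpy sketch. This is a deliberate simplification of the paper's method: the inner model here is a single linear map trained on a self-supervised reconstruction loss, with one gradient step per token.

```python
import numpy as np

class TTTLayer:
    """Toy test-time-training layer.

    The layer's 'hidden state' is the weight matrix W of an inner linear
    model. Processing a token takes one gradient step on the inner
    self-supervised loss 0.5 * ||W x - x||^2, so the state is literally
    trained at test time. A simplified illustration, not the paper's
    exact architecture.
    """

    def __init__(self, dim: int, lr: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.normal(size=(dim, dim))
        self.lr = lr

    def step(self, x: np.ndarray) -> np.ndarray:
        pred = self.W @ x
        err = pred - x                        # gradient of the loss w.r.t. pred
        self.W -= self.lr * np.outer(err, x)  # one inner gradient step
        return self.W @ x                     # output using the updated state

layer = TTTLayer(dim=4)
seq = np.eye(4)                               # a toy 4-token sequence
outputs = [layer.step(tok) for tok in seq]
```

Because the state update is a full gradient step rather than a fixed recurrence, the hidden state can be far more expressive than a vector, which is what the paper leverages to hold a minute of video context.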

A recent study from Microsoft Research reveals that AI models from OpenAI and Anthropic often fail to resolve software bugs. Researchers suggest the underwhelming performance is due to data scarcity in training data, specifically a lack of data representing human debugging processes.
AI Business and Policy
Ex-OpenAI employees filed a brief supporting Musk's lawsuit against OpenAI's conversion to a for-profit corporation. The brief argues that transitioning to a for-profit structure would violate OpenAI's original mission and charter commitments.
Thinking Machines Lab, from ex-OpenAI CTO Mira Murati, is seeking a historic $2B seed round. The funding would value the AI startup at $10B, despite having no current product or revenue.
Ireland’s DPC is investigating X over the use of EU users’ data to train Grok. The investigation will focus on how X processes personal data from public posts by European users for AI training purposes. This follows concerns about X quietly opting users in to sharing data with xAI to train Grok.
Chef Robotics raised $23 million to expand its AI-powered food automation. Chef Robotics has installed dozens of robots across the U.S. that have made 45 million meals to date.
Wells Fargo built a large-scale generative AI system that works without exposing sensitive data. Wells Fargo's AI assistant, Fargo, handled 245.4 million customer interactions in 2024 and maintained privacy by using a privacy-first pipeline and internal systems for data scrubbing. The bank uses a "compound system" with various models like Gemini and Llama, orchestrated to optimize performance and efficiency.
Google is partnering with Range Media to commission films about the relationship between humanity and AI. Two films are already in production and are slated for release later this year.
The U.S. Secretary of Education Linda McMahon repeatedly called artificial intelligence "A1" – like the steak sauce – at the ASU+GSV Summit, a gathering of education and technology experts. No words, but it’s making me hungry.
Another week where there is so much AI news it's like trying to stuff 10 lbs of AI news into a 5 lb bag. I could have made a whole article on just Google's announcements, there's so much there to unpack. Perhaps the most important announcement for the long-term might be Google's A2A Protocol. It's one key piece of the AI agent ecosystem to make multi-agent AI systems fully interoperable.