AI Week in Review 25.03.15

Gemma 3, Reka Flash 3, Deep Hermes 3 24B, OLMo 2 32B, Command-A. OpenAI agent tools, Reponses API, & Agents SDK. Gemini 2.0 free Deep Research, personalization, native image generation. Seedream 2.0.

Mar 15, 2025

A picture of a person on a beach

AI-generated content may be incorrect. — Figure 1. Photo-realistic **AI image from ByteDance Seedream 2.0**

Top Tools

This week, a quartet of impressive open-source AI models that can be run locally were released: Gemma 3 27B, Reka Flash 3 21B, Deep Hermes 3 24B, and OLMo 2 32B.

Google released Gemma 3, the latest generation of their open-source family of smaller LLMs, with 1B, 4B, 12B, and 27B parameter models. The larger Gemma 3 models are multilingual (140 languages), support a 128K context window and offer multimodal capabilities (text, image, and video processing). These Gemma AI models are excellent for image and video interpretation, and Gemma 3 27B sports a chatbot Arena ELO score of 1338, near SOTA. Gemma 3 is available to try out on Google AI Studio and Hugging Face.

Reka AI has launched 21B Reka Flash 3, an open-source AI reasoning model excelling in chat, coding, instruction following, and function calling. Developed using a reinforcement learning technique called REINFORCE Leave One-Out (RLOO), it performs competitively with models like OpenAI o1-mini, scoring 61.1% on GPQA and 62% on LiveCodeBench v5. It is a compact, general-purpose model designed for low-latency applications and on-device deployment, with model weights on HuggingFace and available to try on Reka Space.

Nous Research released DeepHermes 3 Preview, a 24B hybrid reasoning model fine-tuned from the Mistral-Small model that unifies reasoning and standard LLM responses. The model features improved annotation, judgement, and function calling, using the Llama-Chat format for multi-turn dialogues. Benchmarks: 56.6% on GPQA, 88.6% on MATH_HARD. DeepHermes 3 is available via API and HuggingFace, with code and examples for function calling and structured outputs on GitHub.

The Allen Institute for AI announced OLMo 2 32B, a high-performing fully-open (all data, code, weights, and details are freely available) 32B model. OLMo 2 32B outperforms GPT-4o mini and is comparable to leading similar-sized AI models on multi-skill AI benchmarks. The model is available to try out on the Allen AI playground or download via HuggingFace.

A graph of different colored lines

AI-generated content may be incorrect. — Figure 2. Benchmark numbers for OLMo-32B, Gemma 3 27B, and Reka Flash 3 21B. These are strong models, competitive with o1-mini and Qwen QwQ 32B on reasoning-related benchmarks and GPQA.

Nathan Lambert notes this milestone for small open-source AI models like Gemma 3 and OLMo 2 32B - reaching GPT-4 performance level. “We have a fully open-source GPT-4 class model.”

For AI users with 24 GB GPUs on their local PC, it’s worthwhile to download these models via Ollama. Models up to 32GB parameters can be quantized to fit on that setup, and it gives you near SOTA performance in a local AI model.

AI Tech and Product Releases

Cohere has introduced Command-A, a 111B parameter multilingual AI model intended for enterprise applications. Command A is built to support agentic tasks with strong tool integration and a 256K context window performance. It rivals GPT-4o and DeepSeek-V3 in enterprise tasks but with greater efficiency and lower hardware requirements.

OpenAI released tools and APIs to support building AI agents, with a new Responses API for calling AI models, the Agents SDK multi-agent framework, and integrated tools for Web Search, File Search, and Computer Use. OpenAI designed these additions aim to streamline tool use, real-time web search, and file retrieval for agentic AI applications. We discussed OpenAI’s agent tools release in depth in “Building Next-Gen AI Agents with OpenAI Tools and MCP.”

This week, Google launched a number of new Gemini 2.0 model updates and Gemini app features:

Google upgraded Deep Research tool to use Gemini 2.0 Flash Thinking and made it available for free on the Gemini app. Deep Research combines web search, AI reasoning, and planning to generate research reports on user selected topics.
Google launched Gemini 2.0 Flash to production and made it accessible via Google AI Studio and Vertex AI.
Google announced Gemini with personalization, which brings user data from other Google apps and services into the Gemini chatbot to provide tailored responses. The opt-in feature, in Gemini 2.0 Flash Thinking, integrates with Google Search history and analyzes queries to provide personalized insights, such as restaurant or travel recommendations.
Google AI Studio added support for native video understanding and analysis of YouTube links, improving video content summarization and support by automatically processing video links.
Google added Gemini 2.0 Flash native image generation, a feature that generates images natively within the Gemini app. This update enables users to produce and edit images directly via Gemini chat interactions, with many creative design-related applications.

A screenshot of a computer

AI-generated content may be incorrect. — Figure 3. Image editing in Gemini Flash 2.0. The Gemini Flash 2.0 multimodal image output capabilities can be tried in Gemini app or in AI studio.

ByteDance has introduced Seedream 2.0, a native Chinese and English bilingual AI image generation foundation model. Seedream 2.0 excels in text rendering, cultural nuance, and aligning outputs with human preferences, making it effective for generating high-quality, culturally relevant images. Technical details are in the research paper “Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model.”

A collage of images of various people

AI-generated content may be incorrect. — Figure 4. Seedream 2.0 image generations exhibit hyper-realism, adherence to prompts, and quality text generation in both English and Chinese.

AI company Sesame has released CSM-1B, the base model powering its realistic voice assistant Maya, under an Apache 2.0 license for commercial use. The 1B parameter CSM-1B model utilizes residual vector quantization (RVQ) and a Meta Llama-family model backbone to generate speech. The model lacks safeguards, raising concerns about potential misuse for realistic voice cloning.

Xbox introduced an AI-powered gaming sidekick called “Copilot for Gaming.” This voice-activated assistant, powered by Microsoft's AI and available on the Xbox mobile app, enhances gaming by answering questions, completing tasks, and offering real-time tips based on a player's performance and game context.

Microsoft is testing AI-powered summaries in Notepad.

Snapchat has released “AI Video Lenses” for subscribers, which can animate various animals or flowers in your snap with generative AI video models. Snap plans to add new AI-powered lenses weekly.

Due to Manus going viral, AI tool Browser Use experiences explosive growth. A key tool for AI agents, Browser Use enables AI models to easily interact with websites, helping to automate many virtual tasks. The tool experienced a surge in downloads after a post highlighting Manus's use of Browser Use went viral.

Copilot in GroupMe has been introduced to bring AI to your group chats. Copilot can help with writing responses, explaining concepts, making decisions, managing playlists, and planning events.

AI Research News

EuroBERT has been released as a family of multilingual encoder models for natural language processing for European and global languages, with sizes ranging from 210 million to 2.1B parameters. Presented in the paper “EuroBERT: Scaling Multilingual Encoders for European Languages,” the open-source EuroBERT models were trained on a 5T token dataset across 15 languages and support an 8K context window. EuroBERT models are available on HuggingFace.

Google DeepMind introduced Gemini Robotics and Gemini Robotics-ER, new Vision Language Action (VLA) models for robots to interact with the physical world. The new Gemini Robotics model enhances robots' generality, interactivity, and dexterity using the multimodal understanding of Gemini 2.0. Demo videos show robots performing tasks via voice commands and generalizing behavior across various hardware.

A slimmed-down model, Gemini Robotics-ER, focuses on advanced visual language understanding for robots. Gemini Robotics doubles performance on generalization benchmarks, while Gemini Robotics-ER achieves 2-3x success rates in tasks, enhancing spatial understanding. A benchmark dataset called Asimov is also available for researchers to train models and for safety research gauge risks.

AI Business and Policy

Google published an AI policy proposal advocating for weaker copyright restrictions on AI training and balanced export controls. Responding to the Trump administration call for an "AI Action Plan,” Google also urged federal AI legislation while cautioning against usage liability obligations for model developers. Not surprisingly, OpenAI also has called for U.S. copyright laws that lets AI model builders use copyrighted material.

OpenAI calls DeepSeek ‘state-controlled,’ calls for bans on ‘PRC-produced’ models. OpenAI has gotten Red-China-pilled, proposing to ban models from the Chinese AI lab DeepSeek and similar PRC-supported operations. They allege DeepSeek's models pose security and privacy risks, including IP theft, due to Chinese law compliance requirements.

Similarly, Anthropic's CEO Dario Amodei warns of Chinese espionage targeting US AI firms. He believes China is trying to steal valuable "algorithmic secrets," potentially costing companies millions.

Sam Altman on X promoted an AI model trained for creative writing. It’s interesting, but the reviews were mixed. OpenAI's creative writing AI produces mediocre, derivative fiction, says TechCrunch. They note the AI's writing, while technically skilled, lacks genuine human experience and emotion, resulting in inauthentic prose.

AI image generator startup Bria raised $40 million. Bria trains its models exclusively on licensed content, compensating image owners based on influence.

AI Opinions and Articles

The promise of AI-powered gadgets has fallen flat due to the immaturity of current AI technology, leading to unfulfilled expectations and a slowdown in meaningful hardware innovation. David Pierce at the Verge argues this has resulted in a generation of hardware being wasted while waiting for AI to catch up:

We were promised multimodal, natural language, AI-powered everything. We got nothing of the sort.

Companies have been overly focused on future AI capabilities rather than exploring AI's current practical ability to improve daily life. Products like the Humane Pin were technical wizardry in search of a problem. They explored the edge of what is possible, but they didn’t deliver a real-world solution to daily life problems.

Meanwhile, Apple suffers from not keeping up with the AI revolution. Extremely useful AI models are available via apps on your smartphone, but they don’t come from Apple.

AI Changes Everything

Discussion about this post