AI Week in Review 25.04.26

Dia TTS, Ernie 4.5 Turbo & Ernie X1 Turbo, Gemma 3 QAT, gpt-image-1, Firefly Image Model 4, lightweight Deep Research, Copilot Researcher & Analyst agents, Pleias-RAG, SWE-PolyBench, AI Nose.

Apr 26, 2025

Figure 1. Snapshot from an AI video generated by the Adobe Firefly Video model, which is now generally available. Adobe Firefly Image Model 4 is also out.

Top Tools

Nari Labs launched Dia, a 1.6B-parameter open-source text-to-speech (TTS) model capable of generating ultra-realistic dialogue with emotional intonation and nonverbal expressions. The two-person startup claims Dia surpasses ElevenLabs and Google's NotebookLM in quality, and the demo examples back that up with speech that sounds convincingly human.

Dia's code and weights are available for download on GitHub and Hugging Face, allowing developers to integrate high-fidelity TTS into applications with minimal cost and overhead.

AI Tech and Product Releases

Baidu unveiled Ernie 4.5 Turbo and its reasoning model Ernie X1 Turbo, offering improved multimodal capabilities in stronger, faster, cheaper models. Ernie 4.5 Turbo's benchmark performance is comparable to GPT-4.1, while the Ernie X1 Turbo reasoning model performs on par with OpenAI o1 and DeepSeek R1. Both models have exceptionally low API prices for their benchmark performance, making them cost-effective options for the Chinese domestic AI market.

Figure 2. Baidu’s Ernie X1 Turbo boasts performance on par with OpenAI o1 and DeepSeek R1, but costs only $0.14 per 1 million input tokens and $0.55 per 1 million output tokens, about 25% of DeepSeek R1’s price.
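To make the per-token prices in Figure 2 concrete, here is a minimal sketch that converts them into a per-request cost. Only the two prices come from the figure; the function name and the example token counts are illustrative assumptions.

```python
# Prices quoted in Figure 2 for Ernie X1 Turbo, in USD per 1 million tokens.
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.55


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for one request at the quoted prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: a 10,000-token prompt and a 2,000-token answer.
    print(f"${request_cost_usd(10_000, 2_000):.4f}")
```

At these prices, the hypothetical request above costs about a quarter of a cent.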

Google announced Gemma 3 QAT, Quantization-Aware Training (QAT)–optimized models that dramatically reduce memory requirements while maintaining model quality:

QAT incorporates quantization into the training process: it simulates low-precision operations during training so that the model can be quantized afterwards with less degradation, yielding smaller, faster models that maintain accuracy.
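As a rough illustration of what "simulating low-precision operations" can look like, the sketch below applies a quantize-then-dequantize round trip (often called fake quantization) to a handful of weights. This is a generic textbook-style example, not Google's recipe for Gemma 3: the function name, the symmetric per-tensor scaling, and the sample weights are all assumptions. In actual QAT this round trip runs in the forward pass, and gradients are typically passed through the rounding step unchanged (a straight-through estimator).

```python
def fake_quantize(weights: list[float], bits: int = 4) -> list[float]:
    """Round weights to a signed integer grid, then map them back to floats.

    Symmetric per-tensor scaling is an assumption made for illustration.
    """
    if not weights:
        return []
    qmax = 2 ** (bits - 1) - 1   # e.g. 7 for 4 bits
    qmin = -(2 ** (bits - 1))    # e.g. -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)     # all zeros: nothing to quantize
    scale = max_abs / qmax       # float distance between adjacent grid points
    quantized = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]


if __name__ == "__main__":
    original = [0.70, -0.31, 0.05, -0.66]  # hypothetical weights
    simulated = fake_quantize(original, bits=4)
    for w, s in zip(original, simulated):
        print(f"{w:+.3f} -> {s:+.3f} (error {abs(w - s):.3f})")
```

Because the model sees these rounding errors while it is still training, it can adjust its weights to tolerate them, which is the intuition behind the smaller quality loss after quantization.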

The QAT Gemma 3 27B model quantized to 4 bits uses only 14.1 GB, small enough to run on a local 24 GB GPU, yet it performs close to the level of the full Gemma 3 27B model. Quantized QAT Gemma 3 weights are available for download on Hugging Face and Ollama for local use.
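The 14.1 GB figure can be sanity-checked with back-of-the-envelope arithmetic: weight storage is roughly the parameter count times the bits per weight. The sketch below is an estimate under stated assumptions. The helper name is made up, the 16-bit baseline is an assumption for comparison rather than a number from the announcement, and the estimate ignores activations and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 27e9  # Gemma 3 27B
    print(f"4-bit:  {weight_memory_gb(params, 4):.1f} GB")
    print(f"16-bit: {weight_memory_gb(params, 16):.1f} GB")
```

The 4-bit estimate of 13.5 GB sits just under the reported 14.1 GB. The gap is presumably overhead such as quantization scales or tensors kept at higher precision, though the source does not break it down.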

OpenAI released gpt-image-1, its multimodal image generation and editing model, to its public API, bringing ChatGPT’s state-of-the-art image capabilities to developers. The model natively accepts both text and image input, follows fine-grained instructions, and reliably renders text within images. Major platforms, including Adobe Firefly, Figma Design, and Canva, are already integrating gpt-image-1 into their AI applications.

Adobe launched the new Firefly Image Model 4 with improved quality and control, and they released the Firefly video model to general availability. Adobe also announced a redesigned Firefly web app that is integrating AI image-generation models from OpenAI and Google and extending support to mobile devices.

Firefly also gives creative professionals the choice to explore different aesthetic styles using models from partners, with Google Cloud and OpenAI models available today and models from partners including fal.ai, Ideogram, Luma, Pika and Runway available in the coming months.

Figure 3. Adobe’s Firefly and Express apps provide access to image generation and editing models from OpenAI, Google, and others.

Adobe's new Content Authenticity web app allows creators to embed attribution metadata into images, both to identify their work and to signal that it should not be used for AI training. The app enables bulk tagging, LinkedIn verification for identity authentication, and the inspection of images for Content Credentials and AI manipulation.

OpenAI is introducing a "lightweight" version of its ChatGPT deep research tool that uses OpenAI's o4-mini model, which is "nearly as intelligent" but cheaper to serve. This will allow for higher usage limits for Plus, Team, and Pro users, as well as access for free users.

Microsoft announced its 'Microsoft 365 Copilot Wave 2 Spring release' with new AI agents. The Researcher and Analyst agents, which handle research and data analysis, are accessible via a new "Agent Store." Microsoft envisions Copilot as a central AI interaction layer for human-agent collaboration that restructures work and enhances productivity.

Dropbox upgraded their search tool Dash with AI understanding of content, allowing users to search across various media types like audio, video, and images. The update includes people search and improved enterprise tooling for IT admins to exclude sensitive documents.

French AI startup Pleias released Pleias-RAG-350M and Pleias-RAG-1B, small AI reasoning models designed for retrieval-augmented generation (RAG) and citation synthesis with structured multilingual output. Pleias describes the training in a detailed report. The models are reported to outperform larger models in multilingual tasks and complex reasoning.

Figure 4. Pleias RAG models have been developed to implement Anthropic citation mode. They come with agentic capacities, with the reasoning step anticipating RAG features: query routing, query reformulation, and source reranking.

Perplexity's iOS app now includes a conversational AI voice assistant. Users can speak to the Perplexity iOS Voice Assistant to perform tasks such as writing emails, setting reminders, and making reservations.

Earlier this month, xAI announced Grok 3 API availability, offering developers access to Grok 3 and Grok 3 Mini in the API for text processing, vision, and image generation, with additional capabilities such as structured outputs coming soon. Grok 3 Mini’s pricing is competitive for its performance: its output tokens are cheaper than those of other new cost-competitive AI models like Gemini 2.5 Flash Thinking and o4-mini.

Figure 5. Grok-3 mini is a high-performing AI model priced very competitively in the API.

AWS introduced SWE-PolyBench, a multi-language AI coding benchmark. SWE-PolyBench evaluates AI coding assistants across Java, JavaScript, TypeScript, and Python using over 2,000 real-world GitHub issues, and it introduces new evaluation metrics beyond pass rate, such as file-level localization.
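To give a feel for what a file-level localization metric might measure, here is a minimal sketch that compares the files an assistant edited against the files changed in the reference fix. This is one plausible reading, not the benchmark's official definition, which this post does not spell out. The function name and file paths are hypothetical.

```python
def file_localization(gold_files: set[str], predicted_files: set[str]) -> tuple[float, float]:
    """Return (recall, precision) of the files an assistant chose to edit."""
    hits = len(gold_files & predicted_files)
    recall = hits / len(gold_files) if gold_files else 0.0
    precision = hits / len(predicted_files) if predicted_files else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical issue: the reference fix touches two files; the assistant
    # edits one of them plus an unrelated file.
    gold = {"src/parser.ts", "src/lexer.ts"}
    predicted = {"src/parser.ts", "README.md"}
    recall, precision = file_localization(gold, predicted)
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

A metric like this can separate assistants that look in the right place but write a wrong patch from assistants that never find the relevant files, which a pass rate alone cannot show.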

OpenAI plans to release its open-source AI model this summer. OpenAI intends to release a "text in, text out" LLM that can run on consumer hardware and has a permissive open license, and it has engaged with the user community on the model requirements. The company aims for the model to outperform current open models from Meta and DeepSeek, and the model may feature a "handoff" capability to connect to OpenAI's cloud-hosted models for complex queries.

Robots can now smell. Researchers at Ainos and Ugo Japan unveiled an “AI Nose” prototype that equips humanoid robots with the ability to detect odors. This innovation has applications for AI robotics in environmental safety and healthcare, enabling early identification of hazards like gas leaks and infections.

AI Research News

Our latest AI research weekly covered Nvidia’s Describe Anything Model (DAM) and several AI reasoning papers that extend RL training and AI reasoning:

Nvidia’s Describe Anything Model (DAM)
Reasoning Models Can Be Effective Without Thinking
TTRL: Test-Time Reinforcement Learning
Genius: A Generalizable and Purely Unsupervised Self-Training Framework for Advanced Reasoning

Google’s DolphinGemma analyzes dolphin vocalizations to decode how dolphins communicate and to facilitate dolphin-human communication. A study revealed that dolphins can intentionally reproduce human vowel sounds. A collaborative project uses DolphinGemma-based tokenization and output to translate dolphin sounds into a shared vocabulary. AI may help us speak with animals!

AI Business and Policy

Ziff Davis is suing OpenAI for copyright infringement, alleging unauthorized copying of its articles to train AI models. The lawsuit claims OpenAI ignored instructions not to scrape Ziff Davis content and removed copyright information.

AI coding assistant startup Windsurf is cutting prices, eliminating "flow action credits" and reducing team plan prices to $30 per user monthly. The move signals intense competition with Cursor and comes amid reports of a potential acquisition by OpenAI.

Zencoder acquired Machinet to expand its AI coding assistant capabilities. The acquisition strengthens Zencoder’s position among Java developers using JetBrains, giving it an advantage over competitors tied to Visual Studio Code. Zencoder differentiates itself with "Repo Grokking" technology and error-corrected pipelines.

AI data centers may soon contain millions of chips, cost hundreds of billions, and require power equivalent to a large city. A new study analyzed over 500 AI data center projects and found computational performance, power requirements, and capital expenditures are more than doubling annually.

OpenAI is reportedly building its own social network, similar to X, that focuses on ChatGPT's image generation and includes a social feed. This app could provide OpenAI with real-time data for training AI models, as X and Meta already have. OpenAI’s reported interest in acquiring Windsurf could also be partly driven by its desire for the data the Windsurf user base would provide.

Onc.AI, specializing in oncology management, announced the presentation of its breakthrough-designated Serial CTRS AI model at an upcoming medical conference, highlighting progress in clinical trials. The solution uses predictive analytics and AI-based imaging to accelerate clinical decision-making and improve treatment response.

AI agent startup Manus AI has raised $75 million in a funding round at a $500 million valuation. The Chinese startup plans to expand into new markets like the U.S., Japan, and the Middle East.

Roy Lee, a Columbia student suspended for creating an AI-based interview cheating tool, has turned it into a startup called Cluely. The company has raised $5.3 million for AI to help users ‘cheat on everything’.

The State Bar of California now admits that some California bar exam questions were written with AI assistance. A backlash erupted after students complained that bar exam multiple-choice questions seemed AI-generated.

President Trump issued an Executive Order on Promoting AI Literacy in the United States. It establishes a White House Task Force on Artificial Intelligence Education and a Presidential AI Challenge, with the goal to integrate AI into K-12 education, train educators, and foster early exposure to AI concepts.

Over 10,000 comments were submitted to the White House regarding its national AI policy, touching on topics such as copyright, AI research, and safety and environmental concerns. This comes as President Trump rejiggers the U.S. Government’s AI priorities, revoking Biden's AI Executive Order and promoting AI development "free from ideological bias".

AI Opinions and Articles

Google DeepMind co-founder Demis Hassabis was on 60 Minutes this week, and the show introduced artificial general intelligence (AGI) to its audience. In his interview, Hassabis defined AGI as a system possessing "any cognitive capability humans have" and projected that AGI could arrive as soon as 2030, but he voiced concerns about AI misuse and the uncertainty surrounding the risks of AI.

One way to mitigate AI risks is to understand the inner workings of AI models. Anthropic is pioneering mechanistic interpretability, aiming to decode AI decisions and ensure safety. Anthropic CEO Dario Amodei has set an ambitious goal for Anthropic to reliably detect most AI model problems by 2027. He is calling on industry peers to prioritize understanding over capability and advocating for increased research in this area.

AI Changes Everything
