AI Week in Review 25.05.10

Nvidia Parakeet TDT, Lightricks LTXV-13B, Gemini 2.5 Pro I/O, Gemini implicit caching, Nemotron Ultra V1, Mistral Medium 3, ACE-Step 3.5B, HeyGen Avatar IV, Figma Make, HunyuanCustom, Absolute Zero.

May 10, 2025

Two animals wearing sunglasses

AI-generated content may be incorrect. — Figure 1. Still from LTXV 13B video generation. LTXV 13B generated this video using a consumer GPU, at a speed 30x that of comparable AI video generation models.

Top Tools

We love SOTA open-source AI models, so our top tools for this week are two extremely capable and fast open-source AI models: Nvidia’s Parakeet TDT for transcription, and Lightricks’ LTXV-13B for AI video generation.

Nvidia released Parakeet TDT 0.6B V2 and set a new transcription model speed record: It can transcribe 60 minutes of audio in just one second, boasting a “staggering real-time factor (RTF) of 3386.” Parakeet V2 also maintains quality and leads the ASR leaderboard with a WER of only 6.05%. The model is open weights and available on HuggingFace to download and to Demo.

Lightricks released LTXV-13B, a 13B parameter AI video generation model that uses multiscale rendering to produce high-quality outputs up to 30 times faster than comparable models. How fast is it?

It produces 30 FPS videos at a 1216×704 resolution faster than they can be watched.

The model integrates keyframe and character movement controls, multi-shot sequencing, and supports full LoRA conditioning for custom styles. LTXV-13B runs efficiently on consumer GPUs (like an RTX 4090) and is open source and available on both Hugging Face and GitHub, as well as on Lightricks' LTX Studio.

A screenshot of a baby and a cat

AI-generated content may be incorrect. — Figure 2. LTXV 13B supports customized LoRAs for video generation.

AI Tech and Product Releases

Google announced Gemini 2.5 Pro I/O Edition early access release. This Gemini 2.5 Pro update significantly improves coding capabilities, especially for building interactive web apps. The updated model showed gains on LiveCodeBench and Aider Polyglot benchmarks and jumped to the top of the WebDev Arena leaderboard. This model can generate interactive web apps from single prompts and has SOTA video understanding, supporting video-to-code workflows. It is available on Google AI Studio and Vertex AI API.

Google introduced implicit caching for the Gemini API, automatically enabling up to 75% cost savings when requests hit the cache without requiring manual cache directives. This new method is automatic and enabled by default on repetitive context for Gemini 2.5 Pro and 2.5 Flash.

NVIDIA released open-source Nemotron Ultra V1 for general use, sharing weights on HuggingFace. Ultra V1 is a 253B parameter dense model distilled and pruned from Llama 3.1 405B, with a 128K context window and a dynamic “reasoning toggle” to turn reasoning on or off via system prompts. Multiple benchmarks show Ultra V1 is a SOTA open AI model. Nvidia released the model weights, full post‑training dataset, and codebases. Ultra V1 is available to try on Nvidia’s online chat.

A graph of different colored bars

AI-generated content may be incorrect. — Figure 3. Nemotron Ultra V1 excels on benchmarks performing better than R1. Ultra V1 beats the Llama 405B model it is derived from, and it also is also better than all other open source AI models except for the newly released Qwen 3 235B model.

Mistral AI launched Mistral Medium 3, a 128 K‑context multimodal model that achieves GPT‑4‑class benchmarks at a low cost of only $0.40 per million input tokens and $2 per million output tokens. Contrasting to its earlier open‑source offerings, Medium 3 is a proprietary model targeting enterprise deployments on Mistral La Plateforme.

Mistral AI also introduced Le Chat Enterprise, an AI assistant platform tailored for business use, emphasizing productivity and privacy. Powered by and complemented by Mistral's new Medium 3 model, the platform offers model customization, rapid deployment, and integration with existing business tools and workflows.

StepFun released ACE-Step 3.5B, an open-source foundation model for music generation. Thanks to multiple architectural innovations in the ACE-Step model, the music generation is excellent (near SOTA) and produced very efficiently:

our model synthesizes up to 4 minutes of music in just 20 seconds on an A100 GPU—15× faster than LLM-based baselines.

ACE-Step preserves fine-grained acoustic details, enabling advanced control mechanisms such as voice cloning, lyric editing, remixing, and track generation. ACE-Step 3.5B music generation demo examples are here.

HeyGen launched Avatar IV, which generates advanced lifelike AI avatars. Avatar IV transforms a single image and a script into a 30‑second lifelike video with natural facial expressions and hand gestures, using a diffusion‑inspired audio‑to‑expression engine to synchronize vocal tone, rhythm, and emotion with photorealistic facial movements and gestures.

OpenAI has enhanced ChatGPT's deep research capabilities with a new GitHub connector. This allows ChatGPT users to analyze codebases, understand code structure, and examine API implementations.

Additionally, OpenAI is now allowing third-party developers to custom fine-tune its o4-mini language model using reinforcement learning. The fine-tuning process utilizes a feedback loop and a grader model to align the models with desired objectives.

Figma announced Figma Make, an AI tool to “prompt your way to a functional prototype.” Figma Make is a direct response to AI-based UI prototyping tools like Bolt and Lovable.

Zencoder has launched Zen Agents, a platform designed to facilitate the creation of organization-wide AI tools for software development. The platform includes an open-source marketplace for sharing custom agents and uses the Model Context Protocol (MCP) for LLMs to interact with external tools.

Meta is reportedly developing "super-sensing" AI software for wearables that could recognize people and provide task reminders based on user actions. The software, activated by voice command, is intended to run for several hours on future smart glasses and earphones.

Amazon has released a new "Enhance My Listing" AI tool to assist merchants in improving their product listings. Powered by Amazon's Bedrock service, it suggests product titles, descriptions, and other details based on seasonal trends.

Cognition Labs announced Kevin-32B, a 32B reasoning model trained (with RL) specifically for CUDA kernel-specific coding tasks.

AI Research News

Tencent Hunyuan released HunyuanCustom, an open-weight multi-modal video generation model that can produce videos featuring consistent subjects under flexible user-defined conditions. They presented technical details on the model in the paper “HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation.” Examples are shared on GitHub, and the model is available on HuggingFace.

A collage of a person and person

AI-generated content may be incorrect. — Figure 4. HunyuanCustom generates consistent characters in various scenes based on prompts, enabling storytelling and longer-form consistent video generation.

Alibaba researchers have developed ZeroSearch, a novel technique that allows LLMs to learn search capabilities through simulation, significantly reducing the need for expensive search engine APIs. The ZeroSearch reinforcement learning framework trains LLMs to search without interacting with real search engines, saving up to 88% in training costs and offering better control and accessibility in AI training.

The paper “Absolute Zero: Reinforced Self-play Reasoning with Zero Data” presents a way for models to develop reasoning without external data. The authors show Qwen models achieving state-of-the-art results on coding and math benchmarks with no pre-existing data. This could accelerate scaling of RL-based reasoning, since it enables bootstrapping to higher reasoning with AI self-play.

We believe the Absolute Zero paradigm represents a promising step toward enabling large language models to autonomously achieve superhuman reasoning capabilities. – Authors of Absolute Zero

A person pointing at a red flag

AI-generated content may be incorrect. — Figure 5. Evolving from Reinforcement learning from verified rewards, enables agents to self-learn reasoning from human-curated QA pairs, the Absolute Zero Paradigm trains reasoning models without any human-curated data, replacing that with AI generated tasks with verifiable rewards.

A new study by Paris-based Giskard, an AI testing company, suggests that prompting AI chatbots for shorter answers can lead to an increase in hallucinations, particularly on ambiguous subjects. When models are pressured to be concise, they prioritize brevity over factual accuracy, potentially hindering their ability to avoid misinformation.

Our AI research review article for this week covered Nvidia’s recently released AI models for physical AI, specifically Cosmos Predict, Cosmos Reason1, Cosmos Transform 1, and the Gr00T N1 robotics model.

AI Business and Policy

OpenAI announced an evolution in its corporate structure, with the non-profit continuing to control OpenAI as a whole, and the OpenAI operating LLC becoming a Public Benefit Corporation.

OpenAI agreed to acquire Windsurf for $3 billion. OpenAI gains access to AI coding agent user base and the associated data, and VentureBeat suggests two main motives:

First, the need to arm the vital developer ecosystem with superior coding capabilities, and second, to win the broader, more defining battle to become the primary interface for a future shaped by autonomous AI agents.

Cursor raised $900 million at a $9 billion valuation, an incredible valuation for a company that started by forking open-source VS Code to build an AI coding assistant.

Microsoft has banned its employees from using the DeepSeek app due to concerns over data security and potential Chinese government influence, although Microsoft does offer a modified version of DeepSeek's R1 model on Azure.

Your data is being used in AI training: SoundCloud has updated its terms of service to allow the platform to train AI models on user-uploaded audio content. The updated terms state that user content may be used to "inform, train, develop or serve as input to artificial intelligence."

Fastino, a startup focused on training AI models using lower-cost gaming GPUs, has raised $17.5 million in funding. The company utilizes smaller, task-specific AI models trained on low-end hardware, claiming their models can outperform flagship models on specific tasks at a lower training cost.

FDA Commissioner Marty Makary has expressed interest in shortening the drug approval timeline with AI, noting that the FDA has completed its first AI-assisted scientific review and aims for agency-wide implementation by summer. In addition, OpenAI and the FDA have been in discussions regarding the potential use of AI to accelerate the drug approval process.

Microsoft CEO Satya Nadella has endorsed open protocols like Google DeepMind’s A2A and Anthropic’s MCP, highlighting their importance in enabling agentic AI. This endorsement signals a shift towards fostering cross-platform AI collaboration, interoperability, and standardization.

The Trump administration is reportedly planning to replace the "AI Diffusion Act", rescinding limits set by the Biden administration before its effective date of May 15^th. They intend to implement other restrictions on advanced AI chip exports based on national security concerns.

AI Opinions and Articles

An AWS study of IT leaders across nine countries, generative AI is projected to surpass cybersecurity as the top IT budget priority globally in 2025, with 45% of organizations planning to prioritize spending on generative AI over traditional IT investments.

An AI success story: The Ottawa Hospital has significantly reduced clinician burnout by 70% and improved patient satisfaction to 93% by integrating Microsoft's DAX Copilot. This AI tool captures physician-patient conversations and generates draft clinical notes, saving clinicians approximately seven minutes per patient encounter and allowing for better focus during visits.

AI Changes Everything

Discussion about this post