AI Week In Review 25.02.15
YouTube AI video & dubbing, Gemini memory, OpenAI ModelSpec, o3-mini multimodal, DeepHermes-3, Nomic Embed Text, Adobe Firefly Video, Perplexity Deep Research, DeepScaleR, OpenAI's GPT-5 roadmap.

TL;DR - AI releases were mostly minor updates during this week of the AI Action Summit in Paris, but OpenAI revealed its roadmap for GPT-5, and releases of new Claude models and Grok 3 are imminent.
AI Tech and Product Releases
YouTube Shorts is integrating with Google DeepMind’s Veo 2, letting creators generate AI video clips for their posts. YouTube is also expanding its AI dubbing feature to all creators, so they can translate their videos into multiple languages.
Google’s Gemini AI chatbot can now recall previous conversations, personalizing interactions and saving users from repeating information. It is available now to Gemini Advanced users in English and will roll out to others in the coming weeks.
Laurentia Romaniuk of OpenAI announced the removal of ChatGPT’s warning messages, a change intended to reduce unnecessary denials and give users more freedom and a better experience. Despite the change, ChatGPT still avoids answering sensitive or suspicious inquiries.
OpenAI has released Model Spec v2, a 63-page document outlining guidelines for AI model behavior, aiming to better balance user freedom and safety guardrails:
The update explicitly embraces intellectual freedom within defined safety boundaries, allowing discussion of controversial topics while maintaining restrictions against concrete harm.
It also emphasizes customization and transparency, addresses how to handle controversial topics, and aims to reduce AI sycophancy. The spec is in the public domain, so developers and researchers can freely adopt it and build on it.
o1, o3-mini, and o3-mini-high are now multimodal, supporting file and image uploads. Also, a new GPT-4o update is live in ChatGPT, and “its writing is unbelievably good.”
AllenAI has released OLMoE, its open-source language model, as a standalone app for iOS devices. The fully open iOS app runs the model on-device, letting users experience AI privately and securely.
NousResearch released DeepHermes-3 Preview, an LLM that combines reasoning and traditional language model capabilities. Built on Llama-3.1 8B, DeepHermes-3 is fine-tuned and distilled from DeepSeek-R1 to enhance reasoning. It features two modes: a “deep thinking” mode that uses long chains of thought to improve accuracy, and the standard “intuitive” response mode of a traditional LLM. DeepHermes-3 Preview is open weights and available on Hugging Face.
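As a rough illustration of the two-mode design, here is a minimal sketch of running DeepHermes-3 with Hugging Face transformers and switching on long chain-of-thought reasoning via a system prompt. The repo id and the system prompt wording below are assumptions; check the Hugging Face model card for the exact values.

```python
# Minimal sketch (not official usage docs) of DeepHermes-3's "deep thinking" mode.
# The repo id and system prompt wording are assumptions; see the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/DeepHermes-3-Llama-3.1-8B-Preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Per the release notes, a system prompt toggles the model from its standard
# "intuitive" responses into long chain-of-thought reasoning (hypothetical wording).
deep_thinking_system = (
    "You are a deep thinking AI. You may use extremely long chains of thought to "
    "deliberate before answering, enclosing your reasoning in <think> tags."
)

messages = [
    {"role": "system", "content": deep_thinking_system},
    {"role": "user", "content": "If a train leaves at 3pm traveling 60 mph, how far has it gone by 5:30pm?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Omitting the system prompt should return the model to ordinary single-pass responses, per the two-mode description above.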
Nomic AI released Embed Text V2 MoE, a state-of-the-art multilingual embedding model based on a Mixture of Experts (MoE) architecture that supports around 100 languages for retrieval tasks. Nomic Embed Text V2 is open source and available on Hugging Face.
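Below is a minimal retrieval sketch using the new embedding model through sentence-transformers. The model id and the “search_query:” / “search_document:” task prefixes are assumptions carried over from earlier Nomic Embed releases; consult the model card before relying on them.

```python
# Minimal retrieval sketch with Nomic Embed Text V2 MoE via sentence-transformers.
# Model id and task prefixes are assumptions based on earlier Nomic Embed releases.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

docs = [
    "search_document: Gemini can now recall prior conversations.",
    "search_document: Adobe announced the Firefly Video model.",
]
query = "search_query: Which chatbot gained long-term memory?"

doc_emb = model.encode(docs, normalize_embeddings=True)
q_emb = model.encode(query, normalize_embeddings=True)

# Cosine similarity: embeddings are normalized, so a dot product suffices.
scores = doc_emb @ q_emb
print(docs[int(np.argmax(scores))])
```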
Groq has added Qwen models and the 32B distilled DeepSeek-R1 to its GroqCloud platform, offering blazing-fast inference speeds for these models.
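GroqCloud exposes an OpenAI-compatible API, so calling the newly added models can look like the sketch below. The model identifier is an assumption; check Groq’s current model list.

```python
# Minimal sketch of calling the distilled DeepSeek-R1 model on GroqCloud through
# its OpenAI-compatible endpoint. The model name is an assumption.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-32b",  # assumed GroqCloud model id
    messages=[{"role": "user", "content": "Briefly explain mixture-of-experts routing."}],
)
print(resp.choices[0].message.content)
```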
LMArena has released a dataset of 100,000 human preference votes on LLM responses. This dataset can be used to train models that predict and optimize user preferences.
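For anyone wanting to build preference or reward models from these votes, a minimal sketch of loading the data with the Hugging Face datasets library follows. The dataset id is an assumption; the column names are inspected at runtime rather than hard-coded.

```python
# Minimal sketch of pulling the LMArena preference-vote data for reward-model training.
# The dataset id below is an assumption; inspect the schema before building pairs.
from datasets import load_dataset

ds = load_dataset("lmarena-ai/arena-human-preference-100k", split="train")  # assumed id
print(ds.features)  # inspect prompt/response/winner columns
print(ds[0])        # one human preference vote over two model responses
```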
Adobe announced Firefly Video, a text-to-video and image-to-video AI model. It is designed to be commercially safe and IP-friendly, allowing brands and professionals to generate production-ready content.

OpenAI Roadmap: Sam Altman, in a long tweet, announced OpenAI’s roadmap for upcoming releases. Three key points:
We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.
[we will] unify o-series models and GPT-series models by creating systems that can use all our tools,
We will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.
OpenAI wants to remove the model picker and move towards integrating multiple AI technologies into a single system that’s easier to use.
Anthropic is preparing a new Claude release: a hybrid LLM / reasoning model that will let users adjust how much reasoning Claude employs, supporting both AI reasoning and traditional LLM use cases. The release is expected in the coming weeks, and Dylan Patel says Anthropic’s internal reasoning model is better than o3; it reportedly excels at programming tasks and large codebase management.
Baidu is set to release its next-generation AI model, Ernie 5, later this year. Ernie 5 will integrate multimodal capabilities to process and convert between text, video, and audio formats, enhancing applications in content creation and enterprise solutions.
Top Tools & Hacks
Perplexity has launched its own Deep Research, inspired by Google’s Gemini Deep Research and OpenAI’s Deep Research and built on Perplexity’s existing strengths in combining search and reasoning. It can access the web, academic sources, and social media for information, and builds a short report in a few minutes.
I had high hopes for Perplexity Deep Research, but on my first try it hallucinated that “OpenAI officially released GPT-5 on February 12,” apparently extrapolated from Sam Altman previewing GPT-5 in an interview. Overall quality and depth are less than Gemini’s and OpenAI’s Deep Research, but it can still be useful.

AI Research News
Meta released Audiobox Aesthetics and published an associated paper on their work “Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound.” The Audiobox Aesthetics assessment model uses four axes to evaluate audio: production quality, production complexity, content enjoyment, and content usefulness. This helps develop automated assessments of audio quality without human intervention.
The Agentica Project out of UC Berkeley has released DeepScaleR 1.5B, a tiny 1.5B-parameter AI reasoning model that surpasses OpenAI’s o1-preview, scoring 43.1% on AIME 2024. Trained with reinforcement learning on 40,000 curated math problem-answer pairs, DeepScaleR 1.5B achieves this performance at a training cost of $4,500.
DeepScaleR is an open-source project to fully democratize reinforcement learning (RL) for LLMs and reproduce DeepSeek-R1 and OpenAI o1/o3 at scale on real tasks.
Our AI research review article for this week covered innovative research in test-time scaling and RL for improving reasoning:
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
OREAL: Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Demystifying Long Chain-of-Thought Reasoning in LLMs
LIMO: Less is More for Reasoning
AI Business and Policy
Amazon and Apple are facing challenges integrating generative AI into their digital assistants Alexa and Siri. Amazon has delayed the release of its new Alexa until March or later, while Apple’s overhaul of Siri is encountering engineering issues and software bugs.
Alibaba confirmed a partnership with Apple to bring AI features to iPhones in China. The partnership aims to boost iPhone sales in China, where Apple has seen an 11% year-over-year sales drop.
Meta is forming a new team within its Reality Labs to develop humanoid robotics hardware for physical tasks like household chores. The group will focus on creating both robotic software and AI, building a foundational platform for the robotics market rather than producing consumer robots.
More Elon Musk and OpenAI drama: Elon Musk’s xAI-led consortium offered $97.4 billion to buy the non-profit that controls OpenAI. The cash offer would involve due diligence on OpenAI’s assets and staff, potentially giving Musk insights into OpenAI. Musk’s lawyers stated that he will withdraw the offer if OpenAI’s board halts the company’s transition to a for-profit and preserves its mission as a charity.
In response, Sam Altman dismissed the bid as an attempt to slow OpenAI down and argued it conflicts with Musk’s lawsuit against changing OpenAI’s nonprofit status. OpenAI’s board unanimously rejected Musk’s $97.4 billion buyout offer. Bret Taylor, OpenAI’s board chair, called the bid an attempt to disrupt competition and affirmed that OpenAI is not for sale.
A consortium of publishers including Condé Nast, The Atlantic, and Forbes is suing Cohere for “massive, systematic” copyright infringement. The plaintiffs accuse Cohere of using over 4,000 copyrighted works to train AI models and of displaying large portions of, or entire, articles, harming referral traffic and infringing on trademarks. Cohere strongly refutes the allegations, asserting that its practices are responsible and the lawsuit misguided.
Apptronik, a University of Texas spinout developing humanoid robots, secured $350 million in Series A funding. Apptronik plans to use the funds for scaling production and commercialization of its robots in industries like automotive manufacturing, with Google’s DeepMind partnering on AI integration.
Latent Labs, founded by a former DeepMind scientist, emerged from stealth with $50 million in funding. The company is developing AI foundation models for protein design and aims to “make biology programmable.”
The European Union has denied that its rollback on tech regulation, including scrapping the AI Liability Directive, was influenced by pressure from the Trump administration. EU digital chief Henna Virkkunen stated that the move aims to boost competitiveness by reducing bureaucracy. This announcement came after U.S. Vice President JD Vance urged Europe to embrace AI development and align with U.S. efforts at the Paris AI Action Summit.
The U.K. government is renaming the AI Safety Institute to the "AI Security Institute" to focus on cybersecurity risks posed by AI to national security and crime. The government also announced a new partnership with Anthropic to explore using its AI assistant Claude in public services, aiming to enhance efficiency and accessibility of information for UK residents.
AI Opinions and Articles
The AI Action Summit in Paris had a different tone this year, moving away from AI safety concerns and towards AI’s opportunities. As one headline put it, “Europe Falls In Line With a US AI Vision”:
“This conference was originally about AI safety, but it’s definitively moved to being about making money in AI,” Bloomberg Beta’s James Cham told me.
Vice President JD Vance’s speech at the AI Action Summit set this tone; he opened by saying:
I am not here to talk about AI Safety, I’m here to talk about AI opportunity…. To restrict its development now … would mean paralyzing one of the most promising technologies in generations.
Vice President Vance expressed four key points in his address:
American AI technology will continue to be the gold standard world-wide.
We believe excessive regulation of the AI sector could kill a transformative industry … we encourage pro-growth policies.
We feel strongly AI must be free from ideological bias and AI must not become a tool for authoritarian censorship.
We will maintain a pro-worker growth path for AI, so it can be a potent tool for job creation in the United States. … It will never replace human beings.
Vice President Vance made clear the Trump administration is in the AI accelerationist camp, encouraging the growth and development of AI, including open-source AI, while opposing regulations that might stifle it.
“AI, we believe, will make us more productive, more prosperous, and more free.” - Vice President JD Vance