AI Week In Review 25.02.08
Faster LeChat, Gemini 2.0, o3-mini reasoning traces, Constitutional Classifiers, GitHub Copilot Agent Mode and Edits, Pika Additions, Krutrim 2 12B, HuggingFace AI Appstore, Meta PARTNR.

AI Tech and Product Releases
Google released Gemini 2.0 models for general availability, including Gemini 2.0 Pro Experimental, Gemini 2.0 Flash, Gemini 2.0 Flash Thinking Experimental (with app integration), and Gemini 2.0 Flash-Lite, making them available in the Gemini Advanced interface, the Vertex AI API, and AI Studio.
We shared Gemini 2.0’s impressive benchmarks and broad multimodal and AI agent support features in depth in “Deeply Useful AI Models – Gemini 2.0 and o3-mini.” The Gemini 2.0 Flash and Flash-Lite models also boast high speed at lower cost, making them useful and cost-effective workhorse AI models in AI agent flows.
The AI community’s reaction to the recently released OpenAI Deep Research and o3-mini models has been very positive, including from our perspective. Realizing that hiding reasoning traces makes its AI reasoning models less useful, OpenAI has opened up the o3-mini thinking traces. Alex Volkov wonders: “Are these raw ones from the model or still a summarization model?” I think it’s a summarization, but it’s far more useful than not having the traces at all.
Mistral has launched an all-new, faster LeChat interface with new features powered by Cerebras technology, achieving generation speeds of up to 1,000 tokens per second. The new LeChat introduces native iOS and Android mobile applications as well as a web interface, and it integrates a code interpreter, Canvas for visual tasks, and advanced OCR capabilities. Mistral offers a $15 monthly tier for Pro users.
Anthropic has introduced Constitutional Classifiers aimed at preventing model jailbreaks. These classifiers enforce a set of guidelines during user interactions to enhance safety and ethical use of AI, and Anthropic used them to dramatically improve resistance to jailbreaking with minimal over-refusals and without incurring a large compute overhead.
183 active participants spent [over] 3,000 hours over a two-month experimental period attempting to jailbreak the model. They were offered a monetary reward of up to $15,000 should they discover a universal jailbreak. Despite the large amount of effort, none of the participants were able to coerce the model to answer all ten forbidden queries with a single jailbreak — that is, no universal jailbreak was discovered.
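Conceptually, a constitutional classifier pipeline wraps the model with trained filters on both the prompt and the streamed output. The sketch below is a hypothetical illustration only: the keyword-based scoring functions are stand-ins for Anthropic’s trained classifiers, and the names and threshold are assumptions, not their implementation.

```python
# Hypothetical sketch of constitutional-classifier gating.
# The scoring functions below are toy stand-ins for trained classifiers.

REFUSAL = "I can't help with that."

def input_classifier(prompt: str) -> float:
    """Stand-in for a trained classifier scoring prompt harmfulness (0-1)."""
    banned = ("synthesize nerve agent", "build a bioweapon")
    return 1.0 if any(b in prompt.lower() for b in banned) else 0.0

def output_classifier(partial_response: str) -> float:
    """Stand-in for a streaming classifier scoring output harmfulness (0-1)."""
    return 0.0  # placeholder: a real classifier scores the partial text

def guarded_generate(prompt: str, model, threshold: float = 0.5) -> str:
    """Refuse flagged prompts; otherwise stream tokens, re-checking output."""
    if input_classifier(prompt) >= threshold:
        return REFUSAL
    response = ""
    for token in model(prompt):
        response += token
        # The output classifier runs on the partial response, so a
        # harmful completion can be halted mid-stream.
        if output_classifier(response) >= threshold:
            return REFUSAL
    return response
```

The key design point this illustrates is that the output check runs on every partial response, which is why jailbreaks that sneak past the input filter can still be cut off mid-generation.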
HuggingFace has rebranded itself as an AI Appstore hosting a wide range of AI applications. The HuggingFace AI Appstore surfaces HuggingFace Spaces for AI models and lets developers showcase, share, and deploy AI models and tools in a centralized marketplace, giving AI users a convenient single place to evaluate thousands of AI models and tools.
GitHub released Copilot enhancements, including agentic workflows and inline editing features for code development. The enhancements provide more context-aware AI capabilities to improve developer productivity within the VSCode environment:
GitHub Copilot’s new agent mode is capable of iterating on its own code, recognizing errors, and fixing them automatically.
Copilot Edits combines the best of Chat and Inline Chat with a conversational flow and the ability to make inline changes across a set of files that you manage.
Replit announced that its AI Apps agent is free to try: “we've made our AI tools free to try and redesigned our mobile app from the ground-up.” Replit’s “free to try” means the first 10 checkpoints are free. This will assist users in building mini apps on demand using Replit AI tools.
Google has released its Imagen 3 image model in the Gemini API. The model generates high-quality, photorealistic images at a cost of 3 cents per image.
Pika Labs has introduced Pika Additions, a new tool that enhances existing videos by integrating additional visual elements. This feature allows users to augment pre-existing video content seamlessly with AI-generated components.

OpenAI’s canvas feature has been updated to allow users to share their workspace creations. This update makes it easier to collaborate by enabling direct sharing of canvas content.
Krutrim AI Labs from India has released the Krutrim suite, which includes the Krutrim 2 12B multilingual LLM, the Chitrarth VLM for visual language tasks, and models for embeddings and translation. The suite is specifically designed to enhance AI performance for Indic languages and applications. Krutrim AI models are all open-source and available on HuggingFace.
HuggingFace announced OpenDeepResearch, an open-source project that seeks to replicate the advanced research capabilities of proprietary research agents. The tool supports thorough information retrieval and reasoning in an open-source community-driven framework.
Related to this, there are community efforts on open Deep Research, such as one from Firecrawl’s Nick Camara. Dan Zhang also released an open Deep Research implementation on GitHub: “You can even tweak the behavior of the agent with adjustable breadth and depth.”
Jina AI has released Node-DeepResearch, another open-source implementation inspired by OpenAI’s Deep Research agent. The tool operates using a “query, search, read, reason, repeat” methodology to provide comprehensive research insights.
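That “query, search, read, reason, repeat” methodology can be sketched as a simple agent loop. Node-DeepResearch itself is a Node.js project; the Python below is a language-agnostic illustration, and `search`, `read`, and `reason` are hypothetical stand-ins (a web search API, a page reader, and an LLM call), not Jina’s code.

```python
# Illustrative "query, search, read, reason, repeat" research loop.
# search/read/reason are caller-supplied stand-ins, not a real API.

def deep_research(question: str, search, read, reason, max_steps: int = 5):
    notes = []
    query = question
    for _ in range(max_steps):
        urls = search(query)                  # query: find candidate sources
        notes.extend(read(u) for u in urls)   # read: extract their content
        verdict = reason(question, notes)     # reason: answer or refine
        if verdict["done"]:
            return verdict["answer"]
        query = verdict["next_query"]         # repeat with a refined query
    # Step budget exhausted: return the best answer from gathered notes.
    return reason(question, notes)["answer"]
```

The loop terminates either when the reasoning step judges the accumulated notes sufficient or when the step budget runs out, which is the basic control flow shared by most of these open Deep Research clones.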
AI Research News
ByteDance released OmniHuman-1, an image-to-video AI model engineered for high-fidelity human animation, and it’s really good: “a next-level reality creator” whose realism is scary good. The model was trained on 19,000 hours of human video footage, allowing for natural movements and speech synchronization. It replicates realistic human movements and expressions in video with impressive detail from audio and image inputs. ByteDance shared research about it in the paper “OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models.”
Paris-based KyutAI has open-sourced Hibiki, a suite of simultaneous translation models aimed at real-time multilingual communication. The open-source models enable efficient translation across multiple languages and are available via HuggingFace. They shared a research paper on it, “High-Fidelity Simultaneous Speech-To-Speech Translation.”
Hibiki leverages a multistream language model to synchronously process source and target speech and jointly produces text and audio tokens to perform speech-to-text and speech-to-speech translation.
There has been a flurry of AI reasoning-related open-source projects and research results recently, inspired by the release of DeepSeek R1. We’ve mentioned Bespoke-Stratos reasoning distillation in a prior Research Review. Here are a few more: S1 and R1-V.
Researchers from Stanford unveiled Simple Scaling (S1), which developed a high-quality reasoning model from a limited training dataset. In their research, presented in “s1: Simple test-time scaling,” they fine-tuned the Qwen2.5-32B model on a curated set of 1,000 reasoning examples. The resulting model exceeded o1-preview on MATH and AIME24 benchmark results. The approach utilizes a technique called “budget forcing” that allows the model to optimize its test-time compute usage while maintaining competitive performance.
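In the s1 paper, budget forcing is a decode-time intervention: if the model tries to stop thinking too early, the end-of-thinking delimiter is suppressed and “Wait” is appended to prolong reasoning; if the token budget is exhausted, the end of thinking is forced. The sketch below is illustrative, assuming a hypothetical `generate_step` callable and a simplified `</think>` delimiter token, not the authors’ actual decoding code.

```python
# Illustrative sketch of s1-style "budget forcing" at decode time.
# generate_step is a hypothetical next-token callable; tokens are strings.

END_THINK = "</think>"  # end-of-thinking delimiter assumed for illustration

def budget_forced_decode(generate_step, min_tokens: int, max_tokens: int):
    """Return the thinking-trace tokens, forced to fit the token budget."""
    thought = []
    while True:
        tok = generate_step(thought)
        if tok == END_THINK and len(thought) < min_tokens:
            # Model tried to stop early: suppress the delimiter and append
            # "Wait" so it keeps reasoning (extends test-time compute).
            thought.append("Wait")
            continue
        if tok == END_THINK or len(thought) >= max_tokens:
            # Model finished, or budget exhausted: end thinking here.
            return thought
        thought.append(tok)
```

The appeal of the technique is that a single knob (the token budget) controls test-time compute, which is what gives the paper its clean scaling curves.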
A team of researchers called Deep Agent introduced R1-V, an open-source project that replicates the “aha moment” observed in training R1. It does so at remarkably low cost, training a 2B model to reason capably in 100 GRPO steps at a cost of just $3.
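The GRPO algorithm behind these cheap reasoning runs avoids training a separate value model: each sampled completion’s reward is normalized against the other completions in its own group to get an advantage. A minimal sketch of that group-relative normalization (the function name is ours, not from the R1-V code):

```python
# Sketch of GRPO's group-relative advantage computation: rewards for a
# group of completions sampled from the same prompt are normalized by
# the group's mean and standard deviation, so no learned critic is needed.

from statistics import mean, stdev

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    mu = mean(group_rewards)
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    if sigma == 0.0:
        return [0.0] * len(group_rewards)  # all rewards tie: no signal
    return [(r - mu) / sigma for r in group_rewards]
```

Because the baseline is just the group mean, a step only needs reward labels (e.g., answer correctness) for a batch of sampled completions, which is a big part of why a $3 training run is feasible.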
Meta just announced that PARTNR, a research framework supporting seamless human-robot collaboration, will be released as open source:
Building on our research with Habitat, we’re open sourcing a large-scale benchmark, dataset, and large planning model that we hope will enable the community to effectively train social robots.
The core of the PARTNR initiative is its extensive benchmark of 100,000 tasks, which are used to train AI models through simulation-based human demonstrations. The datasets and models can be used to train robots to collaborate with humans to do housework. The PARTNR research has been published in “PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks.”
Nvidia and Carnegie Mellon researchers developed ASAP, a robotics model that can mimic full human-body motion. Their work was published in the paper “ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills.”

DeepMind’s latest math reasoning model, AlphaGeometry2, outperforms average gold medalists in solving International Mathematical Olympiad geometry problems. The system solves 84% of past IMO geometry questions (versus 54% with AlphaGeometry) using a hybrid approach combining neural network-based Gemini models and symbolic engines. The research was just published on Arxiv in “Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2.”
AI Business and Policy
Safe Superintelligence, the AI startup founded by Ilya Sutskever, is in talks to raise funding at a valuation of “at least” $20 billion, a fourfold increase from its September 2024 valuation of $5 billion. This comes despite the company having neither revenue nor a product; it is a testament to Ilya’s reputation as an AI pioneer.
Anthropic CEO says DeepSeek was “the worst” on a critical bioweapons data safety test. In an interview, Dario Amodei stated that DeepSeek’s model performed poorly at withholding sensitive bioweapons-related information. While not immediately dangerous, Amodei warned of potential future risks and urged DeepSeek to prioritize AI safety considerations.
Ahead of the AI Action Summit in Paris, France and the UAE have agreed on a €30-50 billion investment for an AI campus featuring a data center with up to 1GW capacity. France has identified 35 potential locations for fast-tracked administrative approvals, aiming to attract foreign investments, particularly for energy-intensive data centers.
In addition, France’s public investment bank plans to invest up to €10 billion in the AI ecosystem. Bpifrance, a key investor in tech startups, will focus on foundation models, AI infrastructure, and hardware companies, aiming to bolster France’s position as a global AI leader.
These are big investments, but US Big Tech firms including Meta, Alphabet, and Microsoft are going even bigger. Amazon is set to spend over $100 billion on AI capabilities for its cloud division AWS in 2025, according to CEO Andy Jassy.
AI Opinions and Articles
This review of the impact of AI on Hollywood, “The Brutalist, AI and the future of cinema,” paints a picture where AI has already changed everything in Hollywood, even before Hollywood faces the coming wave of near-perfect AI video replicas of everything:
It is being used in almost every phase of production by some movie makers, from scriptwriting and pre-production through to visual effects and post-production, as well as distribution.
Ahead of the AI Action Summit in Paris, Stanford computer scientist and World Labs founder Fei-Fei Li has laid out three principles for AI policymaking:
Center AI policies on current scientific reality rather than futuristic scenarios.
Emphasize pragmatism over ideology - “minimize unintended consequences while incentivizing innovation.”
Empower the entire AI ecosystem “including open-source communities and academia” by promoting open-source and unrestricted access to models and tools.
“Open access to AI models and computational tools is crucial for progress. Limiting it will create barriers and slow innovation, particularly for academic institutions and researchers who have fewer resources than their private-sector counterparts.” – Fei-Fei Li
The following report provides insights into s1 and DeepSeek-R1 that you may find valuable:
From Brute Force to Brain Power: How Stanford's s1 Surpasses DeepSeek-R1
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5130864